What are the advantages of RandomForests?
- Automatic predictor selection from any number of candidates
1. The analyst does not need to do any variable selection or data reduction.
2. The best predictors are automatically identified.
- Ability to handle data without preprocessing
1. Data do not need to be rescaled, transformed, or modified.
2. The method is resistant to outliers.
3. Missing values are handled automatically.
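
One reason rescaling is unnecessary is that a tree split depends only on the ordering of a predictor's values, so a monotone increasing transformation (such as a logarithm) leaves the resulting partition unchanged. The following is a minimal sketch in plain Python, not the RandomForests implementation; `best_split_partition` is a hypothetical helper and the data are made up for illustration.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split_partition(x, y):
    """Return the set of row indices sent left by the best single split on x.

    Candidate splits fall between consecutive distinct sorted values; the
    best one minimises the size-weighted Gini impurity of the two children.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # equal values cannot be separated by a threshold
        left, right = order[:k], order[k:]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(order)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left

# Made-up data: a heavily skewed predictor and a binary target.
x = [1.0, 3.0, 10.0, 200.0, 5000.0, 90000.0]
y = [0, 0, 0, 1, 1, 1]

raw_partition = best_split_partition(x, y)
log_partition = best_split_partition([math.log(v) for v in x], y)
same_partition = raw_partition == log_partition  # the log transform changes nothing
```

Because the split search only ever compares sorted positions, the raw and log-transformed predictors send exactly the same rows to each side.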
- Resistance to overtraining
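1. Numerous trees are generated based on two forms of randomization: each tree is grown on a bootstrap sample of the rows, and each split considers only a random subset of the predictors.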
2. Growing a large number of RandomForests trees does not create a risk of overfitting.
3. Each tree is an independent, random experiment.
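
The two forms of randomization in a random forest are bootstrap sampling of the rows and random selection of candidate predictors at each split. The sketch below shows only those two random draws, using the Python standard library; `draw_tree_inputs` and the sizes used are hypothetical, and the choice of roughly the square root of the number of predictors as the candidate count is an assumption (a common default), not something stated in this document.

```python
import random

def draw_tree_inputs(n_rows, n_features, n_candidates, rng):
    """Draw the random ingredients used when growing one tree.

    Returns (bootstrap_rows, candidate_features):
    - bootstrap_rows: n_rows row indices sampled WITH replacement
    - candidate_features: n_candidates distinct feature indices, as would
      be drawn afresh at a single split
    """
    bootstrap_rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    candidate_features = rng.sample(range(n_features), n_candidates)
    return bootstrap_rows, candidate_features

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_rows, n_features = 10, 9
n_candidates = 3  # assumption: about sqrt(n_features)

for tree in range(3):
    rows, feats = draw_tree_inputs(n_rows, n_features, n_candidates, rng)
    print(f"tree {tree}: rows={sorted(rows)} split candidates={sorted(feats)}")
```

Each tree sees a different resampled version of the data and a different set of candidate predictors, which is what makes the trees differ from one another.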
- Self-testing using "out-of-bag" data
1. Self-testing is based on an extension of cross-validation.
2. Self-tests provide highly reliable assessments of the model.
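
The out-of-bag idea can be sketched as follows: rows left out of a tree's bootstrap sample are scored by that tree only, and each row's out-of-bag votes are combined into a prediction. This is a minimal illustration in plain Python, assuming a one-split "stump" in place of a full tree and made-up one-dimensional data; `fit_stump` and `oob_error` are hypothetical names, not part of RandomForests.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split classifier on 1-D data: predict 1 when x > threshold.

    Tries each observed x as the threshold and keeps the one with the
    fewest training errors.
    """
    best_t, best_err = None, None
    for t in xs:
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def oob_error(xs, ys, n_trees, rng):
    """Return (out-of-bag error rate, number of rows that received a vote)."""
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # per-row OOB votes for class 0 / class 1
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(sample)
        t = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        for i in range(n):
            if i not in in_bag:  # only trees that never saw row i may vote on it
                votes[i][int(xs[i] > t)] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    if not scored:
        return None, 0
    wrong = sum((votes[i][1] > votes[i][0]) != bool(ys[i]) for i in scored)
    return wrong / len(scored), len(scored)

# Made-up data: the class switches from 0 to 1 at x = 20.
xs = list(range(40))
ys = [1 if x >= 20 else 0 for x in xs]

error, n_scored = oob_error(xs, ys, n_trees=50, rng=random.Random(0))
print(f"OOB error {error:.3f} over {n_scored} rows")
```

No separate holdout set is needed: every row is tested only by trees that were grown without it.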
- Cluster identification
1. RandomForests can be used to generate tree-based clusters.
2. Predictor variables defining clusters are chosen automatically.
1. RandomForests offers graphics that yield new insights into data.
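
A common way to obtain tree-based clusters from a random forest is through proximities: two rows are considered similar if they frequently land in the same terminal node. The sketch below assumes that this is the mechanism meant above; the leaf assignments are made up for illustration, and `proximity_matrix` is a hypothetical helper.

```python
def proximity_matrix(leaf_ids):
    """Compute pairwise proximities from terminal-node assignments.

    leaf_ids[t][i] is the terminal node that row i reaches in tree t.
    Entry (i, j) of the result is the fraction of trees in which rows
    i and j share a terminal node.
    """
    n_trees = len(leaf_ids)
    n_rows = len(leaf_ids[0])
    counts = [[0] * n_rows for _ in range(n_rows)]
    for leaves in leaf_ids:
        for i in range(n_rows):
            for j in range(n_rows):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]

# Hypothetical leaf assignments for 4 rows across 4 trees.
leaf_ids = [
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [0, 1, 1, 1],
    [5, 5, 5, 6],
]
prox = proximity_matrix(leaf_ids)
for row in prox:
    print(row)
```

The matrix `1 - prox` can then be treated as a distance matrix and passed to any standard clustering or scaling method.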