RandomForests Frequently Asked Questions


Q1. What is RandomForests?

A1. RandomForests is a recently developed data analysis tool for data mining and predictive modeling. It generates many decision trees and combines them into a single predictive model, and it reveals patterns in the data with a high degree of accuracy. The method was developed by Leo Breiman of the University of California, Berkeley, and Adele Cutler of Utah State University, and is licensed exclusively to Salford Systems.






Q2. How does RandomForests work?

A2. A RandomForests model is a collection of many CART® trees, each grown independently of the others. The predictions of the individual trees are combined to produce the overall prediction of the forest.
Two forms of randomization occur in RandomForests: one at the tree level and one at the node level. At the tree level, each tree is grown on a random sample of the observations, drawn with replacement (a bootstrap sample). At the node level, each split is chosen from a randomly selected subset of the predictors. Each tree is grown to maximal size and left unpruned. The process is repeated until a user-defined number of trees has been created; this collection is the random forest. The overall prediction is then determined by voting across the trees for classification and by averaging across the trees for regression.
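The two forms of randomization and the two ways of combining predictions can be sketched in a few lines of Python. This is only an illustration, not the RandomForests product code: the function names and the toy inputs are hypothetical, the tree-level sample is assumed to be drawn with replacement, and the tree-growing step itself is left out.

```python
import random
from collections import Counter

def bootstrap_sample(n_rows, rng):
    """Tree-level randomization: draw n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_predictors(n_predictors, n_try, rng):
    """Node-level randomization: pick n_try distinct predictors for one split."""
    return rng.sample(range(n_predictors), n_try)

def combine_classification(tree_predictions):
    """Classification: the class predicted by the most trees wins the vote."""
    return Counter(tree_predictions).most_common(1)[0][0]

def combine_regression(tree_predictions):
    """Regression: the forest prediction is the mean of the tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(0)                           # fixed seed for repeatability
rows = bootstrap_sample(10, rng)                 # rows used to grow one tree
predictors = candidate_predictors(8, 3, rng)     # predictors tried at one node
vote = combine_classification(["yes", "no", "yes"])
mean = combine_regression([2.0, 3.0, 4.0])
```

Here three hypothetical trees vote "yes", "no", "yes", so the forest predicts "yes"; three hypothetical regression trees predict 2.0, 3.0, and 4.0, so the forest predicts their average, 3.0.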






Q3. What are the advantages of RandomForests?

  • Automatic predictor selection from any number of candidates
    • the analyst does not need to perform any variable selection or data reduction
    • the best predictors are identified automatically
  • Ability to handle data without preprocessing
    • data do not need to be rescaled, transformed, or modified
    • resistant to outliers
    • automatic handling of missing values
  • Resistance to overtraining
    • numerous trees are generated using two forms of randomization
    • growing a large number of trees does not create a risk of overfitting
    • each tree is an independent, random experiment
  • Self-testing using "out-of-bag" data
    • self-testing is based on an extension of cross-validation
    • self-tests provide highly reliable assessments of the model
  • Cluster identification
    • can be used to generate tree-based clusters
    • predictor variables defining clusters are chosen automatically
  • Visualization
    • RandomForests offers graphics that yield new insights into the data
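
The "out-of-bag" self-test mentioned above can be illustrated with a short Python sketch. It assumes each tree is grown on a sample of rows drawn with replacement, so that the rows a tree never saw (its out-of-bag rows) can serve as test data for that tree. The function names and the toy data are hypothetical and are not part of the RandomForests product.

```python
import random
from collections import Counter

def out_of_bag_rows(n_rows, rng):
    """Return the rows that a bootstrap sample of size n_rows did not draw."""
    in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
    return [i for i in range(n_rows) if i not in in_bag]

def oob_vote(row, in_bag_sets, tree_predictions):
    """Vote on one row using only the trees that did not train on it."""
    votes = [preds[row]
             for bag, preds in zip(in_bag_sets, tree_predictions)
             if row not in bag]
    return Counter(votes).most_common(1)[0][0] if votes else None

rng = random.Random(42)
n_rows = 1000
oob_fraction = len(out_of_bag_rows(n_rows, rng)) / n_rows

# Toy example: three trees, three rows, hypothetical in-bag sets and predictions.
in_bag_sets = [{0, 1}, {1, 2}, {0, 2}]
tree_predictions = [["a", "a", "b"], ["b", "a", "a"], ["a", "b", "a"]]
row0_vote = oob_vote(0, in_bag_sets, tree_predictions)
```

For a large data set, roughly 37% of the rows are out-of-bag for any one tree, because the chance that a given row is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e. Comparing each row's out-of-bag vote with its true class gives an error estimate without setting aside a separate test sample.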







Q4. What are RandomForests' strengths?

A4. RandomForests specializes in classification and regression problems. Its strengths include spotting outliers and anomalies in data, displaying proximity clusters, predicting future outcomes, identifying important predictors, discovering data patterns, replacing missing values with imputed values, and providing insightful graphics. It can also be used for clustering and density estimation.
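
As a rough illustration of the proximity idea, the sketch below uses the usual random forest definition: the proximity of two observations is the fraction of trees in which they end up in the same terminal node. Whether the product computes it in exactly this way is an assumption, and the terminal-node ids are made up for the example.

```python
def proximity(leaf_ids_a, leaf_ids_b):
    """Fraction of trees in which two observations share a terminal node."""
    same = sum(1 for a, b in zip(leaf_ids_a, leaf_ids_b) if a == b)
    return same / len(leaf_ids_a)

# Hypothetical terminal-node ids for three observations across five trees.
obs1 = [3, 7, 2, 5, 1]
obs2 = [3, 7, 4, 5, 1]
obs3 = [6, 2, 4, 8, 9]

p12 = proximity(obs1, obs2)   # same node in 4 of 5 trees
p13 = proximity(obs1, obs3)   # never in the same node
```

Observations with high mutual proximity can be grouped into clusters, while an observation with low proximity to all others is a candidate outlier.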






Q5. Is RandomForests a black box?

A5. RandomForests is not a black box. It produces descriptive reports and displays that allow the user to gain insight into the data.






Q6. How long will it take to learn RandomForests?

A6. Learning to get the best results takes only a short time. Only a few control parameters influence the quality of a RandomForests model, so it is quite easy to discover the best settings.






© Copyright 2003-2004 Salford Systems