The SPM Salford Predictive Modeler software suite offers several tools for clustering and segmentation, including CART®, Random Forests®, and the classical statistical module CLUSTER. In this article we illustrate the use of these tools with the well-known Boston Housing data set (pertaining to 1970s housing prices and neighborhood characteristics in the greater Boston area).
Random Forests® is a collection of many CART® trees, each grown independently of the others. Two forms of randomization occur in Random Forests, one at the tree level and one at the node level. At the tree level, each tree is grown on a random sample of the observations. At the node level, each split is chosen from a randomly selected subset of the predictors. Each tree is grown to maximal size and left unpruned. This process is repeated until a user-defined number of trees has been grown; the resulting collection is called a random forest. The predictions of the individual trees are then combined into the forest's overall prediction: by voting for classification and by averaging for regression.
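The two randomization steps and the final combination of tree predictions can be sketched in a few lines of Python. This is only an illustration of the general idea, not the SPM implementation: the function names are hypothetical, the tree-growing step itself is omitted, and drawing observations with replacement (a bootstrap sample) is an assumption based on how random forests are usually described. The row count, predictor count, and tree predictions are made-up values.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_indices(n_rows, rng):
    """Tree-level randomization: draw n_rows row indices with replacement.

    Sampling with replacement is an assumption here, not taken from SPM.
    """
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_predictors(n_predictors, m, rng):
    """Node-level randomization: pick m distinct predictors to try at one split."""
    return rng.sample(range(n_predictors), m)


def vote(tree_predictions):
    """Classification: the class predicted by the most trees wins."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average(tree_predictions):
    """Regression: the forest prediction is the mean of the tree predictions."""
    return mean(tree_predictions)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sizes: 10 observations, 13 predictors, 4 tried per split.
rows = bootstrap_indices(10, rng)
predictors = candidate_predictors(13, 4, rng)

# Made-up predictions from individual trees for a single observation.
forest_class = vote(["high", "low", "high", "high", "low"])
forest_value = average([21.0, 24.0, 22.5, 20.5])
```

In a full forest, each tree would be grown on its own `rows` sample, drawing a fresh `predictors` subset at every node, and `vote` or `average` would be applied across all trees for each observation being scored.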
Learning to get the best results takes little time. Only a few control parameters influence the quality of a Random Forests model, so the best settings are easy to find.
Random Forests® specializes in classification and regression problems. Its strengths include spotting outliers and anomalies in data, displaying proximity clusters, predicting future outcomes, identifying important predictors, discovering data patterns, replacing missing values with imputations, and providing insightful graphics. It can also be used for clustering and density estimation.