Salford Predictive Modeler® Modeling Engines
LOGIT is a comprehensive package for logistic regression analysis, providing tools for model building, model evaluation, prediction, simulation, hypothesis testing and regression diagnostics. A fast, full-featured software package, LOGIT is capable of handling an unlimited number of cases and includes special tools for discrete choice models.
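The kind of analysis LOGIT performs can be sketched with an ordinary logistic regression fit. The snippet below uses scikit-learn purely as a stand-in for illustration; it is not the LOGIT package itself, and the simulated data is invented for the example.

```python
# Illustrative logistic regression fit on simulated data (not the LOGIT
# package itself; scikit-learn is used as a generic stand-in).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Binary outcome driven by a known linear rule plus noise
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]   # predicted class-1 probabilities
accuracy = model.score(X, y)           # in-sample classification accuracy
print(accuracy)
```

A full package such as LOGIT layers hypothesis tests, diagnostics, and simulation tools on top of this basic estimation step.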
View The Hybrid CART®-Logit Model in Classification and Data Mining, a presentation by Dr. Dan Steinberg explaining the benefits of using Hybrid CART® with Logit.
Automatic Non-Linear Regression
The MARS® modeling engine is ideal for users who prefer results in a form similar to traditional regression while capturing essential nonlinearities and interactions. The MARS methodology’s approach to regression modeling effectively uncovers important data patterns and relationships that are difficult, if not impossible, for other regression methods to reveal. The MARS modeling engine builds its model by piecing together a series of straight lines with each allowed its own slope. This permits the MARS modeling engine to trace out any pattern detected in the data.
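The "series of straight lines" idea can be made concrete with hinge basis functions, the building blocks the MARS approach is known for. The sketch below, with an arbitrarily chosen knot at 0.5, shows how an ordinary least-squares fit over two mirrored hinges reproduces a kinked pattern exactly; it illustrates the principle only, not the engine's own knot-selection algorithm.

```python
# Sketch of the MARS idea: a model pieced together from hinge ("broken line")
# basis functions max(0, x - knot) and max(0, knot - x), each with its own
# slope. The knot value 0.5 is chosen by hand for illustration.
import numpy as np

def hinge_basis(x, knot):
    """Return the mirrored pair of hinge functions for one knot."""
    return np.column_stack([np.maximum(0, x - knot), np.maximum(0, knot - x)])

x = np.linspace(0, 1, 100)
# Target with a kink at 0.5: slope 1 below the knot, slope 3 above it
y = np.where(x < 0.5, x, 0.5 + 3.0 * (x - 0.5))

# Least-squares fit of intercept + two hinges traces the kink exactly
B = np.column_stack([np.ones_like(x), hinge_basis(x, 0.5)])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
pred = B @ coef
max_err = np.max(np.abs(pred - y))
print(max_err)  # essentially zero: the hinges capture the pattern
```

The engine itself searches over candidate knots and variables automatically; the fixed knot here only shows why piecewise-linear pieces can trace arbitrary patterns.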
High-Quality Regression and Classification
The MARS model is designed to predict numeric outcomes such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend during a website visit. The MARS engine is also capable of producing high-quality classification models for a yes/no outcome. The MARS engine performs variable selection, variable transformation, interaction detection, and self-testing, all automatically and at high speed.
Areas where the MARS engine has exhibited very high-performance results include forecasting electricity demand for power generating companies, relating customer satisfaction scores to the engineering specifications of products, and presence/absence modeling in geographical information systems (GIS).
Breiman and Cutler’s Random Forests®:
The Random Forests modeling engine is a collection of many CART® trees, each constructed independently of the others. The trees' predictions are aggregated (by majority vote for classification or by averaging for regression) to produce the forest's overall prediction. Random Forests' strengths are spotting outliers and anomalies in data, displaying proximity clusters, predicting future outcomes, identifying important predictors, discovering data patterns, replacing missing values with imputations, and providing insightful graphics.
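The ensemble idea described above can be sketched in a few lines with scikit-learn's random forest, shown only as a generic illustration of independently grown trees whose predictions are aggregated; it is not the Salford implementation, and the synthetic data set is invented for the example.

```python
# Minimal random-forest sketch (scikit-learn stand-in, not the Salford
# engine): many independently grown trees, predictions aggregated by vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each tree votes; the forest's answer is the aggregated vote
first_tree_vote = forest.estimators_[0].predict(X[:1])
forest_vote = forest.predict(X[:1])

# The fitted ensemble also ranks predictors, as the text notes
importances = forest.feature_importances_
print(forest_vote, importances)
```

The importance ranking and proximity-based clustering mentioned below are post-processing steps applied to a forest like this one after the trees are grown.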
Cluster and Segment:
Much of the insight provided by the Random Forests modeling engine is generated by methods applied after the trees are grown, including new technology for identifying clusters or segments in data and new methods for ranking the importance of variables. The method was developed by Leo Breiman of the University of California, Berkeley, and Adele Cutler of Utah State University, and is licensed exclusively to Minitab.
Suited for Wide Datasets:
The Random Forests modeling engine is best suited to the analysis of complex data structures embedded in small to moderate data sets containing fewer than 10,000 rows but potentially millions of columns.
TreeNet® Gradient Boosting is Salford Predictive Modeler's most flexible and powerful data mining tool, capable of consistently generating extremely accurate models. The TreeNet modeling engine's level of accuracy is usually not attainable by single models or by ensembles such as bagging or conventional boosting. The TreeNet engine demonstrates remarkable performance for both regression and classification. The algorithm typically generates thousands of small decision trees built in a sequential error-correcting process to converge to an accurate model. The TreeNet modeling engine has been responsible for the majority of Minitab's modeling competition awards.
Unlike neural networks, the TreeNet methodology is not sensitive to data errors and needs no time-consuming data preparation, pre-processing, or imputation of missing values. Data errors of this kind can be very challenging for conventional data mining methods and catastrophic for conventional boosting. In contrast, the TreeNet model is generally immune to such errors because it dynamically rejects training data points too much at variance with the existing model. This robustness extends to data contaminated with erroneous target labels.
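The sequential error-correcting process described above is the general gradient boosting technique, which can be sketched with scikit-learn's implementation; this illustrates the family of methods TreeNet belongs to, not TreeNet itself, and the sine-wave data is invented for the example.

```python
# Gradient-boosting sketch (generic scikit-learn implementation, not
# TreeNet): many small trees fit sequentially, each stage correcting the
# residual error of the ensemble built so far.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

# Shallow trees (max_depth=2), many stages, small learning rate
gbr = GradientBoostingRegressor(
    n_estimators=500, max_depth=2, learning_rate=0.05, random_state=0
).fit(X, y)

# train_score_ records the training loss after each boosting stage; it
# falls steadily as the sequential error-correction proceeds
print(gbr.train_score_[0], gbr.train_score_[-1])
```

Watching the per-stage loss shrink is the clearest way to see why the ensemble converges toward an accurate model rather than relying on any single tree.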
Interaction detection establishes whether interactions of any kind are needed in a predictive model and serves as a search engine that discovers specifically which interactions are required. The interaction detection system not only helps improve model performance (sometimes dramatically) but also assists in the discovery of valuable new segments and previously unrecognized patterns.
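One simple way to test whether interactions matter at all, in the spirit of the paragraph above, is to compare a boosted model restricted to additive effects (stumps, which cannot represent interactions) against one allowed two-way splits. This sketch uses scikit-learn as a generic stand-in, with a deliberately pure-interaction target invented for the example; it is not the engine's own search procedure.

```python
# Detecting that interactions are needed: an additive model (max_depth=1
# stumps) cannot fit a pure interaction, while a depth-2 model can.
# Generic scikit-learn sketch, not the Salford interaction search itself.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = X[:, 0] * X[:, 1]  # pure two-way interaction, no additive component

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
additive = GradientBoostingRegressor(max_depth=1, random_state=0).fit(Xtr, ytr)
interact = GradientBoostingRegressor(max_depth=2, random_state=0).fit(Xtr, ytr)

# A large test-set R^2 gap signals that interactions are required
print(additive.score(Xte, yte), interact.score(Xte, yte))
```

A full interaction-detection system then narrows down which specific variable pairs drive the gap, rather than stopping at this yes/no verdict.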
Technical Articles by Jerome Friedman are also available for download:
- Greedy Function Approximation: A Gradient Boosting Machine introduces the methodology.
- Stochastic Gradient Boosting discusses several improvements to the original idea.