SPM® v8.2 User Guide
The Salford Predictive Modeler® Software Suite
Core components include CART®, MARS®, TreeNet®, Random Forests®, and Generalized PathSeeker
CART® Classification and Regression Trees
This guide provides a brief introduction to CART.
Title: Introducing CART
Short Description: This guide describes the CART product and illustrates some practical examples of its basic usage and approach.
Long Description: Welcome to CART, a robust decision-tree tool for data mining, predictive modeling, and data preprocessing. CART (Classification and Regression Trees) automatically searches for important patterns and relationships, uncovering hidden structure even in highly complex data. CART trees can be used to generate accurate and reliable predictive models for a broad range of applications, from bioinformatics to risk management, and new applications are being reported daily.
Salford Systems' CART is the only decision-tree system based on the original CART code developed by world-renowned Stanford University and University of California at Berkeley statisticians Breiman, Friedman, Olshen and Stone.
Key Words: CART, Classification and Regression Trees, decision trees, predictive models, data mining
Welcome to CART, a robust decision-tree tool for data mining, predictive modeling, and data preprocessing. CART automatically searches for important patterns and relationships, uncovering hidden structure even in highly complex data. CART trees can be used to generate accurate and reliable predictive models for a broad range of applications, from bioinformatics to risk management, and new applications are being reported daily. The most common applications include churn prediction, credit scoring, drug discovery, fraud detection, manufacturing quality control, and wildlife research. Several hundred detailed application studies are available from our website at http://www.salford-systems.com.
CART uses an intuitive, Windows-based interface, making it accessible to both technical and nontechnical users. Underlying the "easy" interface, however, is a mature theoretical foundation that distinguishes CART from other methodologies and other decision trees. Salford Systems' CART is the only decision-tree system based on the original CART code developed by world-renowned Stanford University and University of California at Berkeley statisticians Breiman, Friedman, Olshen and Stone. The core CART code has always remained proprietary and less than 20% of its functionality was described in the original CART monograph. Only Salford Systems has access to this code, which now includes enhancements co-developed by Salford Systems and CART's originators.
MARS® Multivariate Adaptive Regression Splines
This guide provides a brief introduction to MARS.
MARS is considered the world’s first truly successful automated regression modeling tool. Multivariate Adaptive Regression Splines (MARS) has become widely known in the data mining and business intelligence worlds only recently through our seminars and the enthusiastic endorsement of leading data mining specialists. MARS is an innovative and flexible modeling tool that automates the building of accurate predictive models for continuous and binary dependent variables. It excels at finding optimal variable transformations and potential interactions within any regression-based modeling solution and easily handles the complex data structure that often hides in high-dimensional data. In doing so, this new approach to regression modeling effectively uncovers important data patterns and relationships that are difficult, if not impossible, for other methods to reveal.
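As a rough illustration of what a "variable transformation" can look like in a spline-based regression, the sketch below evaluates a model built from hinge functions of the form max(0, x - knot). It is a minimal sketch only and does not use the MARS product or its API: the function names, the knot at 3.0, and the coefficients are hypothetical values chosen for the example.

```python
def hinge(x, knot, direction=1):
    """Return max(0, x - knot) when direction is +1, or max(0, knot - x) when it is -1."""
    return max(0.0, direction * (x - knot))


def piecewise_model(x, intercept, terms):
    """Evaluate intercept + sum(coef * hinge(x, knot, direction)) over all terms."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


# Hypothetical model: flat at 1.0 below x = 3, rising with slope 2 above it.
terms = [(2.0, 3.0, 1)]
predictions = [piecewise_model(x, 1.0, terms) for x in (0.0, 3.0, 5.0)]
print(predictions)
```

Each hinge term bends the fitted line at its knot, so a sum of such terms can follow a nonlinear relationship while remaining an ordinary linear model in the transformed variables.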
TreeNet® Stochastic Gradient Boosting
This guide describes the TreeNet product and illustrates some practical examples of its basic usage and approach.
TreeNet is a revolutionary advance in data mining technology developed by Jerome Friedman, one of the world's outstanding data mining researchers. TreeNet offers exceptional accuracy, blazing speed, and a high degree of fault tolerance for dirty and incomplete data. It can handle both classification and regression problems and has been proven to be remarkably effective in traditional numeric data mining and text mining.
Random Forests®
This guide provides an introduction to RandomForests modeling basics.
This guide describes what’s under the hood, beginning with why RandomForests’ engine is both unique and innovative. Because RandomForests is such a new tool, we assume no prior knowledge of the adaptive modeling methodology underlying RandomForests. To put this methodology into context, the first section discusses the modeler’s challenge and addresses how RandomForests meets this challenge. The remaining sections provide detailed explanations of how the RandomForests model is generated, how RandomForests handles categorical variables and missing values, how the “optimal” model is selected and, finally, how testing regimens are used to protect against overfitting.
GPS Generalized PathSeeker
This guide provides a brief introduction to GPS as well as guidance on model interpretation.
GPS, or Generalized PathSeeker, is a highly specialized and flexible regression (and logistic regression) procedure developed by Jerome Friedman (the co-creator of CART and the developer and inventor of MARS and TreeNet, among several other major contributions to data mining and machine learning). GPS is a "regularized regression" procedure, meaning that it is designed to handle modeling challenges that are difficult or impossible for everyday regression.
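To give a feel for what "regularized" means, the sketch below uses ridge regression on a single predictor as a stand-in. It is not the GPS algorithm and does not call any SPM interface. The function name, the data, and the penalty values are assumptions made for the example; the point is only that a penalty on coefficient size shrinks the fitted slope toward zero.

```python
def ridge_slope(xs, ys, penalty):
    """Slope b minimizing sum((y - b*x)**2) + penalty * b**2 (one predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# penalty = 0 reproduces ordinary least squares; a larger penalty shrinks the slope.
slopes = [ridge_slope(xs, ys, lam) for lam in (0.0, 14.0)]
print(slopes)
```

With no penalty the slope equals the ordinary least-squares value of 2.0, and with a penalty of 14.0 it is halved to 1.0. Trading a little bias for stability in this way is what helps regularized methods cope with many or highly correlated predictors.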