Additional GPS Generalized Path Seeker Features
Additional MARS Features
Additional Random Forests Features are available in Pro, ProEx, and Ultra.
Additional TreeNet Features
Advances in Gradient Boosting: The Power of Post-Processing
Learn how TreeNet stochastic gradient boosting can be improved by post-processing techniques such as GPS Generalized Path Seeker, RuleLearner, and ISLE.
I. Gradient Boosting and Post-Processing:
- What is missing from Gradient Boosting?
- Why are post-processing techniques used?
II. Applications Benefiting from Post-Processing: Examples from a variety of industries.
- Financial Services
III. Typical Post-Processing Steps
- Generalized Path Seeker (GPS): Modern high-speed LASSO-style regularized regression
- Importance Sampled Learning Ensembles (ISLE): identify and reweight the most influential trees
- RuleLearner: ISLE on “steroids.” Identify the most influential nodes and rules
IV. Case Study Example
- Output/Results without Post-Processing
- Output/Results with Post-Processing
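The post-processing pipeline above can be sketched with off-the-shelf tools. The following is a minimal illustration using scikit-learn as a stand-in (not SPM's implementation): grow a gradient boosting ensemble, treat each tree's predictions as one column of a basis matrix, and let a LASSO regression reweight the trees, ISLE-style.

```python
# Sketch of ISLE-style post-processing using scikit-learn as a stand-in
# (not SPM's implementation): reweight the trees of a boosted ensemble
# with a LASSO regression so that uninfluential trees drop out.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=0)

gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X, y)

# Each tree's predictions become one column of the "basis" matrix.
tree_preds = np.column_stack([est[0].predict(X) for est in gbm.estimators_])

# The LASSO zeroes out the coefficients of uninfluential trees,
# compressing the ensemble while preserving accuracy.
lasso = Lasso(alpha=1.0, max_iter=10000).fit(tree_preds, y)
kept = int(np.sum(lasso.coef_ != 0))
print(f"{kept} of {gbm.n_estimators} trees kept after post-processing")
```

In a RuleLearner-style variant, the columns would be node and rule indicators rather than whole-tree predictions, giving an even finer-grained compression.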
Components and Features
| SPM Components and Features | What's New |
|---|---|
| CART (Classification and Regression Trees) | User-defined linear combination lists for splitting; constraints on trees; automatic addition of missing value indicators; enhanced GUI reporting; user-controlled cross validation; out-of-bag performance stats and predictions; profiling terminal nodes based on user-supplied variables; comparison of train vs. test consistency across nodes; RandomForests-style variable importance |
| MARS (Automated Nonlinear Regression) | Updated GUI interface; model performance based on independent test sample or cross validation; support for time series models |
| TreeNet (Gradient Boosting, Boosted Trees) | One-tree TreeNet (CART alternative); RandomForests via TreeNet (RandomForests regression alternative); Interaction Control Language (ICL); interaction strength reporting; enhanced partial dependency plots; RandomForests-style randomized splits |
| RandomForests (Bagging Trees) | RandomForests regression; saving out-of-bag scores; speed enhancements |
| High-Dimensional Multivariate Pattern Discovery | Battery Target to identify mutual dependencies in the data |
| Unsupervised Learning (Breiman's Column Scrambler) | New |
| Model Compression and Rule Extraction | New: ISLE; RuleLearner; hybrid compression |
| Automation | 56 pre-packaged scenarios based on years of high-end consulting |
| Parallel Processing | New: automatic support of multiple cores via multithreading |
| Hotspot Detection | Segment extraction (Battery Priors) |
| Missing Value Handling and Imputation | |
| Outlier Detection | New: GUI reports, tables, and graphs |
| Linear Methods for Regression, Recent Advances and Discoveries | New: OLS regression; regularized regression including LAR/LASSO, Ridge, and Elastic Net (Generalized Path Seeker) |
| Linear Methods for Classification, Recent Advances and Discoveries | New: LOGIT; LAR/LASSO; Ridge; Elastic Net (Generalized Path Seeker) |
| Model Assessment and Selection | Unified reporting of various performance measures across different models |
| Ensemble Learning | New: Battery Bootstrap; Battery Model |
| Time Series Modeling | New |
| Model Simplification Methods | |
| Data Preparation | New: Battery Bin for automatic binning of a user-selected set of variables with a large number of options |
| Large Data Handling | 64-bit support; large memory capacity limited only by your hardware |
| Model Translation (SAS, C, Java, PMML, Classic) | Java |
| Data Access (all popular statistical formats supported) | Updated Stat/Transfer drivers, including R workspaces |
| Model Scoring | Score Ensemble (combines multiple models into a powerful predictive machine) |
CART® - Classification and Regression Trees
Ultimate Classification Tree:
Salford Predictive Modeler’s CART® modeling engine is the ultimate classification tree: it revolutionized the field of advanced analytics and inaugurated the current era of data science. CART is one of the most important tools in modern data mining.
Technically, the CART modeling engine is based on landmark mathematical theory introduced in 1984 by four world-renowned statisticians at Stanford University and the University of California at Berkeley. The CART Modeling Engine, SPM’s implementation of Classification and Regression Trees, is the only decision tree software embodying the original proprietary code.
Fast and Versatile:
Patented extensions to the CART modeling engine are specifically designed to enhance results for market research and web analytics. The CART modeling engine supports high-speed deployment, allowing Salford Predictive Modeler’s models to predict and score in real time on a massive scale. Over the years the CART modeling engine has become known as one of the most popular and easy-to-use predictive modeling algorithms available to analysts; it also serves as the foundation for many modern data mining approaches based on bagging and boosting.
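The core methodology can be illustrated with a short sketch, using scikit-learn's open-source trees as a stand-in for Salford's proprietary CART code: grow a classification tree, then prune it by cost-complexity with cross validation, as in the original 1984 CART theory.

```python
# Illustrative stand-in using scikit-learn's open-source trees (not Salford's
# proprietary CART code): grow a classification tree, then prune it by
# cost-complexity with cross validation, as in the original CART methodology.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate pruning levels come from the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the pruning level with the best 5-fold cross-validated accuracy
# (the last alpha is excluded because it prunes the tree down to the root).
best = max(path.ccp_alphas[:-1],
           key=lambda a: cross_val_score(
               DecisionTreeClassifier(ccp_alpha=a, random_state=0),
               X, y, cv=5).mean())
tree = DecisionTreeClassifier(ccp_alpha=best, random_state=0).fit(X, y)
print("leaves in the pruned tree:", tree.get_n_leaves())
```

The grow-then-prune strategy is what distinguishes CART from earlier stopping-rule approaches: the full tree is built first, and cross validation then selects an honest size.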
Automatic Non-Linear Regression
The MARS® modeling engine is ideal for users who prefer results in a form similar to traditional regression while capturing essential nonlinearities and interactions. The MARS methodology’s approach to regression modeling effectively uncovers important data patterns and relationships that are difficult, if not impossible, for other regression methods to reveal. The MARS modeling engine builds its model by piecing together a series of straight lines with each allowed its own slope. This permits the MARS modeling engine to trace out any pattern detected in the data.
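The piecewise-linear idea can be seen in a minimal sketch (plain NumPy, not the MARS engine; the knot location is assumed here, whereas MARS searches for knots and slopes automatically):

```python
# Minimal sketch of the piecewise-linear idea in plain NumPy (not the MARS
# engine). The knot location is assumed here; MARS searches for knots and
# slopes automatically.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
# A kinked target: slope 2 below x = 5, slope -3 above it, plus noise.
y = np.where(x < 5, 2 * x, 10 - 3 * (x - 5)) + rng.normal(0, 0.3, x.size)

knot = 5.0  # assumed knot; each hinge function gets its own slope
basis = np.column_stack([np.ones_like(x),
                         np.maximum(0, x - knot),    # active to the right
                         np.maximum(0, knot - x)])   # active to the left
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print("intercept and hinge slopes:", np.round(coef, 2))  # ~ [10, -3, -2]
```

Because each hinge function is linear on its own side of the knot and zero on the other, adding more knots lets the fit trace out any pattern detected in the data.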
High-Quality Regression and Classification
The MARS model is designed to predict numeric outcomes, such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend during a website visit. The MARS engine can also produce high-quality classification models for yes/no outcomes. It performs variable selection, variable transformation, interaction detection, and self-testing, all automatically and at high speed.
Areas where the MARS engine has exhibited very high-performance results include forecasting electricity demand for power generating companies, relating customer satisfaction scores to the engineering specifications of products, and presence/absence modeling in geographical information systems (GIS).
Breiman and Cutler’s Random Forests®:
The Random Forests modeling engine is a collection of many CART® trees, each grown independently of the others. The forest's overall prediction is formed by combining the predictions of the individual trees. Random Forests' strengths are spotting outliers and anomalies in data, displaying proximity clusters, predicting future outcomes, identifying important predictors, discovering data patterns, replacing missing values with imputations, and providing insightful graphics.
Cluster and Segment:
Much of the insight provided by the Random Forests modeling engine is generated by methods applied after the trees are grown, including new technology for identifying clusters or segments in data and new methods for ranking the importance of variables. The method was developed by Leo Breiman of the University of California, Berkeley, and Adele Cutler of Utah State University, and is licensed exclusively to Minitab.
Suited for Wide Datasets:
Random Forests is best suited for the analysis of complex data structures embedded in small to moderate datasets containing fewer than 10,000 rows but potentially millions of columns.
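A minimal sketch of these ideas, using scikit-learn's open-source random forest as a stand-in for the SPM engine:

```python
# Minimal sketch using scikit-learn's open-source random forest as a stand-in
# for the SPM engine: many independently grown trees whose combined
# predictions form the forest's output.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                           random_state=0).fit(X, y)

# Out-of-bag observations give an honest performance estimate for free.
print(f"out-of-bag R^2: {rf.oob_score_:.3f}")

# Variable importance ranks the predictors, as described above.
top = rf.feature_importances_.argsort()[::-1][:5]
print("five most important predictors:", top)
```

Because each tree is trained on a bootstrap sample, the observations left out of that sample (the out-of-bag cases) serve as a built-in test set, which is why no separate holdout is required.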
A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to build the model: the rows (observations) by columns (variables) actually used. Variables not used in the model do not count, and observations reserved for testing, or excluded for other reasons, do not count; there is no limit on the number of test sample data points that may be analyzed.
A data point is one variable value for one observation (1 row by 1 column), and each data point occupies 4 bytes. For example, a license with an 8 MB learn sample limit allows up to 8 * 1024 * 1024 / 4 = 2,097,152 learn sample data points to be analyzed.
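The capacity arithmetic can be expressed as a small helper (a hypothetical function for illustration only, not part of SPM):

```python
# Hypothetical helper (for illustration only, not part of SPM) that applies
# the documented arithmetic: 4 bytes per data point, 1 MB = 1,048,576 bytes.
BYTES_PER_DATA_POINT = 4

def learn_sample_capacity(limit_mb: int) -> int:
    """Number of learn-sample data points allowed for a given limit in MB."""
    return limit_mb * 1024 * 1024 // BYTES_PER_DATA_POINT

print(learn_sample_capacity(8))  # 8 MB license -> 2,097,152 data points
```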
The following table describes the current set of "sizes" available. Please note that the minimum required RAM is **not** the same as the learn sample limitation.
| Size | Minimum required RAM (MB) | Licensed learn sample data size in MB (1 MB = 1,048,576 bytes) | Licensed # of learn sample values (rows by columns) |
|---|---|---|---|
Additional larger capacity is available under 64-bit operating systems using our non-GUI (command-line) builds. The non-GUI builds are very flexible and can be licensed for large data limits not currently available in the GUI product line; the current maximum for non-GUI builds is an 8 GB data capacity.
SPM® v8.2 User Guide