Navigating The New Features in SPM v7.0
The SPM software suite v7.0 is Salford Systems' latest release of its award-winning suite of sophisticated data mining software. So, what's new in SPM v7.0? And what is all this talk about "batteries," a.k.a. automation?
First released in 2010, the SPM software suite is an ultra-fast data mining suite. Its core components include CART (Classification and Regression Trees), MARS (Multivariate Adaptive Regression Splines), TreeNet stochastic gradient boosting, and Random Forests. SPM v7.0 introduces a new algorithm, Generalized PathSeeker (GPS), along with many new and upgraded features.
Automation "Battery" Features
New in SPM v7.0 are 56 pre-packaged scenarios, essentially experiments, inspired by how leading model analysts structure their work. We call them "Batteries." These experiments build multiple models automatically so that the analyst can easily compare the alternatives. Batteries can be used in many different ways and in many different scenarios.
Example 1: Banking Application
Battery Shaving helps identify subsets of informative variables within large account datasets containing many correlated variables. With automation, you can achieve significant model reduction with minimal (if any) sacrifice in model accuracy. For example, start with the complete list of variables and run automated shaving from the top to eliminate variables that look promising on the learn sample but fail to generalize. Then run shaving from the bottom to automatically eliminate the bulk of redundant and unnecessary predictors. Finally, follow up with "shaving error" to quickly zero in on the most informative subset of features.
Unlike typical data mining tools, Battery Shaving offers more than a single variable importance list. The analyst is provided with a full set of variable importance subsets and variations, making it easy to select and optimize the final variable list while eliminating the burden of repetitive testing. Expert modelers typically devote a lot of time and effort to optimizing their variable importance list; Battery Shaving automates this process.
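The shaving-from-the-bottom idea can be sketched in a few lines. This is an illustrative sketch only, not SPM's implementation: the `evaluate` callback, the toy importance scores, and the 5% error tolerance are all assumptions made for the example.

```python
# Illustrative sketch of "shaving from the bottom" (not SPM's implementation):
# repeatedly drop the least important predictor, re-fit, and keep the smallest
# variable subset whose test error stays close to the best error seen.

def shave_from_bottom(features, evaluate, min_features=1, tolerance=1.05):
    """evaluate(subset) -> (test_error, {feature: importance})."""
    history = []
    current = list(features)
    while True:
        error, importances = evaluate(current)
        history.append((list(current), error))
        if len(current) <= min_features:
            break
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
    best_error = min(err for _, err in history)
    # history shrinks, so reversed() visits the smallest subsets first
    for subset, err in reversed(history):
        if err <= best_error * tolerance:
            return subset
    return history[0][0]

# Toy evaluator: only "a" and "b" carry signal; extra variables add noise cost.
IMPORTANCE = {"a": 0.9, "b": 0.8, "noise1": 0.01, "noise2": 0.01}

def evaluate(subset):
    error = 1.0 - 0.4 * ("a" in subset) - 0.4 * ("b" in subset) + 0.01 * len(subset)
    return error, {f: IMPORTANCE[f] for f in subset}

print(shave_from_bottom(["a", "b", "noise1", "noise2"], evaluate))  # ['a', 'b']
```

The loop discards the two noise variables and stops at the compact subset, which is the behavior the battery automates across many engine settings.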
Example 2: Fraud Detection
In typical fraud detection applications the analyst is concerned with identifying different sets of rules leading to a varying probability of fraud. Decision trees and TreeNet gradient boosting technology are typically used to build classification rules for detecting fraud. Any classification tree is constructed based on a specific user-supplied set of prior probabilities.
One set of priors will force trees to search for rules with high levels of fraud, while other sets of priors will produce trees with somewhat relaxed assumptions. To gain the most benefit from tree-based rule searching, analysts will try a large number of different configurations of prior probabilities. This process is fully automated in Battery Priors. The result is a large collection of rules, ranging from extremely high-confidence fraud segments with low support to moderate-indication fraud segments with very wide support. For example, you can identify small segments with 100% fraud, or a large segment with a lower probability of fraud, and everything in between.
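The trade-off a prior sweep produces can be illustrated with a small sketch. This is not SPM's Battery Priors implementation; the fraud scores, labels, and prior grid below are made up for illustration, and the decision rule is a simple posterior-odds comparison:

```python
# Illustrative sketch (not SPM's Battery Priors): sweep the assumed prior
# probability of fraud and collect the resulting rule segments.  A case is
# flagged when its posterior odds favour fraud under the chosen prior.

def sweep_priors(scored_cases, prior_grid):
    """scored_cases: list of (fraud_score, actual_label) pairs.
    Returns one (prior, segment_support, observed_fraud_rate) per prior."""
    results = []
    for prior in prior_grid:
        flagged = [label for score, label in scored_cases
                   if score * prior > (1 - score) * (1 - prior)]
        if flagged:
            results.append((prior, len(flagged), sum(flagged) / len(flagged)))
    return results

# Made-up scores and ground-truth labels for illustration.
cases = [(0.97, 1), (0.88, 1), (0.77, 1), (0.66, 0), (0.55, 1),
         (0.44, 0), (0.33, 0), (0.22, 0), (0.11, 0)]

for prior, support, rate in sweep_priors(cases, [0.15, 0.5, 0.85]):
    print(prior, support, round(rate, 2))
# A low prior yields a small, high-confidence segment; a high prior yields
# a wide segment with a lower observed fraud rate.
```

Raising the prior on fraud lowers the evidence needed to flag a case, which is exactly the spectrum of rules, from narrow-and-certain to wide-and-approximate, that the battery collects automatically.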
Example 3: Market Research: Surveys
Battery MVI (Missing Value Indicators)
In any survey, a large fraction of information may be missing. Often, the respondent will not answer questions either because they don't want to or are unable to do so. In addition to Salford Systems' expertise in handling missing values, a new automation feature allows the analyst to automatically generate multiple models including: 1) a model predicting response based solely on the pattern of missing values; 2) a model that automatically creates dummy missing value indicators in addition to the original set of predictors; and/or 3) a model that relies on engine-specific internal handling of missing values.
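The second option, dummy missing value indicators, can be sketched as follows. This is an illustrative sketch, not SPM's internal handling: the survey column names, the `_mis` suffix, and the neutral fill value are assumptions made for the example.

```python
# Illustrative sketch of missing value indicators (not SPM's internal
# handling): add a 0/1 "_mis" column for every predictor and fill the
# missing cells with a neutral value, so a model can learn both from the
# observed values and from the pattern of missingness itself.

def add_missing_indicators(rows, fill=0.0):
    columns = sorted({key for row in rows for key in row})
    augmented = []
    for row in rows:
        new_row = {}
        for col in columns:
            value = row.get(col)
            new_row[col] = fill if value is None else value
            new_row[col + "_mis"] = 1 if value is None else 0
        augmented.append(new_row)
    return augmented

# Two survey respondents, each skipping one question.
survey = [{"age": 34, "income": None}, {"age": None, "income": 52000}]
print(add_missing_indicators(survey))
```

A model built on the indicator columns alone corresponds to option 1 (predicting the response from the missingness pattern); adding them alongside the originals corresponds to option 2.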
New Features and Components
- Unsupervised Learning:
Breiman's Column Scrambler
- Text Mining
- Model Compression and Rule Extraction:
Unified reporting of various performance measures
- Parallel Processing:
Automatic support of multiple cores via multithreading
- Outlier Detection:
GUI reports, tables, and graphs
- Linear Methods for Regression, Recent Advances and Discoveries:
OLS Regression; Regularized Regression Including: LAR/LASSO Regression; Ridge Regression; Elastic Net Regression
- Linear Methods for Classification, Recent Advances and Discoveries:
LOGIT; LAR/LASSO; Ridge; Elastic Net / Generalized PathSeeker
- Ensemble Learning:
Battery Bootstrap; Battery Model
- Time Series Modeling
- Data Preparation:
Battery Bin for automatic binning of a user-selected set of variables, with a large number of options
- Model Simplification Methods
- Large Data Handling:
64-bit support; Large memory capacity limited only by your hardware
- The Evolution of Regression: From Classical Linear Regression to Modern Ensembles (4-part series)
- Advances in Stochastic Gradient Boosting: The Power of Post-Processing
- Combining CART decision trees with TreeNet stochastic gradient boosting: A winning combination
If you're currently evaluating SPM v7.0 or already have a licensed version, let us know your takeaways and comparisons!