|
Salford Predictive Modeler™ Two-Day Training |
AGENDA
Two-Day Salford Predictive Modeler Training Hosted by Salford Systems
Bringing your own laptop, software, and data sets is optional.
*If you would like evaluation software during the training, you must request it from Salford Systems by e-mail and have it installed prior to the first day of training.
Day 1
| 9am – 10am |
Introduction to Predictive Modeling with Decision Trees Using CART
Discover the power of tree-structured data mining during this popular introductory seminar, geared toward statisticians and IT audiences who are interested in understanding the conceptual basis of decision tree technology: what it is, why it works, how it has been used, and how it can help you make better business decisions.
- Decision tree fundamentals
- Decision tree applications
- How to build and interpret CART models
|
| 10am – 10:15am | Break |
| 10:15am – 11:15am | An Introduction to Salford Predictive Modeler
Explore SPM's unique modeling automation capabilities while running multiple data sets on both the GUI and non-GUI interfaces, and learn the advantages and disadvantages of each. We'll introduce the CART component of SPM, and explain:
- Parts of the display
- Variable importance
- Summary reporting
- Surrogates and competitors
- How the utility handles missing values
|
| 11:15am – 11:30am | Break |
| 11:30am – 12:30pm | Introduction to the Powerful Use of Batteries in SPM
Major Battery Functions:
- Battery target and handling missing values
- Importance of the prior probabilities control in CART
- Battery priors
- Uses for hotspot detection
|
| 12:30pm – 1:30pm | Lunch |
| 1:30pm – 2:30pm | Introduction to Multivariate Adaptive Regression Splines (MARS)
Understand regression using MARS, its advantages and disadvantages, the piecewise-constant solutions of tree-based regression, and how MARS evolved from the regression component in CART.
Introduction to the core concepts of MARS:
- Adaptive Modeling
- Smooths, splines and knots
- Basis functions
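
As a rough illustration of the knot and basis function concepts listed above, here is a minimal Python sketch of hinge-style basis functions of the kind MARS models are built from. It is not SPM code: the function names, the knot at x = 5, and the coefficients are all hypothetical and chosen only for illustration.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction +1,
    max(0, knot - x) for direction -1."""
    return max(0.0, direction * (x - knot))


def mars_like_model(x, intercept, terms):
    """Evaluate intercept + sum of coef * hinge(x, knot, direction)."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)


# Hypothetical fitted model with one knot at x = 5 (values are made up):
# the slope is -1 to the left of the knot and +2 to the right of it.
TERMS = [(2.0, 5.0, +1), (1.0, 5.0, -1)]
predictions = [mars_like_model(x, 10.0, TERMS) for x in (3.0, 5.0, 7.0)]
print(predictions)
```

Each pair of mirrored hinge functions lets the fitted line change slope at a knot, which is how a model of this kind produces a continuous piecewise-linear fit rather than the step-like fit of a regression tree.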
|
| 2:30pm – 2:45pm | Break |
| 2:45pm – 3:45pm | MARS in Action
Develop more accurate regression models for problems such as predicting credit card holder balances, insurance claim losses, and customer catalog orders.
Guide to reading the MARS output:
- Build a MARS model in SPM
- Understand the MARS interface
- Control Parameters
- How MARS handles categorical predictors
- How MARS handles binary responses
|
| 3:45pm – 4pm | Break |
| 4pm – 5pm | Introduction to Ensemble-Based Modeling Techniques
RandomForests®, created by Leo Breiman and Adele Cutler, is based on learning ensembles of CART trees. By judiciously injecting randomness into the tree-building process and then combining hundreds of these trees, RandomForests (RF) is able to deliver high-performance predictive models and a variety of novel exploratory data analysis results. RF also incorporates new metric-free cluster analyses that automatically select the variables used to define each cluster, with potentially different variables defining each cluster. |
Day 2
| 9am – 10am | Introduction to Boosting Using Decision Trees
TreeNet stochastic gradient boosting is Stanford University Professor Jerome Friedman's latest advance in data mining methodology. In TreeNet, classification and regression models are built up gradually through a potentially large collection of small trees, each of which improves on its predecessors through an error-correcting strategy. Although each tree may have only one split, the full model can be extraordinarily accurate. The final model takes the form of a series expansion in which every term is a (small) tree.
TreeNet improves over conventional boosting in that:
- It is relatively impervious to errors in the target, such as mislabeling
- It is strongly resistant to overfitting
- It generalizes well to future data
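
The idea of building a model gradually from many one-split trees can be sketched in a few lines of Python. This is a minimal least-squares gradient boosting example, not TreeNet itself: it omits the stochastic subsampling and any robust loss functions, and the function names, data, number of trees, and learning rate are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (stump) on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees=200, learning_rate=0.1):
    """Start from the mean, then add shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left, right = fit_stump(xs, residuals)
        step = (threshold, learning_rate * left, learning_rate * right)
        stumps.append(step)
        preds = [p + (step[1] if x <= threshold else step[2])
                 for x, p in zip(xs, preds)]
    return base, stumps


def predict(model, x):
    """Sum the series: base value plus one small term per stump."""
    base, stumps = model
    return base + sum(l if x <= t else r for t, l, r in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [x * x for x in XS]  # made-up training data
model = boost(XS, YS)
initial_mse = mse(YS, [model[0]] * len(YS))
final_mse = mse(YS, [predict(model, x) for x in XS])
print(initial_mse, final_mse)
```

Each stump corrects a small part of the error left by the trees before it, and the final prediction is the sum of all of those small terms, which mirrors the series-expansion form described above.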
|
| 10:00am – 10:15am | Break |
| 10:15am – 11:15am | TreeNet in Action
Explore SPM's unique modeling automation capabilities while running multiple data sets on both the GUI and non-GUI interfaces, and learn the advantages and disadvantages of each. We'll introduce the TreeNet component of SPM, and explain:
- Building models in SPM
- Setting control parameters
- Interpreting output
- Variable importance
- Introduction to battery shaving in SPM using TreeNet
|
| 11:15am – 11:30am | Break |
| 11:30am – 12:30pm | Interpreting TreeNet Models and Interaction Detection
Interaction detection is TreeNet's component for detecting and reporting interactions, controlled through the Interaction Control Language (ICL).
You will understand:
- How to shape the structure of interactions
- How to impose different interactions in TreeNet models
- Dependency plots and how they are used
|
| 12:30pm – 1:30pm | Lunch |
| 1:30pm – 2:30pm | Modern Approaches to Regularized Regression
Generalized Path Seeker (GPS) is the most recent advance in regularized regression. This technology offers high-speed LASSO-style regression for extreme data set configurations with upwards of 100,000 predictors and possibly very few rows. Such data sets are commonplace in gene research, and GPS handles them quickly and efficiently.
- Application using examples in data sets
- How GPS is implemented in SPM
- Command line operation of SPM
- Comments on parallel processing
|
| 2:30pm – 2:45pm | Break |
| 2:45pm – 3:45pm | Linking Engines
Explore how to:
- Combine TreeNet’s power of transformation and variable selection with GPS
- Identify the most influential trees in GPS with ISLE (Importance Sampled Learning Ensembles)
- Use RuleFit to identify the most influential nodes and rules in a TreeNet model
|
| 3:45pm – 4pm | Break |
| 4pm – 5pm | Loose Ends and Application
Q&A with the experts for further discussion, and a chance to apply SPM to your own data sets. |
|