TreeNet/MART

Overview

TreeNet stochastic gradient boosting is Stanford University Professor Jerome Friedman's latest advance in data mining methodology. In TreeNet, classification and regression models are built up gradually through a potentially large collection of small trees, each of which improves on its predecessors through an error-correcting strategy. Although each tree may have only one split, the full model can be extraordinarily accurate. The final model takes the form of a series expansion in which every term is a (small) tree.
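The series expansion mentioned above can be written out as follows. The symbols are illustrative assumptions rather than notation from the tutorial: $F_0$ is a constant starting value, $T_m$ is the small tree fitted at stage $m$, $\beta_m$ is its coefficient, and $M$ is the number of trees.

```latex
F_M(x) \;=\; F_0 \;+\; \sum_{m=1}^{M} \beta_m \, T_m(x)
```

Each term contributes only a small correction, so accuracy comes from the sum rather than from any single tree.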

TreeNet improves over conventional boosting in that:

  • it is relatively impervious to errors in the target, such as mislabeling,
  • it is strongly resistant to overfitting, and
  • it generalizes well to future data.

Key innovations in stochastic gradient boosting include:

  • never using the entire training data in any one stage,
  • using a very slow learning rate, and
  • employing an acceptance/rejection sampling strategy to ignore problematic or useless training data.
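The first two innovations listed above can be illustrated with a minimal Python sketch. It assumes a least-squares objective, a single numeric predictor, and one-split trees; the names `fit_stump`, `fit_sgb`, and `predict`, and the default settings, are hypothetical and are not TreeNet's implementation.

```python
import random


def fit_stump(xs, rs):
    """Fit a one-split tree to residuals rs on one predictor xs.

    Returns (threshold, left_value, right_value) with the lowest squared error.
    """
    pairs = sorted(zip(xs, rs))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    # Fallback when no split is possible: predict the mean on both sides.
    best = (float("inf"), pairs[0][0], total / n, total / n)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        right_sum = total - left_sum
        # Lower score means lower squared error for this split.
        score = -(left_sum ** 2 / i + right_sum ** 2 / (n - i))
        if score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    return best[1:]


def fit_sgb(xs, ys, n_trees=200, learn_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting sketch (assumed least-squares loss)."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    k = min(len(xs), max(2, int(subsample * len(xs))))
    for _ in range(n_trees):
        # Each stage sees only a random subset of the training data.
        idx = rng.sample(range(len(xs)), k)
        t, left, right = fit_stump(
            [xs[i] for i in idx], [ys[i] - preds[i] for i in idx]
        )
        stumps.append((t, left, right))
        # Slow learning: add only a fraction of each tree's correction.
        for i, x in enumerate(xs):
            preds[i] += learn_rate * (left if x <= t else right)
    return base, learn_rate, stumps


def predict(model, x):
    """Evaluate the fitted series of stumps at a single value x."""
    base, learn_rate, stumps = model
    return base + learn_rate * sum(l if x <= t else r for t, l, r in stumps)
```

A small `learn_rate` combined with many trees is the trade-off this sketch is meant to show: each stage corrects only part of the remaining error.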

Learning outcomes

Attendees will be introduced to the main concepts of boosting methods in data mining. They will also be presented with the core innovations behind TreeNet stochastic gradient boosting: slow learning, the use of weak learners at every stage of model building, resampling from the training data at every stage, and ignoring data considered too far from the decision boundary in classification problems.
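One way to picture "ignoring data too far from the decision boundary" is the sketch below. It assumes a two-class logistic model and measures each case's influence as p(1 - p), which is close to zero for cases the current model already classifies with near certainty. The function name `influence_trim` and the `alpha` parameter are hypothetical, and TreeNet's actual rule may differ.

```python
import math


def influence_trim(scores, alpha=0.1):
    """Return indices of cases kept for the next boosting stage.

    scores: current log-odds F(x_i) for each training case.
    alpha:  fraction of total influence that may be discarded (assumed knob).
    """
    probs = [1 / (1 + math.exp(-f)) for f in scores]
    weights = [p * (1 - p) for p in probs]  # near zero far from the boundary
    order = sorted(range(len(scores)), key=lambda i: weights[i])
    budget = alpha * sum(weights)
    dropped, running = set(), 0.0
    for i in order:
        if running + weights[i] > budget:
            break  # dropping this case would exceed the influence budget
        running += weights[i]
        dropped.add(i)
    return [i for i in range(len(scores)) if i not in dropped]


# Cases 3 and 4 are already classified with near certainty and are skipped.
kept = influence_trim([0.0, 0.1, -0.2, 8.0, -9.0], alpha=0.1)
```

Cases with scores near zero sit close to the decision boundary and are always retained, since they carry most of the total influence.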

Although a TreeNet model can be complex, this tutorial will show how new graphical tools assist in interpreting the results. The tools demonstrated include 2-D and 3-D graphs that show the dependence of the target on any individual predictor or pair of predictors, and variable importance rankings that are available separately for each target class in a multi-class problem.
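A common way to compute such a dependence curve for one predictor is the partial dependence calculation sketched below: fix the predictor at each grid value for every row, then average the model's predictions. This is a generic illustration; the function name and arguments are hypothetical, and TreeNet's plots may be produced differently.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction as one predictor is swept over grid values.

    predict: any function mapping a row (list of values) to a number.
    rows:    training rows, each a list of predictor values.
    feature: index of the predictor to vary.
    grid:    values at which to evaluate the curve.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the data is not changed
            modified[feature] = value  # force the predictor to the grid value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Toy model standing in for a fitted ensemble (assumption for illustration).
toy_model = lambda row: 2 * row[0] + row[1]
curve = partial_dependence(toy_model, [[0, 1], [0, 3]], feature=0, grid=[0, 1, 2])
```

Plotting `curve` against `grid` gives the 2-D graph; sweeping two predictors over a pair of grids in the same way gives the 3-D surface.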

Content and instructional methods

Attendees will see examples of recent analyses of real-world data. PowerPoint slides and live modeling runs will facilitate the learning process.

Course Outline:

  1. An Intuitive Introduction to TreeNet Stochastic Gradient Boosting
    1. Brief recap of modern non-parametric modeling: local non-parametric vs. global parametric models
    2. Brief review of decision tree fundamentals
    3. Boosting methods
  2. TreeNet Mathematical Basics
    1. Specification of the TreeNet model as a series expansion
    2. Fitting each stage to minimize residuals or deviance
    3. Least squares, LAD, and logistic likelihood objective functions
    4. Non-parametric approach to steepest descent optimization
  3. TreeNet at Work
    1. Stochastic boosting: resampling from training data
    2. The importance of the learn rate
    3. Sampling rate
    4. Forcing additive models
    5. Detecting interactions
    6. Reading the output: reports and diagnostics
    7. Using SGB as a variable selection method for other modeling techniques
  4. Comparing to AdaBoost and other methods