Differences Between Train and Test Performance Results

Written by Dan Steinberg, Friday, January 06 2012

In their 1984 monograph, Classification and Regression Trees, Breiman, Friedman, Olshen and Stone discussed at length the need to obtain “honest” estimates of the predictive accuracy of a tree-based model. At the time the monograph was written, many data sets were small, so the authors took great pains to work out an effective way to use cross-validation with CART trees. The result was a major advance for data mining, introducing ideas that at the time were radically new. The main point of the discussion was that the only way to avoid overfitting was to rely on test data. With plentiful data we can always reserve a portion for testing, but with less data we might have to rely on cross-validation. In either case, however, only the test or cross-validated results should be trusted. Following this advice, practitioners tended to ignore the training data performance results and focus only on the test data.
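The difference between a training-data estimate and an "honest" one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the learner is a 1-nearest-neighbor rule standing in for a tree (it is not CART), the toy data and the helper names (`k_fold_indices`, `predict_1nn`, `cross_validated_error`) are hypothetical, and folds are assigned by position rather than at random.

```python
def k_fold_indices(n, k):
    """Assign row i to fold i % k and yield (train_idx, test_idx) pairs."""
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        yield train_idx, test_idx


def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


def error_rate(train_x, train_y, eval_x, eval_y):
    """Misclassification rate on (eval_x, eval_y) for a model built on the training rows."""
    wrong = sum(predict_1nn(train_x, train_y, x) != y
                for x, y in zip(eval_x, eval_y))
    return wrong / len(eval_x)


def cross_validated_error(xs, ys, k):
    """Each row is scored only by a model that never saw it."""
    wrong = 0
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        wrong += sum(predict_1nn(train_x, train_y, xs[i]) != ys[i]
                     for i in test_idx)
    return wrong / len(xs)


# Toy data (assumed): the class flips from 0 to 1 at x = 20,
# and every fifth label is deliberately wrong to mimic noise.
xs = list(range(40))
ys = [(1 if x >= 20 else 0) ^ (1 if x % 5 == 0 else 0) for x in xs]

resub_error = error_rate(xs, ys, xs, ys)       # scored on the training data
cv_error = cross_validated_error(xs, ys, k=4)  # scored on held-out folds

print("training-data error:  ", resub_error)
print("cross-validated error:", cv_error)
```

Because each training point is its own nearest neighbor, the training-data error here is zero by construction, while the cross-validated error is not. That optimism is the reason only the test or cross-validated figure is treated as trustworthy.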

Later, and especially once TreeNet and other ensembles became available, practitioners observed that train and test results sometimes agreed rather closely. This was especially true for methods such as TreeNet, which were designed to resist overfitting. The question then became whether a divergence between train and test results should itself be taken as an indication of a problem with the model.

Our practical approach in model development and selection is to prefer TreeNet models that show good agreement between train and test results and to distrust models exhibiting substantial train/test disagreement. We do not offer a formal statistical test of this difference, relying instead on judgment. In practice, when the train/test divergence seems too large to ignore, we attempt to refine our models in one or more of the following ways:

  • Using a slower learn rate,
  • Growing smaller trees, or
  • Removing some potentially strong predictors from the model.
Any or all of these modifications can lead to improved train/test alignment.
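The mechanics of comparing train and test performance across such settings can be sketched as follows. This is a minimal, assumption-laden illustration and not TreeNet: it uses a small hand-written gradient boosting loop for squared error, and the data, the function names (`fit_tree`, `boost`, `mse`) and the parameter names (`learn_rate`, `depth`, `columns`) are all hypothetical. The second predictor in the toy data is pure noise, so dropping it only shows how a predictor would be removed, not the effect of removing a strong one.

```python
import math
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, r, depth):
    """Fit a least-squares regression tree; return it as a predict function."""
    mean = sum(r) / len(r)
    if depth == 0 or len(r) < 4:
        return lambda row: mean
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[1:]:
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            cost = sse([r[i] for i in left]) + sse([r[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # every predictor is constant in this node
        return lambda row: mean
    _, f, t, left, right = best
    lo = fit_tree([X[i] for i in left], [r[i] for i in left], depth - 1)
    hi = fit_tree([X[i] for i in right], [r[i] for i in right], depth - 1)
    return lambda row: lo(row) if row[f] < t else hi(row)


def boost(X, y, n_trees, learn_rate, depth, columns=None):
    """Boost small trees on residuals, optionally using only some columns."""
    cols = columns if columns is not None else list(range(len(X[0])))
    pick = lambda row: [row[c] for c in cols]
    Xs = [pick(row) for row in X]
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(Xs, resid, depth)
        trees.append(tree)
        pred = [p + learn_rate * tree(row) for p, row in zip(pred, Xs)]
    return lambda row: base + learn_rate * sum(t(pick(row)) for t in trees)


def mse(model, X, y):
    """Mean squared error of a predict function on (X, y)."""
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


def make_data(n, rng):
    """Assumed toy data: y depends on column 0 only; column 1 is noise."""
    X = [[rng.uniform(0, 6), rng.uniform(0, 1)] for _ in range(n)]
    y = [math.sin(row[0]) + rng.gauss(0, 0.3) for row in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(40, rng)
X_test, y_test = make_data(40, rng)

configs = [
    ("fast learn rate, larger trees", dict(n_trees=20, learn_rate=1.0, depth=3)),
    ("slower learn rate", dict(n_trees=20, learn_rate=0.1, depth=3)),
    ("slower learn rate, smaller trees", dict(n_trees=20, learn_rate=0.1, depth=1)),
    ("second predictor removed", dict(n_trees=20, learn_rate=0.1, depth=1, columns=[0])),
]

results = []
for name, params in configs:
    model = boost(X_train, y_train, **params)
    train_mse = mse(model, X_train, y_train)
    test_mse = mse(model, X_test, y_test)
    results.append((name, train_mse, test_mse))
    print(f"{name:34s} train={train_mse:.3f} test={test_mse:.3f} "
          f"gap={test_mse - train_mse:+.3f}")
```

On a sample this small, no particular ordering of the gaps is guaranteed, and the number of trees is held fixed for simplicity. The point is the shape of the comparison: fit each candidate setting, report train and test error side by side, and judge the gap alongside the test error itself.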

We offer these answers to the following specific questions:

Are large differences between train and test results common? Is this in and of itself a problem?

  • Large divergences between train and test performance in TreeNet models are not an everyday occurrence, but they are not rare either.

  • Large train/test performance differences in TreeNet models are not necessarily a problem, but we take them as indications that the models are probably suboptimal and can be improved by appropriate adjustment of the TreeNet control parameters and the predictors used.

Dan Steinberg

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full-day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.