Differences Between Train and Test Performance Results
In their 1984 monograph, Classification and Regression Trees, Breiman, Friedman, Olshen and Stone discussed at length the need to obtain "honest" estimates of the predictive accuracy of a tree-based model. At the time the monograph was written, many data sets were small, so the authors took great pains to work out an effective way to use cross-validation with CART trees. The result was a major advance for data mining, introducing ideas that at the time were radically new. The main point of the discussion was that the only way to avoid overfitting was to rely on test data. With plentiful data we can always reserve a portion for testing, but with fewer data we might have to rely on cross-validation. In either case, however, only the test or cross-validated results should be trusted. In contrast, earlier approaches tended to rely on the training data performance results and to ignore test data.
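To make the idea of cross-validation concrete, here is a minimal sketch of generic k-fold cross-validation in plain Python. It is not the specific procedure worked out in the monograph for CART; the function names and the `fit_and_score` callback are assumptions introduced purely for illustration. The point it shows is that every record is held out for testing exactly once, so the averaged score is based only on data the model did not see during fitting.

```python
import random


def k_fold_indices(n_records, k, seed=0):
    """Split record indices 0..n_records-1 into k disjoint test folds."""
    if not 2 <= k <= n_records:
        raise ValueError("k must be between 2 and n_records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validated_score(n_records, k, fit_and_score, seed=0):
    """Average a test score over k folds.

    fit_and_score(train_idx, test_idx) is a caller-supplied (hypothetical)
    function that fits a model on train_idx and returns its score on test_idx.
    """
    scores = []
    for test_idx in k_fold_indices(n_records, k, seed):
        held_out = set(test_idx)
        # Everything outside the current fold is used for fitting.
        train_idx = [i for i in range(n_records) if i not in held_out]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```

With plentiful data a single reserved test sample plays the same role as one of these folds; with fewer data, rotating the held-out fold lets every record contribute to the test estimate.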
Later, and especially once TreeNet and other ensembles became available, practitioners sometimes observed rather close agreement between train and test results. This was especially true for methods such as TreeNet, which were constructed to resist overfitting. The question then became whether a divergence between train and test results should itself be taken as an indication of a problem with the model.
Our practical approach in model development and selection is to prefer TreeNet models that show good agreement between train and test results and to distrust models exhibiting substantial train/test disagreement. We do not offer a formal statistical test of this difference, relying instead on judgment. In practice, when the train/test divergence seems too large to ignore, we attempt to refine our models in one or more of the following ways:
- Using a slower learn rate,
- Growing smaller trees, or
- Removing some potentially strong predictors from the model.
Any and all of these modifications can lead to improved train/test alignment.
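As an illustration of this workflow, the following Python sketch compares several candidate runs by their train/test agreement. It is only a bookkeeping aid under stated assumptions: the function names, the dictionary keys `train` and `test`, and the `max_gap` tolerance are all hypothetical, the score is assumed to be a higher-is-better measure such as ROC area, and `max_gap` is a judgment call supplied by the analyst rather than a formal statistical test.

```python
def relative_gap(train_score, test_score):
    """Relative shortfall of test versus train for a higher-is-better score."""
    if train_score <= 0:
        raise ValueError("train_score must be positive")
    return (train_score - test_score) / train_score


def rank_by_agreement(candidates, max_gap):
    """Separate runs into (agreeing, divergent) lists.

    candidates: list of dicts with keys "name", "train" and "test"
                (hypothetical layout chosen for this sketch).
    max_gap:    largest relative train/test gap the analyst is willing to
                accept; a judgment-based tolerance, not a significance test.
    """
    agreeing, divergent = [], []
    for run in candidates:
        gap = relative_gap(run["train"], run["test"])
        # Use the absolute gap so a test score above train also counts as a gap.
        (agreeing if abs(gap) <= max_gap else divergent).append(run)
    # Among runs with acceptable agreement, prefer the best test performance.
    agreeing.sort(key=lambda run: run["test"], reverse=True)
    return agreeing, divergent
```

One might record a baseline run alongside reruns with a slower learn rate, smaller trees, or a reduced predictor list, then pass them all to `rank_by_agreement` to see which refinements bring train and test results back into line.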
We offer these answers to the following specific questions:
Are large differences between train and test results common? Is this in and of itself a problem?
- Large divergences between train and test performance in TreeNet models are not an everyday occurrence, but they are not rare either.
- Large train/test performance differences in TreeNet models are not necessarily a problem, but we take them as indications that the models are probably sub-optimal and can be improved by appropriate manipulation of the TreeNet control parameters and the predictors used.