The idea behind combining trees is that if a problem is tackled in slightly different ways and the results are then averaged, more accurate and stable models may be obtained. Research on combining trees in a single analysis has demonstrated potentially very substantial gains. The reduction in error rates for well-known data sets typically ranges from 5 to 40% for classification trees and from 5 to 50% for regression trees. It is important to realize, however, that improvements are not guaranteed, and for some problems combining trees can yield less satisfactory results. Further, when you combine trees, the classifier or predictor generated is no longer a visually appealing and comprehensible tree. Instead, the result is a committee of experts (somewhat akin to a neural net's black box); unlike with a single CART tree, you will not have a simple story to relay if your final model is the result of a complex interaction between many expert trees.
Two committee-of-experts (or resampling) technologies are currently available within CART for UNIX platforms:
- bootstrap aggregation (or bagging) in which each new resample is drawn in the identical way, and
- ARCing (Adaptive Resampling and Combining) or ADAPTIVE resampling in which the way a sample is drawn for the next tree depends on the performance of prior trees.
Bootstrap Aggregation (a.k.a. bagging)
Bootstrap resampling was originally developed to help analysts determine how much their results might have changed if another random sample had been used instead, or how different the results might be when a model is applied to new data. (The theory of the bootstrap was developed by Stanford's Brad Efron in 1979 and has been studied extensively since then.)
For data mining applications, Leo Breiman applied the bootstrap in a novel way: the bootstrap is used to generate many versions of the data set, or "replications"; a separate analysis is conducted for each replication; and then the results are averaged. If the separate analyses differ considerably from each other (suggesting tree instability), the averaging will stabilize the results and yield much more accurate predictions. If the separate analyses are very similar to each other, the trees exhibit stability and the averaging will neither harm nor improve the predictions. Thus, the more unstable the trees, the greater the benefits of averaging.
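The replicate, analyze, and average procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not CART's implementation: the one-split "stump" stands in for a full regression tree, and the names bootstrap_sample, fit_stump, bagged_predictor, and n_replications are hypothetical.

```python
import random
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one bootstrap replication."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Fit a one-split regression stump on (x, y) pairs (stand-in for a tree)."""
    overall = mean(y for _, y in rows)
    xs = sorted({x for x, _ in rows})
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in rows if x <= cut]
        right = [y for x, y in rows if x > cut]
        ml, mr = mean(left), mean(right)
        # Sum of squared errors if we predict each side's mean.
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, cut, ml, mr)
    _, cut, ml, mr = best
    if cut is None:  # only one distinct x value: no split is possible
        return lambda x: overall
    return lambda x: ml if x <= cut else mr

def bagged_predictor(rows, n_replications=25, seed=0):
    """Fit one stump per bootstrap replication and average their predictions."""
    rng = random.Random(seed)
    stumps = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_replications)]
    return lambda x: mean(stump(x) for stump in stumps)
```

Every replication is drawn in the identical way, so the only source of variation between the fitted stumps is the resampling itself. That variation is what the averaging step smooths out.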
ARCing
Freund and Schapire (1996) first introduced boosting; Breiman (1996) introduced ARCing and demonstrated that it performs as well as or better than boosting. In general, we recommend bagging rather than ARCing, as bagging is more robust to errors in the dependent variable and is also much faster. Nevertheless, ARCing is capable of yielding some remarkable reductions in predictive error.
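To make the contrast with bagging concrete, the following sketch shows adaptive resampling in which cases misclassified by earlier trees are more likely to be drawn for the next one. It assumes the arc-x4 weighting rule from Breiman's work (sampling weight proportional to 1 + m^4, where m is the number of times a case has been misclassified so far); whether CART uses this exact rule is not stated here. The one-split stump and the names fit_stump and arc_x4 are hypothetical stand-ins.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split classifier on (x, label) pairs (stand-in for a tree)."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    xs = sorted({x for x, _ in rows})
    best_errors = sum(label != majority for _, label in rows)
    best_rule = None
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        left = Counter(l for x, l in rows if x <= cut).most_common(1)[0][0]
        right = Counter(l for x, l in rows if x > cut).most_common(1)[0][0]
        errors = sum((left if x <= cut else right) != l for x, l in rows)
        if errors < best_errors:
            best_errors, best_rule = errors, (cut, left, right)
    if best_rule is None:  # no split beats predicting the majority class
        return lambda x: majority
    cut, left, right = best_rule
    return lambda x: left if x <= cut else right

def arc_x4(rows, n_trees=10, seed=0):
    """Adaptive resampling: hard cases get drawn more often for later trees."""
    rng = random.Random(seed)
    miss = [0] * len(rows)  # misclassification count per training case
    trees = []
    for _ in range(n_trees):
        weights = [1 + m ** 4 for m in miss]  # assumed arc-x4 rule
        sample = rng.choices(rows, weights=weights, k=len(rows))
        tree = fit_stump(sample)
        trees.append(tree)
        for i, (x, label) in enumerate(rows):
            if tree(x) != label:
                miss[i] += 1

    def predict(x):
        # Combine the committee by unweighted majority vote.
        return Counter(tree(x) for tree in trees).most_common(1)[0][0]

    return predict, miss
```

Unlike the bagging case, each new sample here depends on the performance of the prior trees, so the trees must be grown one after another. Because hard cases are drawn repeatedly, a mislabeled case can receive a great deal of attention, which is consistent with the caution about errors in the dependent variable.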
One final caution on combining via bagging or ARCing: the increase in accuracy is often achieved for the class in which you have the least interest. For example, in a binary response model in which response is relatively rare, bagging and ARCing may improve the non-response classification accuracy while slightly reducing the response classification accuracy relative to a standard CART tree. You will probably need to adjust priors to induce the most useful improvements.
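To see why priors matter for a rare class, the sketch below applies the textbook CART class-assignment rule, in which a node is assigned the class with the largest value of prior times the fraction of that class's cases falling in the node. The function name node_class, the class labels, and the counts are hypothetical; this illustrates the rule only and does not show how priors are set in the CART software.

```python
def node_class(node_counts, total_counts, priors):
    """Assign a node to the class with the largest prior-weighted share.

    node_counts:  cases of each class that fall in this node
    total_counts: cases of each class in the whole learning sample
    priors:       prior probability assumed for each class
    """
    scores = {
        label: priors[label] * node_counts.get(label, 0) / total_counts[label]
        for label in total_counts
    }
    return max(scores, key=scores.get)

# A rare-response sample: 100 responders out of 1,000 cases.
total = {"response": 100, "non_response": 900}
# One node holding 30 responders and 60 non-responders.
node = {"response": 30, "non_response": 60}

# Priors matching the sample proportions versus equal priors.
data_priors = {"response": 0.1, "non_response": 0.9}
equal_priors = {"response": 0.5, "non_response": 0.5}

class_with_data_priors = node_class(node, total, data_priors)
class_with_equal_priors = node_class(node, total, equal_priors)
```

With priors matching the sample proportions, the node is labeled non_response because non-responders outnumber responders in it. With equal priors, the same node is labeled response because it captures 30% of all responders but under 7% of all non-responders. Shifting priors toward the rare class in this way is one means of steering accuracy gains toward the class you care about.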
For more information on implementing bagging and ARCing in CART, see FAQ-Combine.

