Why does the tree change when non-splitting variables are dropped?

If a variable does not enter the tree as a primary node splitter, it may still play an important role in the tree as a surrogate splitter. If you have turned off the display of surrogate splitters, you will not see how these variables affect the tree, but CART will still use them internally when applying the tree to data. The Variable Importance Table produced by CART ranks the variables in the tree by their importance, a statistic measuring how strongly a variable acts as a primary or surrogate splitter.
Suppose a variable enters the tree as the top surrogate splitter in many nodes, but never as the primary splitter. If this variable is removed from the list of potential predictor variables and the tree is rebuilt, the result will probably be a very different tree. It certainly will be if the primary node-splitting variables have missing values in the data, because the surrogate decides which child node those cases are sent to, and so changes the data available for every split further down the tree.
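The role of a surrogate can be sketched with a small toy example in Python. This is an illustration only, not CART's internal code: the data, the two thresholds, and the function names `route` and `agreement` are all hypothetical. It shows a surrogate split that mimics the primary split on complete cases and takes over routing when the primary value is missing.

```python
# Toy illustration of a surrogate splitter (hypothetical data and thresholds,
# not CART's actual implementation).
# Each row: (primary value, or None if missing; surrogate value).
rows = [
    (2.0, 20.0),
    (3.0, 35.0),
    (7.0, 80.0),
    (9.0, 65.0),
    (None, 30.0),  # primary value missing
    (None, 90.0),  # primary value missing
]

PRIMARY_THRESHOLD = 5.0     # assumed primary split: primary <= 5 goes left
SURROGATE_THRESHOLD = 50.0  # assumed surrogate split: surrogate <= 50 goes left


def route(primary, surrogate):
    """Send a case left or right, using the surrogate if the primary is missing."""
    if primary is not None:
        return "left" if primary <= PRIMARY_THRESHOLD else "right"
    return "left" if surrogate <= SURROGATE_THRESHOLD else "right"


def agreement(data):
    """Fraction of complete cases on which the surrogate matches the primary."""
    complete = [(p, s) for p, s in data if p is not None]
    matches = sum(
        (p <= PRIMARY_THRESHOLD) == (s <= SURROGATE_THRESHOLD)
        for p, s in complete
    )
    return matches / len(complete)


routes = [route(p, s) for p, s in rows]
print(routes)           # ['left', 'left', 'right', 'right', 'left', 'right']
print(agreement(rows))  # 1.0
```

The last two cases are routed entirely by the surrogate. If the surrogate variable were dropped, those cases would have to be handled some other way, and the child nodes would receive different data.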
Steinberg, Dan and Colla, Phillip. CART—Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.
Another possible cause is the way CART grows trees. Normally, CART first grows a maximal tree and then tests it, either through cross-validation or against a separate test sample. If a split does not hold up to testing, it is removed from the model. Thus, if a model splits one or more times on a particular variable, but none of these splits holds up to testing, the variable will not appear as a primary splitter in the final model. However, if the variable is dropped, the splits involving that variable in the maximal tree might be replaced by splits on other variables, and those replacements may survive testing and appear in the final tree.
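The idea of a split that looks useful on the learning data but does not hold up to testing can be sketched as follows. This is a deliberately simplified check on a single split using misclassification counts, not CART's actual pruning procedure. The data, the threshold of 4.5, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical learning and test samples: (predictor value, class label).
train = [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "b"), (6, "b")]
test = [(1, "a"), (2, "a"), (3, "a"), (5, "a"), (6, "a"), (7, "b")]

THRESHOLD = 4.5  # assumed candidate split: value <= 4.5 goes left


def majority(labels):
    """Most frequent class label in a node."""
    return Counter(labels).most_common(1)[0][0]


def errors_without_split(train_rows, test_rows):
    """Test errors when the node is left as a single terminal node."""
    prediction = majority([y for _, y in train_rows])
    return sum(y != prediction for _, y in test_rows)


def errors_with_split(train_rows, test_rows, threshold):
    """Test errors when the node is split and each child predicts its majority."""
    left = majority([y for x, y in train_rows if x <= threshold])
    right = majority([y for x, y in train_rows if x > threshold])
    return sum(
        y != (left if x <= threshold else right) for x, y in test_rows
    )


unsplit = errors_without_split(train, test)
split = errors_with_split(train, test, THRESHOLD)
keep_split = split < unsplit  # keep the split only if it helps on test data

print(unsplit, split, keep_split)  # 1 2 False
```

Here the split separates the learning sample perfectly but makes more errors on the test sample than no split at all, so it would be removed. The variable behind it would then not appear as a primary splitter in the final tree, even though it was used in the larger tree.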


