Feature Selection for CART® using the Salford Predictive Modeler® Software Suite
We can dig deeper than we did in our previous post into why more compact predictor lists can improve decision trees. Recall that a CART tree is grown by searching, across all predictors and all possible split points for each predictor, for the best split of a given partition of the learning data. There is no guarantee that this split will perform as well on previously-unseen test data. Occasionally, the best split on the learn data is a lucky draw, and the split is not confirmed on the test data. In the original CART monograph, large-sample theory was intended to assure that, in very large samples, CART will always correct any unfortunate splits made as the tree evolves by making the correct splits lower down in the tree. With sufficiently large samples, enough data always remain to converge to the best model. In most real-world situations, however, we will not want to rely on massive data sets to get to the best model, and we may not have enough data to assure the desired result.
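To make the split-search step concrete, here is a minimal sketch of an exhaustive binary split search for a classification partition. This is an illustration only, not the Salford Predictive Modeler's actual implementation: the function names `gini` and `best_split`, the use of Gini impurity, and the 0/1 target are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    """Exhaustively search all predictors and all candidate split points.

    X: list of rows, each a list of numeric predictor values.
    y: list of 0/1 class labels, one per row.
    Returns (predictor_index, threshold, weighted_impurity) of the best split.
    """
    n = len(y)
    best = (None, None, float("inf"))
    for j in range(len(X[0])):                      # every predictor
        values = sorted({row[j] for row in X})
        # candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:                     # keep the purest split
                best = (j, t, score)
    return best

# Example: four rows, two predictors; either predictor separates the classes,
# and the search returns the first split that drives impurity to zero.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))
```

Because every predictor and every distinct split point is scored, a lucky split on the learn sample can win the search even when it reflects noise rather than signal; that is exactly why trimming the predictor list reduces the opportunities for such lucky draws.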
In the original BFOS CART monograph, the authors used judgment to select the predictors they thought would be most useful and acceptable to medical professionals. In many real-world contexts, the exercise of such judgment is both defensible and useful. However, instead of applying judgment in advance of any analysis, we can start by building trees automatically and then apply judgment in a review of the tree details.