Feature Selection for CART® using the Salford Predictive Modeler™ Suite

Written by Dan Steinberg, Tuesday, September 20, 2011

We can dig deeper than in our previous post into why more compact predictor lists can improve decision trees. Recall that a CART tree is grown by searching for splits across all predictors and all possible split points in a given partition of the learning data. There is no guarantee that the split chosen this way will perform as well on previously unseen test data. Occasionally, the best split on the learn data will be a lucky draw, and the split will not be confirmed on test data. In the original CART monograph, large-sample theory is meant to assure that, in very large samples, CART will always correct any unfortunate splits made as the tree evolves by making the correct splits lower down in the tree: with sufficiently large samples, enough data always remain to converge to the best model. In most real-world situations, however, we will not want to rely on massive data sets to get to the best model, and we may not have enough data to assure the desired result.
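The split search just described, and the way a lucky split can fail to hold up on test data, can be sketched in a few lines. The following is a minimal pure-Python illustration, not Salford's implementation: it assumes a classification tree scored by the Gini criterion, candidate split points at midpoints between sorted distinct values, and a hypothetical data set in which only `x0` is weakly related to the target while the remaining predictors are pure noise. All function names are our own.

```python
import random


def gini(counts):
    """Gini impurity of a node, given a dict mapping class label -> count."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def class_counts(y):
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    return counts


def best_split_on_column(column, y):
    """Sweep every split point of one predictor; return (threshold, Gini decrease)."""
    pairs = sorted(zip(column, y))
    n = len(pairs)
    total = class_counts(y)
    parent = gini(total)
    left = {}
    best_t, best_imp = None, 0.0
    for i in range(n - 1):
        x_i, y_i = pairs[i]
        left[y_i] = left.get(y_i, 0) + 1
        if pairs[i + 1][0] == x_i:
            continue  # cannot split between tied values
        right = {k: total[k] - left.get(k, 0) for k in total}
        n_left = i + 1
        child = (n_left * gini(left) + (n - n_left) * gini(right)) / n
        if parent - child > best_imp:
            best_t, best_imp = (x_i + pairs[i + 1][0]) / 2, parent - child
    return best_t, best_imp


def best_split(X, y):
    """Exhaustive search across all predictors and all split points."""
    best = (None, None, 0.0)  # (predictor index, threshold, improvement)
    for j in range(len(X[0])):
        t, imp = best_split_on_column([row[j] for row in X], y)
        if imp > best[2]:
            best = (j, t, imp)
    return best


def split_improvement(column, y, threshold):
    """Gini decrease of a fixed split (column <= threshold) on any sample."""
    left = [yi for xi, yi in zip(column, y) if xi <= threshold]
    right = [yi for xi, yi in zip(column, y) if xi > threshold]
    if not left or not right:
        return 0.0
    n = len(y)
    child = (len(left) * gini(class_counts(left)) + len(right) * gini(class_counts(right))) / n
    return gini(class_counts(y)) - child


def make_data(n, n_noise, rng):
    """Hypothetical data: x0 is weakly related to y; x1..x{n_noise} are pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(0.3 * label, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X_learn, y_learn = make_data(200, 30, rng)
    X_test, y_test = make_data(200, 30, rng)
    j, t, learn_imp = best_split(X_learn, y_learn)
    # Re-score the split chosen on the learn sample, unchanged, on the test sample.
    test_imp = split_improvement([row[j] for row in X_test], y_test, t)
    print(f"root split: x{j} <= {t:.3f}")
    print(f"learn improvement {learn_imp:.4f}, test improvement {test_imp:.4f}")
```

Running the script reports which predictor won the root split on the learn sample and how much of its learn-sample improvement survives on the test sample. When a noise predictor wins, its test improvement is typically close to zero; exact numbers depend on the seed and sample sizes.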

In the original BFOS (Breiman, Friedman, Olshen and Stone) CART monograph, the authors used judgment to select the predictors they thought would be most useful and acceptable to medical professionals. In many real-world contexts, exercising such judgment is both defensible and useful. However, instead of applying it in advance of any analysis, we can start by building trees automatically and then use judgment to review the tree details.
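Whether the predictor list is trimmed by judgment up front or after reviewing automatically grown trees, the benefit comes from the mechanism described earlier: every extra predictor is another chance for a lucky split. The toy simulation below is our own illustration, not the SPM suite's feature-selection procedure; the data-generating process, the `signal_shift` parameter, and all function names are hypothetical. It estimates how often a pure-noise predictor beats a single weakly informative predictor at the root split as noise predictors are added.

```python
import random


def binary_gini(pos, size):
    """Gini impurity of a node with `pos` positives out of `size` rows (0/1 target)."""
    p = pos / size
    return 2.0 * p * (1.0 - p)


def best_gini_gain(column, y):
    """Largest Gini decrease over every split point of one predictor."""
    pairs = sorted(zip(column, y))
    n = len(pairs)
    total_pos = sum(y)
    parent = binary_gini(total_pos, n)
    best, left_pos = 0.0, 0
    for i in range(n - 1):
        left_pos += pairs[i][1]
        if pairs[i + 1][0] == pairs[i][0]:
            continue  # cannot split between tied values
        n_left = i + 1
        child = (n_left * binary_gini(left_pos, n_left)
                 + (n - n_left) * binary_gini(total_pos - left_pos, n - n_left)) / n
        best = max(best, parent - child)
    return best


def noise_wins_rate(n_noise, n_rows=100, trials=100, signal_shift=0.5, seed=0):
    """Fraction of simulated learn samples in which some pure-noise predictor
    beats the single informative predictor at the root split."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n_rows)]
        # Informative predictor: its mean shifts by `signal_shift` when y == 1.
        signal_gain = best_gini_gain([rng.gauss(signal_shift * t, 1.0) for t in y], y)
        # Best gain among predictors generated independently of y.
        noise_gain = max(
            (best_gini_gain([rng.gauss(0.0, 1.0) for _ in range(n_rows)], y)
             for _ in range(n_noise)),
            default=0.0,
        )
        if noise_gain > signal_gain:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for k in (0, 5, 20, 50):
        print(f"{k:2d} noise predictors: noise wins the root in {noise_wins_rate(k):.0%} of samples")
```

Because the noise predictors are independent of the target by construction, every "win" is a lucky split of the kind discussed above. The win rate generally climbs as noise predictors are added, though the exact figures depend on the seed, the sample size, and `signal_shift`.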

Dan Steinberg

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University and has given full-day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.