Frequently Asked Questions for CART

CART® is the ultimate classification tree that has revolutionized the entire field of advanced analytics and inaugurated the current era of data mining. CART, which is continually being improved, is the most important tool in modern data mining. Designed for both non-technical and technical users, CART can quickly reveal important data relationships that could remain hidden when using other analytical tools.

CART is based on landmark mathematical theory introduced in 1984 by four world-renowned statisticians at Stanford University and the University of California at Berkeley. Salford Systems' implementation of CART is the only decision tree software embodying the original proprietary code. The CART creators continue to collaborate with Salford Systems to enhance CART with proprietary advances.

What is the SYSTAT dataset format?

CART and MARS continue to read data stored in the legacy SYSTAT format, a binary (i.e., not human-readable) format widely used by statisticians and researchers who work with the SYSTAT statistical programs. Relative to comma-separated text and some other binary formats, the legacy SYSTAT format is quite restrictive (limited variable name lengths, limited lengths of character data). We do not recommend that you use it. However, for clients who do need to work with this format, we provide the following C and Fortran programs that illustrate how legacy SYSTAT datasets are structured. Originally, the legacy SYSTAT format was written and read with Fortran code. Because the format must therefore accommodate the record segmentation and padding typical of Fortran I/O, the C version handles these issues explicitly.

What if there are too many levels in a categorical predictor?

CART will only search over all possible subsets of a categorical predictor for a limited number of levels. Beyond a threshold set by computational feasibility, CART will simply reject the problem. You can control this limit with the BOPTION NCLASSES = m command, but be aware that for m larger than 15, computation times increase dramatically.
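To see why computation times grow so quickly, it helps to count the candidate splits. A categorical predictor with m distinct levels can be divided into two non-empty groups in 2^(m-1) - 1 ways, so each additional level roughly doubles the work of an exhaustive search. The following is a minimal Python sketch of that arithmetic, not a description of how CART itself performs the search; the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def count_binary_splits(m):
    """Number of distinct ways to divide m levels into two non-empty groups."""
    if m < 2:
        return 0
    return 2 ** (m - 1) - 1


def enumerate_binary_splits(levels):
    """Yield every (left, right) split of a sequence of distinct levels.

    The first level is always placed on the left so that mirror-image
    splits are not counted twice.
    """
    first, rest = levels[0], levels[1:]
    # k extra levels join the first one on the left; k stops short of
    # len(rest) so that the right-hand group is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(v for v in rest if v not in extra)
            yield left, right


if __name__ == "__main__":
    for m in (5, 10, 15, 20, 25):
        print(m, count_binary_splits(m))
```

At m = 15 there are 16,383 candidate splits per node; at m = 25 there are more than 16 million, which illustrates why raising the limit well beyond 15 can be costly.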

What makes Salford Systems' CART the only "true" CART?

Salford Systems' CART is the only decision tree based on the original code of Breiman, Friedman, Olshen, and Stone. Because the code is proprietary, CART is the only true implementation of this classification-and-regression-tree methodology. In addition, the procedure has been substantially enhanced with new features and capabilities in exclusive collaboration with CART's creators. While some other decision-tree products claim to implement selected features of this technology, they are unable to reproduce genuine CART trees and lack key performance and accuracy components. Further, CART's creators continue to collaborate with Salford Systems to refine CART and to develop the next generation of data-mining tools.

Can we obtain dependency plots for single CART trees?

The short answer is yes: such plots can be generated. Historically, we concluded that such graphs would normally not be very interesting, because they would frequently be single step functions, reflecting the fact that individual variables often appear only once or twice in a tree. Such graphs would also not properly reflect the effect of a variable across most of its range of values. Thus, as of SPM 7.0, CART does not offer such plots. However, we can see what such plots would look like by using TreeNet to grow a one-tree model. To do this, set up a normal model, choose the TreeNet analysis method, and set the number of trees to be grown to 1 (see green arrow below).
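The step-function behavior can be illustrated with a small Python sketch. It uses the generic partial-dependence idea (fix one variable at a grid value, average the tree's predictions over the data) and is not the TreeNet or SPM implementation; the tree, the variable names `age` and `income`, and the data rows are all hypothetical.

```python
def predict(node, row):
    """Walk a tree stored as nested dicts until a terminal node is reached."""
    while "value" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


def partial_dependence(tree, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for g in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)
            modified[feature] = g  # override only the variable of interest
            total += predict(tree, modified)
        curve.append(total / len(rows))
    return curve


# Hypothetical tree: 'age' appears in exactly one split.
tree = {
    "feature": "age", "threshold": 40,
    "left": {"value": 10.0},
    "right": {
        "feature": "income", "threshold": 50,
        "left": {"value": 20.0},
        "right": {"value": 30.0},
    },
}

# Hypothetical data rows used only to average over the other variable.
rows = [
    {"age": 25, "income": 30},
    {"age": 45, "income": 70},
    {"age": 60, "income": 40},
    {"age": 35, "income": 90},
]

grid = [20, 30, 40, 50, 60]
curve = partial_dependence(tree, rows, "age", grid)

if __name__ == "__main__":
    for g, y in zip(grid, curve):
        print(g, y)
```

Because `age` is used in only one split, the resulting curve is flat on each side of the threshold with a single jump at 40, which is the kind of single step function described above.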

What is CART?

CART is an acronym for Classification and Regression Trees, a decision-tree procedure introduced in 1984 by world-renowned UC Berkeley and Stanford statisticians, Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. Their landmark work created the modern field of sophisticated, mathematically- and theoretically-founded decision trees. The CART methodology solves a number of performance, accuracy, and operational problems that still plague many other current decision-tree methods. CART's innovations include:

