Frequently Asked Questions for CART

CART® is the ultimate classification tree that has revolutionized the entire field of advanced analytics and inaugurated the current era of data mining. CART, which is continually being improved, is the most important tool in modern data mining methods. Designed for both non-technical and technical users, CART can quickly reveal important data relationships that could remain hidden using other analytical tools.

CART is based on landmark mathematical theory introduced in 1984 by four world-renowned statisticians at Stanford University and the University of California at Berkeley. Salford Systems’ implementation of CART is the only decision tree software embodying the original proprietary code. The CART creators continue to collaborate with Salford Systems to enhance CART with proprietary advances.

CART's tree-building process can be interactively controlled by using the INTERACTIVE option on the ERROR command:

ERROR EXPLORATORY / INTERACTIVE

When building is started, CART will produce and display information on the first node, including lists of Competitor and Surrogate splits. You will be prompted with:

>

The commands that can be used during interactive splitting are:

  • REPLACE COMPETITOR=n: Choose to split on competitor n from the list.
  • REPLACE SURROGATE=n: Choose to split on surrogate n from the list.
  • REPLACE LINEAR=n: Choose to split on linear combination n from the list.
  • REPLACE SPLIT var=n: Choose to split on a specific variable var at split value n.
  • REPLACE SPLIT var=n1,n2,n3,… : Choose to split on a specific categorical variable var with split levels n1, n2, n3, …
  • NEXT: Move on to the next node.
  • CONTINUE: stop interactive splitting and let the rest of the tree grow automatically.
  • QUIT: Stop altogether.
  • FRESH: Redo the current node unconstrained, as it was when you first saw it.
  • ABOVE DEPTH = n: Allows interactive splitting only above a given depth.
  • RESAMPLE: If the node is subsampled, this will choose a new subsample and generate new splits.

Several important caveats about interactive splitting:
  • You must be in command mode to use this feature. Future versions of CART will enable this feature via the GUI.
  • Because the interactively split tree is an exploratory tree, it will not be pruned back. To avoid growing a tree that is too large, be sure to limit the size of the tree by setting the complexity, depth, or number of nodes prior to building the tree.
  • If you only want to interactively split the top three nodes, use ABOVE DEPTH=2 to avoid having to interactively split other nodes on the left side of the tree before returning to the second node on the right side of the tree.
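The last caveat is easier to see with a small illustration. The Python sketch below is not CART code. It assumes, based only on the caveat above, that nodes are presented in left-first, depth-first order and that the root node counts as depth 1. The function name `interactive_prompts` and the node labels are hypothetical, and the tree is assumed to be a full binary tree for simplicity.

```python
from typing import Iterator, Optional


def interactive_prompts(tree_depth: int,
                        above_depth: Optional[int] = None) -> Iterator[str]:
    """Yield labels of the nodes that would prompt for an interactive split.

    Assumptions (not taken from CART itself): nodes are visited left-first,
    depth-first; the root is depth 1; every node above tree_depth has two
    children. above_depth mimics the effect described for ABOVE DEPTH = n.
    """
    def visit(path: str, depth: int) -> Iterator[str]:
        # A node prompts only if no limit is set or it lies within the limit.
        if above_depth is None or depth <= above_depth:
            yield path or "root"
        # Descend into the left child first, then the right child.
        if depth < tree_depth:
            yield from visit(path + "L", depth + 1)
            yield from visit(path + "R", depth + 1)

    yield from visit("", 1)


# Without a limit, the whole left branch is visited before the right child.
unlimited = list(interactive_prompts(tree_depth=3))
# With the limit set to 2, only the top three nodes prompt.
limited = list(interactive_prompts(tree_depth=3, above_depth=2))
print(unlimited)
print(limited)
```

Under these assumptions, the unlimited run reaches node R only after L, LL, and LR, whereas the limited run prompts for root, L, and R and nothing else.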

Steinberg, Dan and Phillip Colla. CART—Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.

Rarely, computational instability or a precision problem will result in a message stating that an “internal error” has been encountered. To help Salford Systems resolve the problem, please attempt the following:

  • Rebuild as an exploratory tree to see if CART completes the building process.
  • Rebuild the tree with a randomly selected subset for testing (ERROR p=.2, for example).
  • Rebuild a cross-validation tree with an explicit SEED to perturb the selection of cross-validation subsets.
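The second and third suggestions both work by changing which records land in which partition, so that a problem tied to one particular partition may not recur. The Python sketch below illustrates the general idea only; it is not how CART selects its test sample or cross-validation subsets. The function names `holdout_split` and `cv_fold_assignment`, the shuffle-based selection, and the fixed test-set size are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def holdout_split(n_records: int, test_fraction: float,
                  seed: int) -> Tuple[List[int], List[int]]:
    """Return (train, test) record indices for a seeded random holdout."""
    rng = random.Random(seed)          # private generator, reproducible per seed
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_test = round(n_records * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def cv_fold_assignment(n_records: int, n_folds: int, seed: int) -> List[int]:
    """Return the fold number (0..n_folds-1) assigned to each record."""
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    folds = [0] * n_records
    # Deal the shuffled records round-robin so fold sizes stay balanced.
    for position, record in enumerate(indices):
        folds[record] = position % n_folds
    return folds


train, test = holdout_split(100, 0.2, seed=7)
folds_seed_1 = cv_fold_assignment(100, 10, seed=1)
folds_seed_2 = cv_fold_assignment(100, 10, seed=2)
print(len(train), len(test))
print(folds_seed_1 == folds_seed_2)
```

Reusing a seed reproduces the same partition, which is useful when reporting a problem; changing the seed perturbs the partition, which is the point of the third suggestion.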

Please send the following to Salford Systems by email (or on diskette):
  • command files or commands used prior to the build;
  • output generated during the run;
  • frequency table of your target variable;
  • frequency table of your categorical predictor variables; and
  • if possible, your data set, or any subset that can reproduce the problem.