By Phone or Online

Access the help you need to use our software from representatives who are knowledgeable in data mining and predictive analytics

  • Banner 201707

    By Phone or Online

    Access the help you need to use our software from representatives who are knwoledgeable in data mining and predictive analytics

Download Now Instant Evaluation
Get Price Quote

What if there are too many levels in a categorical predictor?

CART will only search over all possible subsets of a categorical predictor for a limited number of levels. Beyond a threshold set by computational feasibility, CART will simply reject the problem. You can control this limit with the BOPTION NCLASSES = m command, but be aware that for m larger than 15, computation times increase dramatically.

SOLUTION: Convert The Variable Into Dummies

The ideal solution is to work with a supercomputer implementation of Salford Systems CART, because this will provide the optimal tree. Other alternatives are compromises that might not yield satisfactory results. One such compromise is to break the categorical variable into a vector of dummies. For example, a 50-level occupation variable could be coded into 50 separate indicators.
Steinberg, Dan and Phillip Colla. CART—Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.

[J#380:1602]

Tags: Frequently Asked Questions, FAQs, CART, Support, Salford-Systems

Get In Touch With Us

Contact Us

9685 Via Excelencia, Suite 208, San Diego, CA 92126
Ph: 619-543-8880
Fax: 619-543-8888
info (at) salford-systems (dot) com