For beginners and expert users

  • General introductory videos to SPM's data mining.

  • Comprehensive training videos

  • Webinars & Tutorials: Tips & Tricks and industry specific insights

 


What if there are too many levels in a categorical predictor?

CART searches over all possible subsets of a categorical predictor's levels only when the number of levels is limited. A predictor with k levels admits 2^(k-1) - 1 distinct two-way splits, so the search roughly doubles with each added level; beyond a threshold set by computational feasibility, CART simply rejects the problem. You can control this limit with the BOPTION NCLASSES = m command, but be aware that for m larger than 15, computation times increase dramatically.
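To give a feel for why the limit exists, the short Python sketch below counts the distinct two-way partitions of k levels, which is 2^(k-1) - 1. It is plain arithmetic about an exhaustive subset search, not a description of CART's internal code, and the function name n_binary_splits is made up for illustration.

```python
def n_binary_splits(k: int) -> int:
    """Count the distinct two-way partitions of k category levels.

    Each level goes left or right (2**k assignments); swapping the two
    sides gives the same split (divide by 2), and the assignment that
    leaves one side empty is not a split (subtract 1).
    """
    if k < 2:
        return 0  # fewer than two levels cannot be split
    return 2 ** (k - 1) - 1


# Illustrative level counts only; 15 is the threshold mentioned above.
for k in (5, 10, 15, 20, 50):
    print(f"{k:>2} levels -> {n_binary_splits(k):,} candidate splits")
```

Each additional level roughly doubles the count, which is consistent with computation times climbing steeply once NCLASSES is raised past 15.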

SOLUTION: Convert The Variable Into Dummies

The ideal solution is to work with a supercomputer implementation of Salford Systems' CART, because this will provide the optimal tree. Other alternatives are compromises that might not yield satisfactory results. One such compromise is to break the categorical variable into a vector of dummies. For example, a 50-level occupation variable could be coded into 50 separate 0/1 indicators, one per level. A split on any one indicator can then only separate a single level from all the others, rather than an arbitrary group of levels.
Steinberg, Dan and Phillip Colla. CART—Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.
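The dummy coding described above can be sketched in a few lines of Python. This is a minimal illustration done outside CART, assuming the data can be preprocessed before it is loaded; the helper make_dummies, the occ prefix, and the sample occupation values are all hypothetical.

```python
def make_dummies(values, prefix):
    """Return one 0/1 indicator column per distinct level, keyed by column name."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Made-up sample data: a categorical predictor with three levels.
occupation = ["clerk", "nurse", "clerk", "pilot", "nurse"]
dummies = make_dummies(occupation, prefix="occ")

for name, column in dummies.items():
    print(name, column)
```

Every row has exactly one indicator set to 1. To group several levels together, the tree would need a chain of successive splits on individual indicators, which is one reason the FAQ treats dummies as a compromise rather than an equivalent of the subset search.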


Tags: Frequently Asked Questions, FAQs, CART, Support, Salford-Systems
