# Does CART allow multiple targets?

There are two ways to interpret your question:

Does CART® allow multi-class targets (e.g., a class label with values 1, 2, 3, and so on)?

CART has been used in real-world classification problems with more than 400 classes.

In one project our goal was to predict which specific model of new car a given person actually bought. In the project there were more than 400 different car models available and the predictors were drawn from a lengthy set of attitude and interest questions.

For such models to be useful you need to have a decent sample size for each level of the target. In the car purchase study some models had been bought by more than 2000 people (a good sample size) while some exotic and expensive cars had been bought by fewer than 10 people (the total sample size was over 50,000 records). Naturally, we could not place much faith in the predictions concerning the least frequently bought cars. However, overall, the models built were both quite accurate and generated considerable insight into the factors influencing consumer choice in car purchases.
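A quick way to see which target levels are too sparse to trust is to count records per level before modeling. The following is a minimal sketch in plain Python, not part of CART; the function name `rare_levels`, the `min_count` threshold of 30, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def rare_levels(target_values, min_count=30):
    """Return {level: count} for target levels with fewer than min_count records.

    min_count is an arbitrary illustrative threshold, not a CART default.
    """
    counts = Counter(target_values)
    return {level: n for level, n in counts.items() if n < min_count}


# Toy data with hypothetical car-model labels (not the study's data).
purchases = ["sedan_a"] * 40 + ["hatch_b"] * 35 + ["exotic_c"] * 3

sparse = rare_levels(purchases, min_count=30)
print(sparse)
```

Levels flagged this way can still be modeled, but predictions for them deserve the same caution described above for the least frequently bought cars.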

Does CART allow you to build a single model that will simultaneously predict more than one different target?

This topic has been on our radar for decades. Leo Breiman sent us a crude version of a CART tree that accomplished this in 1992, and we spoke with him at length around that time about how to modify CART to support it. Later, in 1998, we worked with a client who had developed their own "vector" tree for their consulting practice in the 1970s. To date, however, we do not have a simple way to do this in the shipping version of CART. Stay tuned, though, as we expect to have some further news on this topic in 2012.

A final note: if you have only a few targets, each with few levels (e.g., two binary targets), you can always convert the set of targets into one multi-class categorical target, as follows:

| target_1 | target_2 | new_composite_target |
|----------|----------|----------------------|
| 0        | 0        | 0                    |
| 0        | 1        | 1                    |
| 1        | 0        | 2                    |
| 1        | 1        | 3                    |

If you build a single CART tree using the new composite target, you will obtain a single tree that predicts both target_1 and target_2 simultaneously. If your sample size is large enough, you can work effectively with composites of many levels.
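The mapping in the table above treats the binary targets as bits of one integer, with target_1 as the most significant bit. Here is a minimal sketch of that encoding in plain Python; the helper names `encode_composite` and `decode_composite` are hypothetical and are not CART commands.

```python
def encode_composite(targets):
    """Combine a sequence of binary (0/1) targets into one integer class label.

    The first target is the most significant bit, so (1, 0) -> 2 and (1, 1) -> 3.
    """
    label = 0
    for t in targets:
        if t not in (0, 1):
            raise ValueError(f"expected a binary target, got {t!r}")
        label = label * 2 + t
    return label


def decode_composite(label, n_targets):
    """Recover the original binary targets from a composite label."""
    return tuple((label >> shift) & 1 for shift in range(n_targets - 1, -1, -1))


# Reproduce the four rows of the mapping for two binary targets.
rows = [(t1, t2, encode_composite((t1, t2))) for t1 in (0, 1) for t2 in (0, 1)]
for row in rows:
    print(row)
```

With k binary targets the composite has 2^k levels, so the number of classes grows quickly and the per-level sample size caveat from the first part of this post applies.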

In this type of model we would recommend that you use the AUXILIARY command as in:

AUXILIARY target_1 target_2 etc

which will allow you to easily see the data distribution for each original target in every node (including contrasting train and test results).
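To illustrate the kind of per-node summary this gives you, here is a minimal sketch in plain Python that tabulates each original target within each node. It does not reproduce CART's implementation or output format; the function `node_distributions`, the node IDs, and the six toy records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def node_distributions(node_ids, target_values):
    """For each node, count how often each value of one original target occurs."""
    dist = defaultdict(Counter)
    for node, value in zip(node_ids, target_values):
        dist[node][value] += 1
    return {node: dict(counts) for node, counts in dist.items()}


# Hypothetical terminal-node assignments for six records, plus the two
# original binary targets that were folded into the composite target.
node_ids = [1, 1, 1, 2, 2, 2]
target_1 = [0, 0, 1, 1, 1, 1]
target_2 = [0, 1, 0, 1, 1, 0]

by_node_t1 = node_distributions(node_ids, target_1)
by_node_t2 = node_distributions(node_ids, target_2)
print(by_node_t1)
print(by_node_t2)
```

Running the same tabulation separately on training and test records would give the train versus test contrast mentioned above.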

Tags: CART, Blog, Target Variables