Limitations on the Learn Sample Size When Using Cross Validation


By default, CART does not allow cross-validation (CV) on any dataset with more than 3,000 observations. The n-fold cross-validation technique is designed to get the most out of datasets that are too small to accommodate a hold-out or test sample. Once you have more than 3,000 records, we recommend using a separate test set instead.

For large datasets, use a separate test set: either split the dataset manually into learn and test samples (ERROR TEST or ERROR SEPVAR) or have a test set selected at random (ERROR PROPORTION).

If you still want to use CV on a larger dataset, raise the limit with the command:

BOPTIONS CVLEARN = n

The default value for n is 3000, but it can be reset to a larger value. For example, if you have 50,000 observations and want to use the entire dataset in a cross-validation run, issue the command:

BOPTIONS CVLEARN = 50000
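
The rule described above can be illustrated outside CART with a short Python sketch. This is not CART's implementation; it only mirrors the logic of the limit, assuming that datasets at or below the limit use n-fold cross-validation and larger ones fall back to a randomly selected test sample. The function names, the 10-fold setting, and the 20% test proportion are hypothetical choices made for illustration.

```python
import random


def choose_error_method(n_obs, cv_learn=3000):
    """Mimic the CVLEARN rule: CV is allowed only up to cv_learn observations."""
    return "cross-validation" if n_obs <= cv_learn else "test sample"


def kfold_indices(n_obs, n_folds=10, seed=0):
    """Split row indices into n_folds disjoint folds (assumed 10 folds).

    Each fold serves once as the test part while the rest is used to learn.
    """
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def holdout_indices(n_obs, proportion=0.2, seed=0):
    """Randomly set aside a test sample (assumed 20%); return (learn, test)."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n_obs * proportion))
    return idx[n_test:], idx[:n_test]


if __name__ == "__main__":
    # Default limit: 50,000 rows are too many for CV.
    print(choose_error_method(50000))
    # Limit raised to 50,000, analogous to BOPTIONS CVLEARN = 50000.
    print(choose_error_method(50000, cv_learn=50000))
```

With cross-validation every record is used for both learning and testing across the folds, which is why it suits small datasets. With a hold-out, a fixed share of the records never contributes to growing the tree, a cost that matters less as the dataset grows.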


Steinberg, Dan and Phillip Colla. CART--Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.
© Copyright 2003-2004 Salford Systems