Limitations on the Learn Sample Size When Using Cross Validation


By default, CART will not allow cross validation (CV) for any dataset that has more than 3,000 observations. The n-fold cross-validation technique is designed to get the most out of datasets that are too small to accommodate a hold-out or test sample. Once you have more than 3,000 records, we recommend that a separate test set be used.

For large datasets, set up the separate test sample either by manually splitting the dataset into learn and test samples (ERROR TEST or ERROR SEPVAR) or by using a randomly selected test set (ERROR PROPORTION).
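As a rough illustration of the two ways of estimating error discussed above, the following Python sketch contrasts a randomly selected hold-out test set with n-fold cross validation. It is not CART code and does not reproduce how CART draws its samples; the function names `proportion_split` and `cv_folds`, the 20% test fraction, and the 10 folds are assumptions chosen for the example.

```python
import random


def proportion_split(n_obs, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the rows as a test sample.

    Illustrative only: mimics the idea of a randomly selected test set,
    not CART's actual sampling procedure.
    """
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    n_test = round(n_obs * test_fraction)
    test = sorted(indices[:n_test])
    learn = sorted(indices[n_test:])
    return learn, test


def cv_folds(n_obs, n_folds=10, seed=0):
    """Yield (learn, test) index lists for n-fold cross validation.

    Every row is used for testing exactly once and for learning in the
    remaining n_folds - 1 rounds, which is why CV suits small datasets.
    """
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    # Deal the shuffled rows into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = sorted(folds[k])
        learn = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield learn, test
```

With a hold-out split each row serves either as learn or as test data, so a large dataset is needed for both samples to be adequate. With cross validation the model is built once per fold, so the cost grows with both the number of folds and the dataset size.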

However, if you still want to use CV on a larger dataset, you can raise the limit with the command:

BOPTIONS CVLEARN = n

The default value for n is 3000, but it can be reset to a larger value. For example, if you have 50,000 observations and want to use the entire dataset in a cross-validation run, issue the command:

BOPTIONS CVLEARN = 50000
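
The effect of the limit can be pictured with a small Python sketch. This is only a model of the rule as described here, not CART's implementation; the function `cv_allowed` and the constant `DEFAULT_CVLEARN` are hypothetical names, and treating a dataset of exactly 3000 observations as allowed is an assumption based on the "more than 3000" wording.

```python
# Hypothetical model of the CVLEARN limit; not part of CART itself.
DEFAULT_CVLEARN = 3000


def cv_allowed(n_obs, cvlearn=DEFAULT_CVLEARN):
    """Return True if cross validation would be permitted.

    Assumes CV is refused only when the number of observations
    exceeds the CVLEARN limit.
    """
    return n_obs <= cvlearn


# A 50,000-row dataset is refused under the default limit,
# but accepted once the limit is raised to 50000.
print(cv_allowed(50000))                 # False
print(cv_allowed(50000, cvlearn=50000))  # True
```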


Steinberg, Dan and Phillip Colla. CART--Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.
© Copyright 2003-2004 Salford Systems