By Phone or Online

Access the help you need to use our software from representatives who are knowledgeable in data mining and predictive analytics

Are there limitations on the learn sample size when using cross validation?

By default, CART will not allow cross-validation (CV) on any dataset with more than 3,000 observations. The n-fold cross-validation technique is designed to get the most out of datasets that are too small to accommodate a hold-out or test sample. Once you have 3,000 records or more, we recommend using a separate test set instead.
For large datasets, the separate test set can be created either by manually splitting the data into learn and test samples (ERROR TEST or ERROR SEPVAR) or by using a randomly selected test set (ERROR PROPORTION).
If you still want to use CV on a larger dataset, raise the limit with the command:
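To illustrate the n-fold idea, here is a minimal Python sketch (not CART's internal code) that splits record indices into folds so that every record serves as test data exactly once and as learn data in all the other runs. The names `assign_folds` and `cv_splits`, the default of 10 folds, and plain random (unstratified) fold assignment are assumptions made for illustration; CART's own fold assignment may differ.

```python
import random

def assign_folds(n_records, n_folds=10, seed=0):
    """Randomly assign each record index to one of n_folds folds of near-equal size.

    Illustrative sketch only; not how CART itself assigns folds.
    """
    indices = list(range(n_records))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(indices)
    folds = [[] for _ in range(n_folds)]
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    for position, idx in enumerate(indices):
        folds[position % n_folds].append(idx)
    return folds

def cv_splits(n_records, n_folds=10, seed=0):
    """Yield (learn, test) index lists; each fold is the test sample exactly once."""
    folds = assign_folds(n_records, n_folds, seed)
    for k in range(n_folds):
        test = sorted(folds[k])
        # The learn sample is every record not in fold k.
        learn = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield learn, test

if __name__ == "__main__":
    for learn, test in cv_splits(25, n_folds=5):
        print(len(learn), len(test))  # 20 5 on each of the five runs
```

Every record is used for testing once, so no data has to be permanently set aside, which is what makes the technique attractive for small datasets.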
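A randomly selected test sample of the kind ERROR PROPORTION produces can be sketched the same way. This is only an illustration under assumptions: `proportion_split` is a hypothetical helper, and it holds out exactly round(n_records * test_fraction) records. That may not match how CART actually draws its test sample.

```python
import random

def proportion_split(n_records, test_fraction=0.2, seed=0):
    """Hold out a random test sample of about test_fraction of the records.

    Hypothetical helper: returns (learn, test) as sorted lists of record indices.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(n_records * test_fraction)  # size of the held-out sample
    test = sorted(indices[:n_test])
    learn = sorted(indices[n_test:])
    return learn, test

if __name__ == "__main__":
    learn, test = proportion_split(5000, test_fraction=0.25)
    print(len(learn), len(test))  # 3750 1250
```

Unlike cross-validation, the tree is grown once on the learn sample and evaluated once on the held-out records.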
BOPTIONS CVLEARN = n
The default value of n is 3000, but it can be set to a larger value. For example, if you have 50,000 observations and want to use the entire dataset in a cross-validation run, issue the command:
BOPTIONS CVLEARN = 50000
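The CVLEARN rule can be modelled as a simple size check. In this hypothetical sketch, `cv_allowed` and `cvlearn_command` are illustrative names, not part of CART. The sketch assumes the limit is inclusive, so a learn sample of exactly 3,000 records is still allowed. That matches the wording "more than 3000" above but has not been checked against the software.

```python
from typing import Optional

DEFAULT_CVLEARN = 3000  # default limit described in the text

def cv_allowed(n_learn: int, cvlearn: int = DEFAULT_CVLEARN) -> bool:
    """Assumed rule: CV is permitted unless the learn sample exceeds the limit."""
    return n_learn <= cvlearn

def cvlearn_command(n_learn: int) -> Optional[str]:
    """Return the BOPTIONS line that would raise the limit to n_learn,
    or None if the default limit already permits CV."""
    if cv_allowed(n_learn):
        return None
    return f"BOPTIONS CVLEARN = {n_learn}"

if __name__ == "__main__":
    print(cvlearn_command(2500))   # None
    print(cvlearn_command(50000))  # BOPTIONS CVLEARN = 50000
```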
Steinberg, Dan and Phillip Colla. CART—Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.

Get In Touch With Us

Contact Us

9685 Via Excelencia, Suite 208, San Diego, CA 92126
Ph: 619-543-8880
Fax: 619-543-8888
info (at) salford-systems (dot) com