What are CART's "automatic self-validation procedures"?
CART uses two test procedures to select the "optimal" tree, which is the tree with the lowest overall misclassification cost, thus the highest accuracy. Both test disciplines, one for small datasets and one for large, are entirely automated, ensuring that the optimal tree model will accurately classify existing data and predict results.
For smaller datasets and cases when an analyst does not wish to set aside a portion of the data for test purposes, CART automatically employs cross validation. While this frequently occurs in medical research, a shortage of training data can occur in the study of any rare event, such as specific types of fraud. In cross validation, ten different trees are typically grown, each built from a different ten percent of the total sample. When the results of the ten trees are put together, a highly reliable determination of the optimal tree size is obtained. For large datasets, CART automatically selects test data or uses pre-defined test records or test files to self-validate results.