Why is CART unique among decision-tree tools?
CART is based on a decade of research, assuring stable performance and reliable results. CART's proven methodology is characterized by:
- Reliable pruning strategy – CART's developers concluded that no stopping rule can be relied on to find the optimal tree, so they introduced the idea of deliberately over-growing a tree and then pruning it back. This idea, fundamental to CART, ensures that important structure is not overlooked by stopping too soon. Other decision-tree techniques rely on stopping rules, which can halt growth before that structure is found.
- Powerful binary-split search approach – CART's binary decision trees use data more sparingly and detect more structure before too little data remains for learning. Other decision-tree approaches use multi-way splits that fragment the data rapidly, making it difficult to detect rules that can only be discovered from broad ranges of data.
- Automatic self-validation procedures – In the search for patterns in databases, it is essential to avoid the trap of "overfitting," that is, finding patterns that apply only to the training data. CART's built-in testing procedures check that the patterns found hold up when applied to new data. Further, testing and selection of the optimal tree are an integral part of the CART algorithm. In other decision-tree techniques, testing is conducted after the fact and tree selection is left to the user.
In addition, CART accommodates many different types of real-world modeling problems by providing a unique combination of automated solutions:
1. surrogate splitters intelligently handle missing values;
2. adjustable misclassification penalties help avoid the most costly errors;
3. multiple-tree, committee-of-experts methods increase the precision of results; and
4. alternative splitting criteria make progress when other criteria fail.
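
To make the binary-split search described above more concrete, here is a minimal Python sketch of how a single binary split on one numeric variable might be chosen. It assumes the Gini impurity as the splitting criterion, which is one of several criteria a tree tool may offer. The function names (`gini`, `best_binary_split`) and the sample data are hypothetical illustrations, not part of the CART product or its interface.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Find the threshold t so that splitting on `value <= t` gives the
    lowest weighted Gini impurity across the two child nodes.

    Returns (threshold, weighted_impurity). If no split improves on the
    parent node, threshold is None and the parent impurity is returned.
    This is a naive O(n^2) search, written for clarity rather than speed.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            # Place the threshold midway between the two neighboring values.
            best_threshold, best_impurity = (low + high) / 2, weighted
    return best_threshold, best_impurity


# Hypothetical sample data: customer age and whether a purchase was made.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_binary_split(ages, bought)
print(threshold, impurity)
```

Each split divides a node into exactly two children, so every candidate threshold is evaluated against all of the data in the node. A tree builder would repeat this search over every variable and then recurse into each child. Pruning, surrogate splitters, and misclassification penalties are deliberately left out of this sketch.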