Classification and Regression Trees
CART 6.0 ProEX Features
CART 6.0 ProEX, released in 2008, comes with a long list of new features that help analysts work more quickly and guide their models to the best-performing trees. This is a dramatic upgrade of our flagship product and is drawing rave reviews from our customers. All of the new CART 6.0 ProEX features are explained in detail in our feature matrix (PDF); some highlights are listed below:
- Force splitters into nodes
- Confine select splitters to specific regions of a tree (Structured Tree™)
- Search data for ultra-high performance segments.
- HotspotDetector trees are specifically designed to yield extraordinarily high-lift or high-risk nodes. The process focuses on individual nodes and generally discards the remainder of the tree.
Train/Test Consistency Assessment
- Node-by-node summaries of agreement between train and test data on both class assignment and rank ordering of the nodes.
- Quickly identifies ideally-performing robust trees.
- Automatically generates entire collections of trees exploring different control parameters.
- Nineteen automated batteries cover exploration of multiple splitting rules, five alternative missing value handling strategies, random selection of alternative predictor lists, progressively smaller (or larger) training sample sizes, and much more.
- Includes stepwise backwards predictor elimination using any of three predictor ranking criteria (lowest variable importance rank, lowest loss of area under the ROC curve, highest variable importance rank).
Model Assessment via Monte Carlo Testing
- Measures possible overfitting with automated Monte Carlo randomization tests.
- New tools for automatic construction of new features (as linear combinations of predictors).
- Identification of multiple lists of candidates allows precise control over which predictors may be combined into a single new feature.
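The Monte Carlo randomization test listed above can be pictured as a permutation test: shuffle the target, refit, and see how often a model built on meaningless labels scores as well as the real one. The sketch below is a generic illustration in Python, not CART's implementation. The function names, the single-split "stump" scorer, and the number of rounds are all assumptions chosen to keep the example small.

```python
import random


def best_stump_accuracy(xs, ys):
    """Training accuracy of the best single-threshold split on one numeric
    predictor with a 0/1 target (a hypothetical stand-in for a full tree)."""
    n = len(xs)
    best = 0.0
    for t in sorted(set(xs)):
        # Rule "predict 1 when x > t"; the mirrored rule scores 1 - accuracy.
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        best = max(best, correct / n, 1 - correct / n)
    return best


def permutation_test(xs, ys, score, n_rounds=200, seed=0):
    """Return (observed score, permutation p-value).

    The target is shuffled n_rounds times; each shuffle breaks any real link
    between predictor and target, so the scores obtained show what fitting
    alone can achieve by chance.
    """
    rng = random.Random(seed)
    observed = score(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        if score(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_rounds + 1)
```

A small p-value suggests the observed performance is unlikely to be an artifact of overfitting; a large one suggests the model does little better than it would on scrambled labels.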
Unsupervised Learning Mode
- Uses Breiman's column scrambler to automatically detect potential clusters with no need to scale data, address missing values, or select variables for clustering.
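The column-scrambling idea can be sketched in a few lines of Python. This is an illustrative sketch only, not CART's code, and the function names are hypothetical. Each column is shuffled independently, which keeps every column's own distribution (including its missing values) but destroys the relationships between columns. A classifier is then asked to tell the original rows from the scrambled copy; where it succeeds, the original data has structure that may correspond to clusters.

```python
import random


def scramble_columns(rows, seed=0):
    """Return a copy of `rows` with every column shuffled independently."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def build_real_vs_scrambled(rows, seed=0):
    """Stack the original rows (label 1) on a scrambled copy (label 0).

    The result is an ordinary two-class problem that any classification
    tree can be trained on.
    """
    scrambled = scramble_columns(rows, seed)
    data = [list(row) for row in rows] + scrambled
    labels = [1] * len(rows) + [0] * len(scrambled)
    return data, labels
```

Because only row order within each column changes, no scaling or imputation is needed before scrambling, which matches the claim in the bullet above.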
CART (Classification and Regression Trees) is one of Salford Systems' flagship products and a core component of the Salford Predictive Modeler software suite.
CART Features Matrix
CART Price Quote
CART Supported File Types
The CART® data-translation engine supports data conversions for more than 80 file formats, including popular statistical-analysis packages such as SAS® and SPSS®, databases such as Oracle and Informix, and spreadsheets such as Microsoft Excel and Lotus 1-2-3.
CART System Requirements - short introduction
CART University Program
How do I define penalties to make it harder for a predictor to become the primary splitter in a node?
Any CART model can be easily deployed by translating it into one of the supported languages (SAS®-compatible, C, Java, and PMML) or into the classic text output. This is critical for using your CART trees in large-scale production work.
The decision logic of a CART tree, including the surrogate rules utilized if primary splitting values are missing, is automatically implemented. The resulting source code can be dropped into external applications, thus eliminating errors due to hand coding of decision rules and enabling fast and accurate model deployment.
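To make the role of surrogate rules concrete, here is a minimal Python sketch of how a single node might route a record when the primary splitting value is missing. It is not the code CART generates. The function name, the field names, and the simplification that every rule sends a record left when its value is at or below a threshold are assumptions made for illustration.

```python
def goes_left(record, primary, surrogates, default_left=True):
    """Decide which branch a record takes at one node.

    `primary` and each entry of `surrogates` is a (field, threshold) pair.
    The first rule whose field is present in the record decides the branch;
    if every field is missing, the record follows the default direction.
    """
    for field, threshold in [primary] + list(surrogates):
        value = record.get(field)
        if value is not None:
            return value <= threshold
    return default_left


# Hypothetical node: split on income, fall back to age, then tenure.
primary = ("income", 50000)
surrogates = [("age", 35), ("tenure_months", 24)]

print(goes_left({"income": 42000, "age": 60}, primary, surrogates))  # uses income
print(goes_left({"income": None, "age": 60}, primary, surrogates))   # falls back to age
```

Generating this logic automatically, rather than re-typing thresholds and fallback order by hand, is what removes the transcription errors mentioned above.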
Cross-validation is a method for estimating what the error rate of a sub-tree (of the maximal tree) would be if you had test data. Regardless of the value you set for V in V-fold cross-validation, CART grows the same maximal tree. The monograph provides evidence that a V of 10 to 20 gives better results than a smaller number, but each value of V can produce a slightly different error estimate. The optimal tree, which is derived from the maximal tree by pruning, can therefore differ from one V to another: each cross-validation run yields slightly different estimates of the sub-tree error rates, and so may select a different sub-tree as best.
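The mechanics of V-fold cross-validation can be sketched in Python as follows. This is a generic illustration, not CART's procedure, which additionally matches cross-validated sub-trees to the pruning sequence of the maximal tree. The function names and the trivial majority-class "model" are assumptions used to keep the example self-contained.

```python
import random
from collections import Counter


def v_fold_error(xs, ys, fit, predict, v=10, seed=0):
    """Estimate the error rate by V-fold cross-validation.

    Records are shuffled and dealt into `v` folds. Each fold is held out
    once while the model is fitted on the remaining records, so every
    record is scored exactly once by a model that never saw it.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::v] for i in range(v)]
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in indices if i not in held]
        train_y = [ys[i] for i in indices if i not in held]
        model = fit(train_x, train_y)
        errors += sum(predict(model, xs[i]) != ys[i] for i in held_out)
    return errors / len(xs)


def fit_majority(train_x, train_y):
    """Hypothetical stand-in for tree growing: remember the most common class."""
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, x):
    """Predict the remembered class regardless of the input."""
    return model
```

Changing `v` or `seed` changes which records land in which fold, which is why the error estimates, and hence the sub-tree chosen as best, can shift slightly from run to run.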