Did you know you can easily build a family of CART models with the BATTERY feature? It's true! BATTERY is one of the most powerful features of the Salford Predictive Modeling Suite (SPM). For instance, suppose you wish to see how the size of your CART tree affects its predictive accuracy. You could build a series of individual trees yourself, or you can let BATTERY do it for you. Four batteries (ATOM, MINCHILD, DEPTH and NODES) work in similar ways: each varies one size control, namely the atom, the minchild, the maximum tree depth, or the number of nodes permitted in the maximal tree. These controls constrain how large your CART tree is permitted to grow. Because they are tree-oriented controls, they work with TreeNet and RandomForests models too. For example, by issuing just the following simple series of commands you will end up with eight CART trees, which you can easily compare against one another to find the tradeoff between predictive accuracy and tree complexity that works best for you:
The commands above, using BATTERY MINCHILD, vary the "minchild" parameter in your models. This is a constraint on the minimum size of a child node: no split is permitted that would produce a child node smaller than the minchild. BATTERY ATOM works in a similar way, except that it controls the atom size: a node smaller than the atom will not be split at all. BATTERY NODES varies the number of nodes permitted in the maximal tree, while BATTERY DEPTH varies the maximum depth permitted for the tree. Note that all four of these batteries can be combined to produce a series of 28 models. The commands:
produce the following:
These batteries also work well with TreeNet and RandomForests models. For instance, you may wish to consider how the number of nodes affects the performance of your TreeNet model. Suppose you wish to try five tree sizes in your TreeNet modeling:
BATTERY NODES VALUES=2,3,4,5,7
MART TREES=200, GO
The first run builds a TreeNet model whose trees have only one split each, which structurally precludes any interactions. The remaining runs allow successively more interactions, because each tree can contain several splits. In this particular example, cross entropy (CXE) and classification error improve as the number of nodes permitted in the trees increases, but ROC and lift are relatively unaffected.
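The link between the NODES setting and the interactions a tree can express comes down to simple counting. The short Python sketch below is an illustration only, not SPM syntax. It assumes that NODES counts terminal nodes, which is consistent with NODES=2 giving one split per tree; the function names are hypothetical.

```python
def splits_per_tree(terminal_nodes):
    """A binary tree with k terminal nodes always contains k - 1 splits."""
    if terminal_nodes < 2:
        raise ValueError("a tree needs at least 2 terminal nodes to contain a split")
    return terminal_nodes - 1


def max_interaction_order(terminal_nodes):
    """Most distinct variables that can appear on one root-to-leaf path.

    The deepest possible binary tree is a chain, so in the extreme case
    every split lies on a single path. A two-node tree tops out at 1,
    which means main effects only and no interactions.
    """
    return splits_per_tree(terminal_nodes)


# The five tree sizes used in the TreeNet example above.
for nodes in (2, 3, 4, 5, 7):
    print(f"NODES={nodes}: {splits_per_tree(nodes)} split(s) per tree, "
          f"interactions among at most {max_interaction_order(nodes)} variable(s)")
```

This is an upper bound: a tree that reuses the same variable at several splits, or that grows in a balanced shape, expresses lower-order interactions than the limit allows.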
The BATTERY MINCHILD commands from the first example generate a series of eight models, presented in a brief summary table that shows the accuracy of each model. Note that because the same learn/test sample split is used in all eight models, an honest comparison of their predictive accuracies can be made. Each model can be explored in detail by clicking on its line in the summary report, which will bring up a navigator with full tree detail. Two or more navigators can be viewed on screen at once.
SPM has over 50 different BATTERY options. We will describe some of the others in the coming weeks.
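For readers who want to see the idea outside SPM, here is a minimal Python sketch of a MINCHILD-style battery. It is a toy, not SPM's CART implementation: the one-variable regression tree, the synthetic data, the eight minchild values and every function name are assumptions made for illustration. It does follow the two rules described earlier (a node smaller than the atom is never split, and no split may create a child smaller than the minchild), and it scores every model on the same learn/test split so the results are comparable.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean of ys."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def grow(points, atom, minchild):
    """Grow a toy 1-D regression tree from (x, y) pairs sorted by x.

    Returns a float (leaf prediction) or a tuple (threshold, left, right).
    """
    ys = [y for _, y in points]
    leaf_value = sum(ys) / len(ys)
    # ATOM rule: a node smaller than the atom is not split at all.
    if len(points) < atom:
        return leaf_value
    best_cost, best_i = sse(ys), None
    # MINCHILD rule: only try cuts that leave >= minchild cases in each child.
    for i in range(minchild, len(points) - minchild + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot cut between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best_cost:
            best_cost, best_i = cost, i
    if best_i is None:  # no admissible split improves the fit
        return leaf_value
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (threshold,
            grow(points[:best_i], atom, minchild),
            grow(points[best_i:], atom, minchild))


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def count_leaves(tree):
    """Number of terminal nodes in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


def run_battery(minchild_values, learn, test, atom=2):
    """Build one tree per minchild value, all on the same learn/test split."""
    results = []
    for minchild in minchild_values:
        tree = grow(learn, atom=atom, minchild=minchild)
        mse = sum((y - predict(tree, x)) ** 2 for x, y in test) / len(test)
        results.append((minchild, count_leaves(tree), mse))
    return results


# Synthetic data (hypothetical): a noisy step function of x.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, (1.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.3)))
learn = sorted(data[:120])  # one fixed split, reused for every model
test = data[120:]

results = run_battery([1, 2, 5, 10, 20, 30, 40, 60], learn, test)
for minchild, leaves, mse in results:
    print(f"minchild={minchild:3d}  leaves={leaves:3d}  test MSE={mse:.3f}")
```

Reading the printed rows side by side plays the same role as the summary table: each row is one model, and the test error shows where extra tree complexity stops paying off.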