R-Squared for CART Regression Trees
CART users often ask where they can find the value of the R-Squared for their regression trees. The answer is simple. In conventional statistics,
R-Squared = 1 - SSE/SST, (1)
where SSE is the sum of squared errors of the actual data around the model predictions, and SST, the total sum of squares, is the sum of squared deviations of the dependent variable around its mean. In traditional statistics R-Squared is always calculated using the training data (LEARN SET). CART users can read the R-Squared directly from the output:
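The formula in (1) can be sketched in a few lines of code. This is a generic illustration of the conventional R-Squared, not CART output; the function name and sample values are made up for the example.

```python
def r_squared(actual, predicted):
    """Conventional R-Squared: 1 - SSE/SST, where SST is taken
    around the mean of the actual (dependent) values."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - sse / sst

# Toy example: a model that tracks the data closely
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(actual, predicted))  # 0.98
```

Note that SST depends only on the actual values, so for a fixed data set a smaller SSE always means a larger R-Squared.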
R-Squared = 1 - CART_Relative_Error (2)
CART_Relative_Error = SSE/SST, (3)
where SSE is the sum of squared errors of the CART model and SST is the sum of squared deviations of the dependent variable around its mean in the root node. In other words, the relative error for the training data in CART is calculated exactly as 1 - R-Squared. In the CART regression tree below we display performance results on training data for the BOSTON.CSV data set. The relative error of 0.076 is equivalent to an R-Squared of 0.924 on the training data. You can always find the training-data performance in the classic output, and it is this figure you should report to readers wanting the conventional R-Squared.
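The identity in (2)-(3) can be checked by hand on a toy tree. The sketch below builds a hypothetical one-split regression stump (the split point and data values are invented for illustration), assuming the CART conventions stated above: each terminal node predicts the node mean, and SST is measured around the root-node mean.

```python
# Toy dependent variable and a single split into two terminal nodes
y = [1.0, 2.0, 9.0, 10.0]
left, right = y[:2], y[2:]

def node_mean(values):
    return sum(values) / len(values)

root_mean = node_mean(y)
# Each case is predicted by the mean of its terminal node
preds = [node_mean(left)] * len(left) + [node_mean(right)] * len(right)

sse = sum((a - p) ** 2 for a, p in zip(y, preds))          # model errors
sst = sum((a - root_mean) ** 2 for a in y)                 # root-node errors

relative_error = sse / sst           # what CART reports, eq. (3)
r_squared = 1.0 - relative_error     # the conventional training R-Squared, eq. (2)
print(relative_error, r_squared)
```

Here SSE = 1.0 and SST = 65.0, so the relative error is 1/65 and the training R-Squared is 64/65, confirming that the two always sum to one.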
Figure 1. In the screen capture above we show a 27-terminal-node CART regression tree for the BOSTON.CSV data with performance measured on training data. The CART tree was run without a testing method. (We requested an "exploratory tree" on the TEST tab of the model setup dialog.) The relative error reported for an exploratory tree is mathematically identical to the statistician's 1-R-Squared.