TreeNet® is Salford’s most flexible and powerful data mining tool, responsible for at least a dozen prizes in major data mining competitions since its introduction in 2002. The algorithm typically generates thousands of small decision trees built in a sequential error-correcting process to converge to an accurate model.
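The sequential error-correcting idea can be illustrated with a minimal sketch. This is not TreeNet's actual implementation: it assumes a single numeric predictor with distinct values, squared-error loss, two-node trees ("stumps"), and a hypothetical learning rate of 0.1. The function names (`fit_stump`, `predict_stump`, `boost`) are invented for this illustration. Each new stump is fitted to the residual errors left by the stumps before it, and a shrunken version of its prediction is added to the running model.

```python
import math


def fit_stump(x, residuals):
    """Find the single split on x that minimizes the squared error of the residuals.

    Assumes x contains at least two distinct values.
    """
    pairs = sorted(zip(x, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no valid split point between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, xi):
    """Return the stump's prediction for a single value xi."""
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_trees=200, learn_rate=0.1):
    """Build stumps one after another, each fitted to the current residuals."""
    base = sum(y) / len(y)          # start from the mean of the target
    preds = [base] * len(y)
    stumps, history = [], []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Add a small (shrunken) correction from the new stump.
        preds = [pi + learn_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))
    return base, stumps, history


# Toy data (made up for illustration): a sine curve sampled at 63 points.
x = [i / 10 for i in range(63)]
y = [math.sin(xi) for xi in x]
base, stumps, history = boost(x, y)
```

In this sketch, `history` records the training mean squared error after each stump is added, so comparing `history[0]` with `history[-1]` shows how the accumulated small corrections improve the fit on the training data.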
TreeNet models are usually complex and thus the software generates a number of special reports designed to extract the meaning of the model. Graphs produced by TreeNet software display the impact of any relevant predictor or pair of predictors on the target, thus revealing the underlying data structure.
TreeNet’s robustness extends to data contaminated with erroneous target labels. This type of data error is very challenging for conventional data mining methods and can be catastrophic for conventional boosting. In contrast, TreeNet is largely resistant to such errors because it dynamically rejects training data points that deviate too far from the existing model. In addition, TreeNet offers a degree of accuracy usually not attainable by a single model or by ensembles such as bagging or conventional boosting. Unlike neural networks, TreeNet is not sensitive to data errors and needs no time-consuming data preparation, preprocessing or imputation of missing values.
TreeNet® is designed for very high accuracy predictive modeling. Because TreeNet® attempts to achieve this goal even if very complex models are required, models may be relatively difficult to understand in detail. However, the graphs produced by TreeNet® software display the impact of any relevant predictor or pair of predictors on the target, thus revealing the underlying data structure.
We see TreeNet® as a tool to be used after the data have been explored with tools such as CART® and MARS®. CART® and MARS® produce output that can clearly reveal data errors and inconsistencies, quickly leading to a detailed understanding of the data and potential problems. Once data quality has been assured and basic understanding of the key drivers in the data has been achieved, reanalyzing the data with TreeNet® is worthwhile. In most cases, TreeNet® will confirm the primary findings reported by CART® or MARS® while substantially increasing the predictive accuracy of the models.
A TreeNet® model normally consists of several dozen to several hundred small trees, each typically with only two to eight terminal nodes. The model is similar in spirit to a long series expansion (such as a Fourier or Taylor series): a sum of terms that becomes progressively more accurate as the expansion continues. The expansion can be written as: