Frequently Asked Questions for TreeNet
TreeNet® is Salford's most flexible and powerful data mining tool, responsible for at least a dozen prizes in major data mining competitions since its introduction in 2002. The algorithm typically generates thousands of small decision trees, built in a sequential error-correcting process that converges to an accurate model.
TreeNet models are usually complex and thus the software generates a number of special reports designed to extract the meaning of the model. Graphs produced by TreeNet software display the impact of any relevant predictor or pair of predictors on the target, thus revealing the underlying data structure.
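One common way to compute a single-predictor graph of this kind is partial dependence: fix the chosen predictor at each grid value in every row, average the model's predictions, and plot the averages against the grid. The sketch below is a minimal Python illustration of that idea, assuming the model is available as a plain `predict` function over a list of predictor values. `partial_dependence` and its parameters are hypothetical names, and whether TreeNet® builds its graphs exactly this way is an assumption.

```python
from typing import Callable, List, Sequence


def partial_dependence(predict: Callable[[Sequence[float]], float],
                       rows: List[List[float]],
                       feature: int,
                       grid: Sequence[float]) -> List[float]:
    """Average model prediction with one predictor forced to each grid value.

    predict: any fitted model, called on one row of predictor values.
    rows:    the data whose other predictors are averaged over.
    feature: column index of the predictor being graphed.
    grid:    values of that predictor at which to evaluate the curve.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)        # copy so the data is left untouched
            modified[feature] = value   # override only the graphed predictor
            total += predict(modified)
        curve.append(total / len(rows))
    return curve
```

For example, with the toy model `predict = lambda r: 2 * r[0] + r[1] ** 2`, rows `[[0.0, 1.0], [0.0, 3.0]]`, feature `0`, and grid `[0.0, 1.0, 2.0]`, the curve is `[5.0, 7.0, 9.0]`: the slope of 2 on the first predictor, shifted by the average effect of the second. A two-predictor graph follows the same pattern, fixing two columns at each pair of grid values.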
TreeNet's robustness extends to data contaminated with erroneous target labels. This type of data error can be very challenging for conventional data mining methods and can be catastrophic for conventional boosting. In contrast, TreeNet is generally immune to such errors because it dynamically rejects training data points that are too much at variance with the existing model. TreeNet also offers a degree of accuracy usually not attainable by a single model or by ensembles such as bagging or conventional boosting. Unlike neural networks, TreeNet is not sensitive to data errors and needs no time-consuming data preparation, preprocessing, or imputation of missing values.
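The exact rejection rule TreeNet® uses is not documented here. As a minimal sketch of the general idea, assume a simple trimming rule: before each new tree is grown, set aside the rows whose absolute residual against the current model falls in the largest (1 - `keep_fraction`) share. `trimmed_indices` and `keep_fraction` are hypothetical names, and other gradient boosting implementations achieve similar robustness with a robust loss such as Huber's instead of trimming.

```python
from typing import List, Sequence


def trimmed_indices(y: Sequence[float], f: Sequence[float],
                    keep_fraction: float = 0.9) -> List[int]:
    """Indices of training rows kept for fitting the next tree.

    y: observed target values; f: current model predictions.
    Rows with the largest absolute residuals |y - f| (the top
    1 - keep_fraction share) are left out of this iteration only;
    they are re-evaluated against the updated model next time.
    Hypothetical trimming rule for illustration.
    """
    residuals = [abs(yi - fi) for yi, fi in zip(y, f)]
    n_keep = max(1, int(round(keep_fraction * len(residuals))))
    # Stable sort by residual size, then keep the n_keep smallest.
    by_residual = sorted(range(len(residuals)), key=lambda i: residuals[i])
    return sorted(by_residual[:n_keep])
```

For example, with `y = [1.0, 1.1, 0.9, 1.0, 9.0]` (the last label presumably erroneous), current predictions `f = [1.0] * 5`, and `keep_fraction=0.8`, the function returns `[0, 1, 2, 3]`, so the suspect row has no influence on the next tree.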
A TreeNet® model normally consists of several dozen to several hundred small trees, each typically with two to eight terminal nodes. The model is similar in spirit to a long series expansion (such as a Fourier or Taylor series): a sum of terms that becomes progressively more accurate as the expansion continues. The expansion can be written as:

$$F(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_M T_M(X)$$

where F_0 is the starting value, each T_m(X) is a small tree, and β_m is the weight applied to that tree's contribution.
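To show how such an expansion produces a prediction, here is a minimal Python sketch, assuming a starting value `f0` plus trees `T_m` with weights `beta_m`, and assuming each tree is a two-terminal-node "stump" with one split. The `Stump` class, `expansion`, and `partial_sums` are hypothetical illustrations, not TreeNet®'s internal representation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Hypothetical two-terminal-node tree: one split on one predictor."""
    feature: int        # index of the predictor the tree splits on
    threshold: float    # rows with x[feature] <= threshold fall in the left node
    left_value: float   # prediction of the left terminal node
    right_value: float  # prediction of the right terminal node

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def expansion(x: Sequence[float], f0: float,
              trees: Sequence[Stump], betas: Sequence[float]) -> float:
    """Evaluate F(x) = f0 + beta_1*T_1(x) + ... + beta_M*T_M(x)."""
    return f0 + sum(beta * tree.predict(x) for beta, tree in zip(betas, trees))


def partial_sums(x: Sequence[float], f0: float,
                 trees: Sequence[Stump], betas: Sequence[float]) -> List[float]:
    """Value of the expansion after 0, 1, ..., M terms, like a truncated series."""
    values = [f0]
    for beta, tree in zip(betas, trees):
        values.append(values[-1] + beta * tree.predict(x))
    return values
```

For example, with `f0 = 10.0`, trees `Stump(0, 5.0, -2.0, 2.0)` and `Stump(1, 0.5, 1.0, -1.0)`, weights `[0.5, 0.5]`, and input `x = [3.0, 1.0]`, `partial_sums` returns `[10.0, 9.0, 8.5]` and `expansion` returns `8.5`. Each added term refines the previous value, which is the sense in which the model resembles a series expansion.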
Contact Salford Systems at 619.543.8880 or e-mail support (at) salford-systems (dot) com. We maintain a collection of white papers and academic studies on various data mining topics on our website and offer tutorials on TreeNet®, CART®, and MARS® in major cities worldwide. Internet meetings to demonstrate and discuss any of our products can be arranged.
The Salford Systems data mining solution rests on two groups of technologies: CART, MARS, and PRIM for accurate, easy-to-understand models, and TreeNet® and RandomForests® for ultra-high-performance, potentially complex models interpreted via supporting graphical displays. Even where interpretability and transparency are mandatory and a model must be expressed in the form of rules, TreeNet can serve a useful function by benchmarking the maximum achievable accuracy against which interpretable models can be compared.
TreeNet® uses gradient boosting to achieve the benefit of boosting (accuracy) without the drawback of a tendency to be misled by bad data. In conventional boosting, each tree grown would normally be a fully articulated stand-alone model, combined with the other trees via a weighted voting scheme. In contrast, each TreeNet® component is a small tree, often with no more than two terminal nodes, and the trees are summed together with a very small weight on each component.
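The difference can be made concrete with a minimal Python sketch of gradient boosting using two-terminal-node trees (stumps) on a single predictor. It assumes squared-error loss, for which each new tree is fit to the current residuals, and a learning rate of 0.1 as the small weight on each tree. `fit_stump`, `gradient_boost`, and `predict` are hypothetical names; this illustrates the general technique, not Salford's implementation.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]        # (threshold, left_value, right_value)
Model = Tuple[float, float, List[Stump]]  # (f0, learning_rate, stumps)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Least-squares split of one predictor into two terminal nodes."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    best = None
    for k in range(1, n):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no split point between equal predictor values
        left, right = order[:k], order[k:]
        lv = sum(residuals[i] for i in left) / len(left)
        rv = sum(residuals[i] for i in right) / len(right)
        sse = (sum((residuals[i] - lv) ** 2 for i in left)
               + sum((residuals[i] - rv) ** 2 for i in right))
        if best is None or sse < best[0]:
            threshold = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (sse, threshold, lv, rv)
    if best is None:  # every x is identical: fall back to a constant fit
        mean = sum(residuals) / n
        return (xs[0], mean, mean)
    return (best[1], best[2], best[3])


def gradient_boost(xs: Sequence[float], ys: Sequence[float],
                   n_trees: int = 100, learning_rate: float = 0.1) -> Model:
    """Grow stumps sequentially, each correcting the errors left by the sum so far."""
    f0 = sum(ys) / len(ys)  # starting value: mean of the target
    preds = [f0] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is proportional to the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Add the new tree with a small weight instead of letting it vote.
        preds = [p + learning_rate * (lv if x <= t else rv) for p, x in zip(preds, xs)]
    return f0, learning_rate, stumps


def predict(model: Model, x: float) -> float:
    f0, learning_rate, stumps = model
    return f0 + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)
```

Fitting a step such as `xs = [1, 2, 3, 4, 5, 6, 7, 8]`, `ys = [0, 0, 0, 0, 1, 1, 1, 1]` shows the error-correcting behavior: each stump removes only a fraction (`learning_rate`) of the remaining error, so predictions approach the step gradually as `n_trees` grows. A smaller learning rate needs more trees but makes each individual tree, and any single bad data point it latches onto, less influential.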
TreeNet® is designed for very high accuracy predictive modeling. Because TreeNet® attempts to achieve this goal even if very complex models are required, models may be relatively difficult to understand in detail. However, the graphs produced by TreeNet® software display the impact of any relevant predictor or pair of predictors on the target, thus revealing the underlying data structure.
TreeNet® requires that both training and test data reside in RAM, so when large databases are being analyzed, TreeNet® will be most effective on large-capacity servers. TreeNet® is available for Windows XP or later and for UNIX (IBM AIX, Compaq Alpha, SGI, HP, and Sun) platforms, and will run with as little as 64 MB of RAM. We recommend a minimum of 512 MB of RAM, and on Windows, XP or a later version of the OS is preferred for performance. A Linux version is planned.
TreeNet® was developed in 1997 by Stanford University's Jerome Friedman, one of the authors of CART®, the author of MARS®, and the inventor of Projection Pursuit and HotSpotDetector®. The TreeNet® technology has been tested in a broad range of industrial and research settings and has demonstrated considerable benefits. In tests that pitted TreeNet® against expert modeling teams using a variety of standard data mining tools, TreeNet® delivered, within a few hours, results comparable to or better than those that took the expert teams months of hands-on development.
Yes. TreeNet® cannot accept more than one target variable at a time; to model a collection of targets, a separate TreeNet® model must be developed independently for each target. Also, neural nets can simultaneously estimate a function and its derivatives, whereas TreeNet® is not designed to estimate derivatives of the target function.
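Because each target needs its own model, a multi-target job reduces to a loop that trains one independent model per target column. A minimal Python sketch, assuming a single predictor and using ordinary least squares as a hypothetical stand-in learner (`fit_line`) in the place where a TreeNet® run would go; `fit_per_target` is likewise an illustrative name.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Hypothetical stand-in learner: least-squares line on one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_per_target(xs: Sequence[float],
                   targets: Dict[str, Sequence[float]],
                   learner: Callable[[Sequence[float], Sequence[float]], Model] = fit_line
                   ) -> Dict[str, Model]:
    """Train one model per target; nothing is shared between the models."""
    return {name: learner(xs, ys) for name, ys in targets.items()}
```

For example, with `xs = [0, 1, 2, 3]` and targets `{"sales": [1, 3, 5, 7], "returns": [3, 2, 1, 0]}`, the two fitted models predict `9.0` and `-1.0` at `x = 4`. Since the models are trained separately, any relationship between the targets is not exploited, which is the practical cost of this limitation.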