TreeNet

TreeNet Introduction

Predictive Power:
TreeNet is Salford's most flexible and powerful data mining tool, capable of consistently generating extremely accurate models. TreeNet's level of accuracy is usually not attainable by single models or by ensembles such as bagging or conventional boosting, and it demonstrates remarkable performance for both regression and classification. The algorithm typically generates thousands of small decision trees, built in a sequential error-correcting process, that converge to an accurate model. TreeNet has been responsible for the majority of Salford's modeling competition awards.
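The underlying technique is Friedman's stochastic gradient boosting. TreeNet itself is Salford's proprietary implementation, but the core idea of fitting many shallow trees sequentially, each correcting the errors of the ensemble built so far, can be sketched with the open-source scikit-learn analogue below; the dataset and parameter values are purely illustrative.

    # Illustrative sketch only: scikit-learn's gradient boosting follows the same
    # Friedman recipe that underlies TreeNet -- many small trees fit sequentially,
    # each one nudging the model toward the residual errors of the trees before it.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GradientBoostingRegressor(
        n_estimators=1000,   # many small trees...
        max_depth=3,         # ...each deliberately shallow
        learning_rate=0.05,  # each tree makes only a small correction
        subsample=0.5,       # "stochastic": each tree sees a random half of the learn sample
        random_state=0,
    )
    model.fit(X_train, y_train)
    print("Held-out R^2:", model.score(X_test, y_test))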
Supreme Accuracy:
TreeNet delivers a degree of accuracy usually not attainable by a single model or by ensembles such as bagging or conventional boosting, and its robustness extends to data contaminated with erroneous target labels. This type of data error can be very challenging for conventional data mining methods and catastrophic for conventional boosting. In contrast, TreeNet is generally immune to such errors, as it dynamically rejects training data points that are too much at variance with the existing model. Unlike neural networks, TreeNet is not sensitive to data errors and needs no time-consuming data preparation, preprocessing, or imputation of missing values.
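TreeNet's exact influence-trimming logic is proprietary, but the general principle, limiting how much observations with wildly out-of-line target values can pull on the fit, can be imitated with a robust loss. The scikit-learn stand-in and contaminated dataset below are assumptions for illustration only, not Salford's implementation.

    # Rough analogue only: a robust (Huber) loss caps the influence of observations
    # whose residuals are far out of line with the current model, e.g. mislabeled targets.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=2000)
    y[:100] += rng.normal(scale=50.0, size=100)        # contaminate 5% of the targets

    robust = GradientBoostingRegressor(loss="huber", alpha=0.9, n_estimators=500,
                                       max_depth=3, learning_rate=0.05, random_state=0)
    plain = GradientBoostingRegressor(n_estimators=500,  # default squared-error loss
                                      max_depth=3, learning_rate=0.05, random_state=0)

    X_clean = rng.normal(size=(1000, 10))
    y_clean = 3.0 * X_clean[:, 0]                       # uncontaminated evaluation targets
    print("Huber loss   R^2:", robust.fit(X, y).score(X_clean, y_clean))
    print("Squared loss R^2:", plain.fit(X, y).score(X_clean, y_clean))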
Advanced Features:
Interaction detection establishes whether interactions of any kind are needed in a predictive model and provides a search engine to discover which specific interactions are required. The interaction detection system not only helps improve model performance (sometimes dramatically) but also assists in the discovery of valuable new segments and previously unrecognized patterns.
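Salford's interaction reporting and controls (for example, Battery ICL) are specific to SPM, but the basic test they build on can be sketched as follows: a boosted model restricted to depth-one stumps cannot represent interactions, so a clear accuracy gap against an unrestricted model is evidence that interactions are needed. The simulated data below is purely illustrative.

    # Conceptual sketch, not Salford's implementation: compare an additive boosted
    # model (depth-1 stumps, no interactions possible) with one that allows them.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(3000, 5))
    y = X[:, 0] + X[:, 1] + 2.0 * X[:, 0] * X[:, 1]   # contains a genuine x0*x1 interaction

    additive = GradientBoostingRegressor(max_depth=1, n_estimators=500,
                                         learning_rate=0.05, random_state=0)
    flexible = GradientBoostingRegressor(max_depth=3, n_estimators=500,
                                         learning_rate=0.05, random_state=0)

    print("Additive (no interactions) R^2:", cross_val_score(additive, X, y).mean())
    print("Interactions allowed       R^2:", cross_val_score(flexible, X, y).mean())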

Technical articles by Jerome Friedman are also available for download.

 

 


Features

Additional TreeNet Features are available in Pro, ProEx, and Ultra.

Component availability by version:

  • Modeling Engine: TreeNet (Stochastic Gradient Boosting) - Basic, Pro, ProEx, Ultra
  • Spline-based approximations to the TreeNet dependency plots - Pro, ProEx, Ultra
  • Exporting TreeNet dependency plots into an XML file - Pro, ProEx, Ultra
  • Automation: Build a series of models changing the minimum required size on child nodes (Battery MINCHILD) - Pro, ProEx, Ultra
  • Flexible control over interactions in a TreeNet model - ProEx, Ultra
  • Interaction strength reporting - ProEx, Ultra
  • Build a CART tree utilizing the TreeNet engine to gain speed, as well as alternative reporting - Ultra
  • Build a RandomForests model utilizing the TreeNet engine to gain speed, as well as alternative reporting - Ultra
  • RandomForests-inspired sampling of predictors at each node during model building - Ultra
  • Automation: Explore the impact of influence trimming (outlier removal) for logistic and classification models (Battery INFLUENCE) - Ultra
  • Automation: Exhaustive search and ranking for all interactions of the specified order (Battery ICL) - Ultra



Requirements

 

Windows - Minimum System Requirements

We suggest the following minimum and recommended system requirements:

  • 80486 processor or higher.
  • 512 MB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB). While all versions may run with a minimum of 32 MB of RAM, we cannot guarantee that they will. We highly recommend that you follow the recommended memory configuration for the particular version you have purchased; using less than the recommended memory results in hard drive paging, which significantly reduces performance, or in application instability.
  • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
  • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
  • CD-ROM or DVD drive.

Recommended System Requirements

Because Salford Tools are extremely CPU intensive, the faster your CPU, the faster they will run. For optimal performance, we strongly recommend they run on a machine with a system configuration equal to, or greater than, the following:

  • Pentium 4 processor running 2.0+ GHz.
  • 2 GB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB). While all versions may run with a minimum of 32 MB of RAM, we cannot guarantee that they will. We highly recommend that you follow the recommended memory configuration for the particular version you have purchased; using less than the recommended memory results in hard drive paging, which significantly reduces performance, or in application instability.
  • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
  • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
  • CD-ROM or DVD drive.
  • 2 GB of additional hard disk space available for virtual memory and temporary files.

Ensuring Proper Permissions

If you are installing on a machine that uses security permissions, please read the following note.

  • You must belong to the Administrators group on Windows 2003 / 2008 and Windows 7 / 8 to install and license the application properly. Once the application is installed and licensed, any user with read/write/modify permissions to the application's /bin and temp directories can run the application.

UNIX/Linux - Minimum System Requirements

Supported Architectures

  • Alpha: DEC 3000 or AlphaServer running Tru64 UNIX 5.0 or higher
  • Linux/i386: i586 or higher processor; Linux 2.4 or higher kernel; glibc 2.3 or higher
  • Linux/AMD64: AMD64 or Intel EM64T processor; Linux 2.6 or higher kernel; glibc 2.3 or higher
  • Sun: UltraSPARC processor; Solaris 2.6 or higher
  • RS/6000: POWER or PowerPC processor; AIX 4.2 or higher
  • HP 9000: PA/RISC 1.1 or higher processor; HP/UX 11.x
  • SGI: MIPS 4 or higher processor; IRIX 6.5

Minimum System Requirements

  • Minimum RAM requirement for all non-GUI applications is 32 MB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB).
  • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
  • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

Recommended System Requirements

  • Recommended random-access memory (RAM) is 1.5 times the licensed data limit (32 MB, 64 MB, etc.), up to the maximum permitted by the target architecture. On UNIX systems, it is generally recommended that there be at least twice as much swap space as RAM.
  • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
  • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

All Salford apps are very CPU intensive, so more memory and a faster CPU are always helpful.

Licensing Application

TreeNet uses a licensing system based on an application system ID and an associated unlock key. When installation is complete, email the application's system ID to Salford Systems to obtain your unlock key. The system ID is clearly displayed in the License Information window shown the first time the application starts; you can also reach this window by selecting the Help->License menu option.

Method 1: Fixed License
With a fixed license, each machine must have its own copy of the licensed program installed. If your license terms permit more than one copy, then the license must be activated on each machine that will be used.

Method 2: Floating License
This method of licensing is used if you intend the application to be used by more than one user concurrently over a network. A floating license tracks the number of copies "checked out"; when that number exceeds your license terms, the user is informed that all copies are checked out. The licensed program may be installed on a machine that each client machine can access. Machines that are not connected to the network must be issued a fixed license (Method 1 above).

A floating license is particularly useful when the number of potential users exceeds the number of seats specified in your license terms.


 

Price


Download

The SPM Salford Predictive Modeler® software suite is a highly accurate and ultra-fast platform for creating predictive, descriptive, and analytical models from databases of any size, complexity, or organization. The SPM® software suite has automation that accelerates the process of model building by conducting substantial portions of the model exploration and refinement process for the analyst. While the analyst is always in full control, we optionally anticipate the analyst's next best steps and package a complete set of results from alternative modeling strategies for easy review. Do in one day what normally requires a week or more using other systems.

The Salford Predictive Modeler® software suite includes:

CART:
The definitive classification tree, developed by world-renowned statisticians including Drs. Jerome Friedman and Leo Breiman. CART is one of the most widely known data mining algorithms and is widely credited with bringing decision-tree methods out of the university and into business.
MARS:
Ideal for users who prefer results in a form similar to traditional regression while capturing essential non–linearities and interactions.
TreeNet:
TreeNet is Salford's most flexible and powerful data mining tool, capable of consistently generating extremely accurate models, and has been responsible for the majority of Salford's modeling competition awards. It demonstrates remarkable performance for both regression and classification; the algorithm typically generates thousands of small decision trees built in a sequential error-correcting process to converge to an accurate model.
RandomForests:
RandomForests features include prediction, cluster and segment discovery, anomaly tagging and detection, and multivariate class description. The method was developed by Leo Breiman (University of California, Berkeley) and Adele Cutler.


New Components & Features available in version 7.0!

GPS:
Generalized Path Seeker (GPS) is Jerry Friedman's approach to regularized regression. The technology offers a high-speed lasso for extreme data set configurations with upwards of 100,000 predictors and possibly very few rows; such data sets are commonplace in genetic research and text mining. The new engine is both supremely fast and efficient.
RuleLearner:
RuleLearner is a powerful post–processing technique which selects the most influential subset of nodes, thus reducing model complexity. RuleLearner allows the modeler to take advantage of the increased accuracy of very complicated TreeNet and RandomForests models while still yielding the simplicity of CART models.

University Program

Salford Systems' University Program provides TreeNet at significantly reduced licensing fees to the educational community. Eligible educational institutions are colleges, universities, community colleges, technical schools, and science centers. Additionally, a 90-day free evaluation is available upon request.

The University Program gives eligible educational institutions the right to distribute right-to-use licenses for TreeNet and other Salford tools to all faculty, staff, and students for their personal computers, and to install UNIX versions of these tools on university workstations and servers. For more information on this special program, please contact our sales department. Salford Systems is committed to supporting education and research in universities worldwide and offers special packaging and pricing.

We also offer academics cost-free access to our tutorial materials for classroom use.

 


Product Versions

SPM 7 Product Versions

Ultra
The best of the best, for the modeler who must have access to the most advanced technology available and the fastest run times, including major advances in ensemble modeling, interaction detection, and automation. Ultra also provides advance access to new features as they become available in frequent upgrades.
ProEx
For the modeler who needs cutting-edge data mining technology, including extensive automation of workflows typical for experienced data analysts and dozens of extensions to the Salford data mining engines.
Pro
A true predictive modeling workbench designed for the professional data miner. It includes a variety of supporting conventional statistical modeling tools, a programming language, reporting services, and a modest selection of workflow automation options.
Basic
Literally the basics: Salford Systems' award-winning data mining engines without the extensions, automation, surrounding statistical services, programming language, or sophisticated reporting. Designed for small budgets while still delivering our world-famous engines.


Videos


Introduction to TreeNet
by: Mikhail Golovnya, Salford Systems


Training in TreeNet

Training In TreeNet Part 1

Training In TreeNet Part 2

Training In TreeNet Part 3

Training In TreeNet Part 4

Training In TreeNet Part 5


 

Scalability

A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to build the model; its size is rows by columns, that is, the observations and variables actually used in model building. Variables not used in the model do not count, and observations reserved for testing, or excluded for other reasons, do not count. There is no limit on the number of test sample data points that may be analyzed.

For example, consider the 32MB version, which sets a learn sample limit of 8 MB. A data point is one variable for one observation (one row by one column) and occupies 4 bytes, so an 8 MB capacity license allows up to 8 * 1024 * 1024 / 4 = 2,097,152 learn sample data points to be analyzed.
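The same arithmetic can be written down as a short check. The function and variable names below are hypothetical, introduced only to illustrate the calculation; they are not part of SPM.

    # The licensing arithmetic described above: each learn-sample value occupies
    # 4 bytes, so an 8 MB data limit allows 8 * 1024 * 1024 / 4 = 2,097,152 values.
    BYTES_PER_VALUE = 4

    def max_learn_values(limit_mb: int) -> int:
        """Number of learn-sample data points allowed by a given data limit in MB."""
        return limit_mb * 1024 * 1024 // BYTES_PER_VALUE

    def fits_in_license(rows: int, model_columns: int, limit_mb: int) -> bool:
        """Only learn-sample rows and variables actually used in the model count."""
        return rows * model_columns <= max_learn_values(limit_mb)

    print(max_learn_values(8))               # 2097152
    print(fits_in_license(100_000, 20, 8))   # 100,000 rows x 20 predictors -> True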

The following table describes the current set of license "sizes" available. Note that the minimum required RAM is not the same as the learn sample limit.

  Minimum required        Licensed learn sample          Licensed # of learn
  physical memory         data size in MB                sample values
  (RAM), in MB            (1 MB = 1,048,576 bytes)       (rows by columns)

  32                      8                              2,097,152
  64                      18                             4,718,592
  128                     45                             11,796,480
  256                     100                            26,214,400
  512                     200                            52,428,800
  1024                    400                            104,857,600
  2048                    800                            209,715,200    (64-bit only)
  3072                    1200                           314,572,800    (64-bit only)

Additional capacity is available under 64-bit operating systems using our non-GUI (command-line) builds. The non-GUI builds are very flexible and can be licensed for large data limits not currently available in the GUI product line; the current maximum for non-GUI builds is an 8 GB data capacity.


Testimonials

Broadband propensity project: comparing TreeNet with Enterprise Miner using logistic regression.

We're seeing these benefits:
1. TreeNet's stochastic gradient boosting method injects randomization into the selection of candidate predictors and training data, making it much more robust than traditional statistical models, especially in dealing with messy data. For example, in our dataset some information is missing, such as the customer's portfolio and usage data. Although we impute the missing data for the statistical models, it still affects the final results because the whole dataset is used for training. With TreeNet, only a subset of the data and predictors is used each time, and this process is repeated hundreds of times. This greatly reduces the influence of messy data and improves the robustness of the final model. In terms of the nature of the modeling, growing a large number of small trees instead of a single complex tree has proven to be more accurate and robust.
Our modeling datasets are always large. In this example, the data size is above 500,000 and the initial predictor set is about 160 predictors. TreeNet is computationally efficient and scalable for large datasets, and much faster than Enterprise Miner.
In the result analysis, the detailed relationships between predictors and the target are much easier to visualize. Battery automates the process of running multiple experiments, which saves a great deal of effort in predictor selection. In this example, 16 predictors were finally selected after 5 cycles and 2 battery processes.
2. Insights: TreeNet can dig out very granular information. For example, it helps to find the impact of a specific sector of predictors. In this example, predictors related to product holding and usage are emphasized in the TreeNet model, while they are not prominent in the traditional logistic regression model. Information in these sectors contributed a great deal to improving the prediction of a customer's propensity to buy our fixed broadband products.
3. We're seeing various levels of performance gains over traditional statistical models. In the best case we've seen, the improvement in lift is consistently about 40%, helping to capture more than 30% of the customers who are willing to buy our product.

Predictive Analytics Manager at Leading Telco in Singapore.
**Her team works on scientific marketing initiatives using statistics, data science and optimization methodologies.


David Vogel, CEO Voloridge Investment Management and Captain of the winning Heritage Health prize team

I have tried every version of gradient boosting I could find, including popular open source versions, and TreeNet outperforms them all in predictive accuracy (consistently across many different kinds of data sets) while maintaining the ability to train models quickly.

David Vogel, CEO Voloridge Investment Management and Captain of the winning Heritage Health prize team
Florida, USA


 

Brad Turner, Vice President of Marketing and Business Development, Inkiru

Every day, the Inkiru product predicts sales for 2,000 items in an e-commerce context. In addition, the product generates a customized confidence interval for each prediction. The input is dynamic and consists of one year of historical data. Each record contains approximately 150 features with information about sales, products, customers, and promotions.

The problem was very challenging from a modeling point of view. Important parts of the data were continuous, categorical, highly non-linear, sparse, missing, and noisy. We found Salford Systems adequate to deal with these characteristics of the data.

Precision was an important goal in this project. A validation with real data shows 90% of the predictions lying within 7 units of the actual sales and 50% within 2 units. Salford Systems was definitely an important tool in reaching this degree of accuracy in the product.

 Brad Turner, Vice President of Marketing and Business Development, Inkiru
California, USA


Andrew Russo, Vice President, Modeling and Analytics at AccuData Integrated Marketing

As a traditional modeler, I had been primarily using regression and logistic regression. I began to test TreeNet last fall. Since then I have built several models that are now market-tested and are performing as predicted by TreeNet. The real value of TreeNet has been the speed with which it builds models, the accuracy of its predictions, and the incremental lift it delivers in side-by-side tests against regressions. It has also proven to be a tremendous data prep time saver in its ability to deal with outliers and missing data, as well as doing a decent job distinguishing between scale and categorical data. Importantly, the ability to do less hands-on model builds has enabled us to offer new modeling products to clients that otherwise would not have had the budget for a modeling project. In short, this new, advanced capability is giving my company a competitive advantage.

 Andrew Russo, Vice President, Modeling and Analytics at AccuData Integrated Marketing
Florida, USA


Tom Osborn, Adjunct Professor at the University of Technology, Sydney

I've used TreeNet on commercial projects since '04. For customer and prospect targeting, it outperforms logistic-family regression, neural nets and other methods in my kitbag. Key strengths: handling of missing values, robustness, general non-linearity, variable interactions. Clients like feedback on variable importance (more general than Shapley or PMVD). They also like seeing how the variables contribute to predictions. Fast and easy to use. Best of all, it is built on Jerry Friedman's great maths.

 Tom Osborn, Adjunct Professor (analytics/data mining) at University of Technology, Sydney
Sydney, Australia




