
TreeNet®

  • TreeNet Introduction

    Predictive Power:
    TreeNet is Salford's most flexible and powerful data mining tool, capable of consistently generating extremely accurate models. Its level of accuracy is usually not attainable by single models or by ensembles such as bagging or conventional boosting. TreeNet performs remarkably well on both regression and classification. The algorithm typically generates thousands of small decision trees, built in a sequential error-correcting process, that together converge to an accurate model. TreeNet has been responsible for the majority of Salford's modeling competition awards.
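    The sequential, error-correcting process described above corresponds to stochastic gradient boosting as published by Jerome Friedman. The sketch below illustrates that idea for squared-error regression on a single feature. It assumes one-split trees (stumps), a random half-sample of rows per iteration, and a fixed learning rate. It is not TreeNet's implementation, and the names `fit_stump`, `boost`, and `predict`, along with the parameter values, are hypothetical.

```python
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to residuals on one feature.

    Returns (threshold, left_value, right_value) minimizing squared error.
    """
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split between identical x values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Maximizing this term is equivalent to minimizing within-leaf SSE.
        score = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or score > best[0]:
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    if best is None:  # all x identical: fall back to a constant
        mean = total / n
        return (float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right

def boost(xs, ys, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss and stumps."""
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)  # initial model: the mean target
    preds = [f0] * len(ys)
    n_sub = max(2, int(subsample * len(xs)))
    stumps = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), n_sub)  # random subset of rows
        # For squared error, the negative gradient is the ordinary residual.
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        # Each small tree corrects part of the remaining error (shrinkage).
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return f0, learning_rate, stumps

def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)

xs = [i / 10 for i in range(100)]
ys = [1.0 if x > 5 else 0.0 for x in xs]  # a step function
model = boost(xs, ys)
mse = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(round(mse, 4))
```

    Each stump is deliberately weak. Accuracy comes from many small corrections, each fitted to a different random subset of rows.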
    Supreme Accuracy:
    TreeNet's robustness extends to data contaminated with erroneous target labels. This type of data error can be very challenging for conventional data mining methods and can be catastrophic for conventional boosting. In contrast, TreeNet is generally immune to such errors because it dynamically rejects training data points that are too much at variance with the existing model. Unlike neural networks, TreeNet is not sensitive to data errors and needs no time-consuming data preparation, preprocessing, or imputation of missing values.
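    One published way to limit the pull of points far from the current model is the Huber-M loss from Friedman's gradient boosting work. Residuals beyond a quantile-based cutoff contribute only a capped pseudo-residual to the next tree. The text does not say that TreeNet works exactly this way, so this is an illustrative assumption only. The helper names and the choice `alpha=0.9` are hypothetical.

```python
import math

def quantile_nearest_rank(values, q):
    """Nearest-rank q-quantile (0 < q <= 1) of a non-empty list of numbers."""
    s = sorted(values)
    rank = min(len(s), max(1, math.ceil(q * len(s))))  # 1-based rank
    return s[rank - 1]

def huber_pseudo_residuals(ys, preds, alpha=0.9):
    """Negative gradient of the Huber loss with a data-driven cutoff.

    delta is the alpha-quantile of |y - F(x)|. Residuals inside the cutoff
    pass through unchanged; larger ones are clipped to +/-delta, so a badly
    mislabeled point cannot dominate the next tree's fit.
    """
    residuals = [y - f for y, f in zip(ys, preds)]
    delta = quantile_nearest_rank([abs(r) for r in residuals], alpha)
    clipped = [max(-delta, min(delta, r)) for r in residuals]
    return clipped, delta

# Nine points sit close to the model; the last target is corrupted.
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]
preds = [y + 0.5 for y in range(10)]
grads, delta = huber_pseudo_residuals(ys, preds)
print(delta, grads[-1])  # 0.5 0.5 (the raw residual would have been 990.5)
```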
    Advanced Features:
    TreeNet's interaction detection establishes whether interactions of any kind are needed in a predictive model, and acts as a search engine for discovering exactly which interactions are required. The interaction detection system not only helps improve model performance (sometimes dramatically) but also assists in discovering valuable new segments and previously unrecognized patterns.
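    The text does not describe how TreeNet decides that an interaction is needed. One widely used published measure is Friedman and Popescu's H-statistic. It compares a model's two-variable partial dependence with the sum of the two one-variable partial dependences; a value near 0 means the pair acts additively. The sketch below assumes that measure. It uses toy functions (hypothetical) in place of a fitted model and is not TreeNet's own procedure.

```python
def partial_dependence(model, data, cols, values):
    """Average model output with features `cols` fixed at `values`,
    the remaining features taken from each row of `data`."""
    total = 0.0
    for row in data:
        x = list(row)
        for c, v in zip(cols, values):
            x[c] = v
        total += model(x)
    return total / len(data)

def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def h_squared(model, data, j, k):
    """Friedman-Popescu H^2 for features j and k (0 = purely additive)."""
    pd_jk = centered([partial_dependence(model, data, (j, k), (r[j], r[k])) for r in data])
    pd_j = centered([partial_dependence(model, data, (j,), (r[j],)) for r in data])
    pd_k = centered([partial_dependence(model, data, (k,), (r[k],)) for r in data])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den if den > 0 else 0.0

# Toy "models" standing in for a fitted model (hypothetical).
def additive(x):
    return x[0] + 2 * x[1] + x[2]

def with_interaction(x):
    return x[0] * x[1] + x[2]

data = [(a, b, c) for a in (-1, 0, 1) for b in (-1, 0, 1) for c in (0, 1)]
print(h_squared(additive, data, 0, 1))          # 0.0: no interaction needed
print(h_squared(with_interaction, data, 0, 1))  # 1.0: x0 and x1 interact
```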

    Technical articles by Jerome Friedman are also available for download.

  • Additional TreeNet Features are available in Pro, ProEx, and Ultra.

    Component | Basic | Pro | ProEx | Ultra
    Modeling Engine: TreeNet (Stochastic Gradient Boosting) | o | o | o | o
    Spline-based approximations to the TreeNet dependency plots | | o | o | o
    Exporting TreeNet dependency plots into XML file | | o | o | o
    Automation: Build a series of models changing the minimum required size on child nodes (Battery MINCHILD) | | o | o | o
    Flexible control over interactions in a TreeNet model | | | o | o
    Interaction strength reporting | | | o | o
    Build a CART tree utilizing the TreeNet engine to gain speed as well as alternative reporting | | | | o
    Build a RandomForests model utilizing the TreeNet engine to gain speed as well as alternative reporting | | | | o
    RandomForests-inspired sampling of predictors at each node during model building | | | | o
    Automation: Explore the impact of influence trimming (outlier removal) for logistic and classification models (Battery INFLUENCE) | | | | o
    Automation: Exhaustive search and ranking for all interactions of the specified order (Battery ICL) | | | | o


    • We suggest the following minimum and recommended system requirements:

      Minimum System Requirements

      • 80486 processor or higher.
      • 512 MB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB). While all versions may run with as little as 32 MB of RAM, we cannot guarantee that they will. We highly recommend that you follow the memory configuration recommended for the particular version you have purchased. Using less than the recommended memory causes hard-drive paging, which significantly reduces performance, or application instability.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
      • CD-ROM or DVD drive.

      Recommended System Requirements

      Because Salford tools are extremely CPU-intensive, the faster your CPU, the faster they will run. For optimal performance, we strongly recommend running them on a machine with a configuration equal to or greater than the following:

      • Pentium 4 processor running at 2.0 GHz or faster.
      • 2 GB of random-access memory (RAM). As with the minimum configuration, the appropriate amount depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB). Using less than the recommended memory causes hard-drive paging, which significantly reduces performance, or application instability.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
      • CD-ROM or DVD drive.
      • 2 GB of additional hard disk space available for virtual memory and temporary files.

      Ensuring Proper Permissions

      If you are installing on a machine that uses security permissions, please read the following note.

      • You must belong to the Administrators group on Windows Server 2003/2008 or Windows 7/8 to install and license the application properly. Once the application is installed and licensed, any user with read/write/modify permissions to the application's /bin and temp directories can run it.
    • Supported Architectures

      • Alpha: DEC 3000 or AlphaServer running Tru64 UNIX 5.0 or higher
      • Linux/i386: i586 or higher processor; Linux 2.4 or higher kernel; glibc 2.3 or higher
      • Linux/AMD64: AMD64 or Intel EM64T processor; Linux 2.6 or higher kernel; glibc 2.3 or higher
      • Sun: UltraSPARC processor; Solaris 2.6 or higher
      • RS/6000: POWER or PowerPC processor; AIX 4.2 or higher
      • HP 9000: PA/RISC 1.1 or higher processor; HP/UX 11.x
      • SGI: MIPS 4 or higher processor; IRIX 6.5

      Minimum System Requirements

      • The minimum RAM requirement for all non-GUI apps is 32 MB. This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB).
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

      Recommended System Requirements

      • Recommended random-access memory (RAM) is 1.5 times the licensed data limit (32 MB, 64 MB, etc), up to the maximum permitted by the target architecture. On UNIX systems, it is generally recommended that there be at least twice as much swap space as there is RAM.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

      All Salford apps are very CPU-intensive, so more memory and a faster CPU are always helpful.

    Licensing the Application

    TreeNet is licensed through an application system ID and an associated unlock key. When installation is complete, you will need to email the application's system ID. The system ID is clearly displayed in the License Information window that appears the first time the application is started. You can also open this window by selecting the Help->License menu option.

    Method 1: Fixed License
    With a fixed license, each machine must have its own copy of the licensed program installed. If your license terms permit more than one copy, then the license must be activated on each machine that will be used.

    Method 2: Floating License
    This licensing method is used when you intend the application to be used by more than one user concurrently over a network. A floating license tracks the number of copies "checked out." When a user tries to check out a copy beyond the number permitted by your license terms, a message informs the user that "all copies are checked out." The licensed program may be installed on a machine that every client machine can access. Machines that are not connected to the network must be issued a fixed license (Method 1 above).

    A floating license is particularly useful when the number of potential users exceeds the number of seats specified in your license terms.
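    The check-out behaviour of a floating license can be pictured as a shared seat counter. The sketch below is a hypothetical model of that bookkeeping, not Salford's license manager. The names `FloatingLicensePool`, `check_out`, `check_in`, and `AllCopiesCheckedOut` are assumptions made for illustration.

```python
import threading

class AllCopiesCheckedOut(Exception):
    """Raised when every licensed seat is already in use."""

class FloatingLicensePool:
    """Tracks how many copies are checked out against a fixed seat count."""

    def __init__(self, seats):
        self.seats = seats
        self._in_use = set()
        self._lock = threading.Lock()  # several clients may check out at once

    def check_out(self, user):
        with self._lock:
            if user in self._in_use:
                return  # this user already holds a seat
            if len(self._in_use) >= self.seats:
                raise AllCopiesCheckedOut("all copies are checked out")
            self._in_use.add(user)

    def check_in(self, user):
        with self._lock:
            self._in_use.discard(user)  # frees the seat for another user

    def in_use(self):
        with self._lock:
            return len(self._in_use)

pool = FloatingLicensePool(seats=2)
pool.check_out("alice")
pool.check_out("bob")
try:
    pool.check_out("carol")
except AllCopiesCheckedOut as err:
    print(err)  # all copies are checked out
pool.check_in("alice")
pool.check_out("carol")  # succeeds now that a seat is free
print(pool.in_use())  # 2
```

    Because seats are released on check-in, more people than the licensed seat count can share the program, as long as no more than that number use it at the same time.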


  • Salford Systems' University Program provides TreeNet at significantly reduced licensing fees to the educational community. Eligible educational institutions are colleges, universities, community colleges, technical schools, and science centers. Additionally, a 90-day free evaluation is available upon request.

    The University Program gives eligible educational institutions the right to distribute right-to-use licenses for TreeNet and other Salford tools to all faculty, staff, and students for personal computers, and to install UNIX versions of these tools on university workstations and servers. For more information on this special program, please contact our sales department. Salford Systems is committed to supporting education and research in universities worldwide and offers special packaging and pricing.

    We also offer academics cost-free access to our tutorial materials for classroom use.

  • SPM 7 Product Versions

    Ultra
    The best of the best. For the modeler who must have access to the most advanced technology available and the fastest run times, including major advances in ensemble modeling, interaction detection and automation. ULTRA also provides advance access to new features as they become available in frequent upgrades.
    ProEx
    For the modeler who needs cutting-edge data mining technology, including extensive automation of workflows typical for experienced data analysts and dozens of extensions to the Salford data mining engines.
    Pro
    A true predictive modeling workbench designed for the professional data miner, with a variety of supporting conventional statistical modeling tools, a programming language, reporting services, and a modest selection of workflow automation options.
    Basic
    Literally the basics: Salford Systems' award-winning data mining engines without extensions, automation, surrounding statistical services, programming language, or sophisticated reporting. Designed for small budgets while still delivering our world-famous engines.


  • Introduction to TreeNet
    by: Mikhail Golovnya, Salford Systems
  • Five-Part Videos
    • Training in TreeNet Part 1
    • Training in TreeNet Part 2
    • Training in TreeNet Part 3
    • Training in TreeNet Part 4
    • Training in TreeNet Part 5

  • A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to build the model. Its size is measured as rows by columns: the observations times the variables used to build the model. Variables not used in the model do not count, and neither do observations reserved for testing or excluded for other reasons. There is no limit on the number of test sample data points that may be analyzed.

    For example, our 32 MB version sets a learn sample limit of 8 MB. A data point is one variable for one observation (1 row by 1 column), and each data point occupies 4 bytes, so an 8 MB license allows up to 8 * 1024 * 1024 / 4 = 2,097,152 learn sample data points to be analyzed.
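    The capacity arithmetic above can be checked in a few lines. The sketch assumes only what the text states: 4 bytes per data point, 1 MB = 1,048,576 bytes, and a limit that counts only learn sample rows times the variables actually used in the model. The function names `licensed_values` and `fits_license` are hypothetical.

```python
BYTES_PER_VALUE = 4         # each data point occupies 4 bytes
BYTES_PER_MB = 1024 * 1024  # 1 MB = 1,048,576 bytes

def licensed_values(data_limit_mb):
    """Number of learn sample values (rows x columns) a data limit allows."""
    return data_limit_mb * BYTES_PER_MB // BYTES_PER_VALUE

def fits_license(learn_rows, model_vars, data_limit_mb):
    """True if learn rows x variables used in the model fit within the limit.

    Test-sample rows and variables not used in the model are not counted.
    """
    return learn_rows * model_vars <= licensed_values(data_limit_mb)

print(licensed_values(8))           # 2097152
print(fits_license(100000, 20, 8))  # True: 2,000,000 values
print(fits_license(100000, 21, 8))  # False: 2,100,000 values
```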

    The following is a table that describes the current set of "sizes" available. Please note that the minimum required RAM is **not** the same as the learn sample limitation.

    Size: minimum required physical memory (RAM), MB | Data limit: licensed learn sample data size, MB (1 MB = 1,048,576 bytes) | Data limit: licensed # of learn sample values (rows by columns)
    32 | 8 | 2,097,152
    64 | 18 | 4,718,592
    128 | 45 | 11,796,480
    256 | 100 | 26,214,400
    512 | 200 | 52,428,800
    1024 | 400 | 104,857,600
    2048 | 800 | 209,715,200 (64-bit only)
    3072 | 1200 | 314,572,800 (64-bit only)

    Larger capacities are available on 64-bit operating systems through our non-GUI (command-line) builds. These builds are very flexible and can be licensed for large data limits not currently available in the GUI product line. The current maximum data capacity for our non-GUI builds is 8 GB.


  • Broadband propensity project: comparing TreeNet with Enterprise Miner using logistic regression.

    We’re seeing these benefits:
    1. TreeNet (stochastic gradient boosting) injects randomization into the selection of candidate predictors and training data. This makes it much more robust than traditional statistical models, especially on messy data. For example, some information in our dataset, such as customers’ portfolio and usage data, is missing. Although we impute the missing values for the statistical models, the gaps still affect the final results because those models use the whole dataset for training. TreeNet uses only a subset of the data and predictors each time and repeats this process hundreds of times. This greatly reduces the influence of messy data and improves the robustness of the final model. As for the nature of the modeling, growing a large number of small trees instead of a single complex tree has proven to be more accurate and robust.
    Our modeling datasets are always large. In this example, the data size exceeds 500,000, and the initial predictor set contains about 160 predictors. TreeNet is computationally efficient, scales to large datasets, and runs much faster than Enterprise Miner. In the results analysis, the detailed relationships between the predictors and the target are much easier to visualize. Battery automates running multiple experiments, which greatly reduces the effort of predictor selection. In this example, 16 predictors were finally selected after 5 cycles and 2 battery processes.
    2. Insights: TreeNet can dig out very granular information. For example, it helps reveal the impact of a specific sector of predictors. In this example, predictors related to product holdings and usage are emphasized in the TreeNet model, while they are not prominent in the traditional logistic regression model. Information in these sectors contributed substantially to predicting customers’ propensity to buy our fixed broadband products.
    3. We’re seeing various levels of performance gains over traditional statistical models. In our best results, the improvement in lift is consistently about 40%, helping to capture more than 30% of the customers who are willing to buy our product.

    Predictive Analytics Manager at a leading telco in Singapore.
    Her team works on scientific marketing initiatives using statistics, data science and optimization methodologies.


    David Vogel, CEO Voloridge Investment Management and Captain of the winning Heritage Health prize team

    I have tried multiple versions of gradient boosting, including popular open-source versions, and TreeNet outperforms them all in predictive accuracy (consistently across many different kinds of data sets) while maintaining the ability to train models quickly.

    David Vogel, CEO Voloridge Investment Management and Captain of the winning Heritage Health prize team
    Florida, USA

    Brad Turner, Vice President of Marketing and Business Development, Inkiru

    Every day, the Inkiru product predicts sales for 2,000 items in an e-commerce context. The product also generates a customized confidence interval for each prediction. The input is dynamic and consists of 1 year of historical data. Each record contains approximately 150 features with information about sales, products, customers, and promotions.

    The problem was very challenging from a modeling point of view. Important parts of the data were continuous, categorical, highly non-linear, sparse, missing, and noisy. We found Salford Systems adequate to deal with these characteristics of the data.

    Precision was an important goal in this project. A validation with real data reports 90% of the predictions lying within 7 units of the actual sales and 50% within 2 units. Salford Systems was definitely an important tool to reach this degree of accuracy in the product.

     Brad Turner, Vice President of Marketing and Business Development, Inkiru
    California, USA


    Andrew Russo, Vice President, Modeling and Analytics at AccuData Integrated

    As a traditional modeler, I had been primarily using regression and logistic regression. I began to test TreeNet last fall. Since then I have built several models that are now market-tested and are performing as TreeNet predicted. The real value of TreeNet has been the speed with which it builds models, the accuracy of its predictions, and the incremental lift it is showing in side-by-side tests against regressions. It has also proven to be a tremendous data-prep time saver in its ability to deal with outliers and missing data, as well as doing a decent job distinguishing between scale and categorical data. Importantly, the ability to build models with less hands-on work has enabled us to offer new modeling products to clients who otherwise would not have had the budget for a modeling project. In short, this new, advanced capability is giving my company a competitive advantage.

     Andrew Russo, Vice President, Modeling and Analytics at AccuData Integrated Marketing
    Florida, USA


    Tom Osborn, Adjunct Professor at University of Technology

    I've used TreeNet on commercial projects since '04. For customer and prospect targeting, it outperforms logistic family regression, neural nets and other methods in my kitbag. Key strengths: handling of missing values, robustness, general non-linearity, variable interactions. Clients like feedback on variable importance (more general than Shapley or PMVD). They also like seeing how the variables contribute to predictions. Fast and easy to use. Best of all, it is built on Jerry Friedman's great maths.

     Tom Osborn, Adjunct Professor (analytics/data mining) at University of Technology, Sydney
    Sydney, Australia


    Fred Hazelton, Master Statistician

    Predicting Crowds at Walt Disney World Theme Parks
    Since 1986, the Unofficial Guide to Walt Disney World has been helping visitors to Orlando’s theme parks get the most out of their time and money. Market research shows that the two most important factors affecting a visitor’s satisfaction with a Disney trip are 1) how long did I have to wait in line and 2) how much did I get to see. The Unofficial Guide and its website, TouringPlans.com, have become the best source for solving these two problems.
    The most effective way to reduce the time you wait in line and increase the number of attractions you experience is to visit at a time of year when crowds are lower and to use an optimal, computer-designed touring plan. Touring Plans are great! They tell you the optimal order in which to experience the attractions with minimal wait, a classic implementation of the travelling-salesman problem. However, an optimal touring plan requires that we can predict, with reasonable accuracy, the wait time at an attraction at any given time of day on any given day of the year.
    We at TouringPlans.com have been using traditional linear regression methods to predict wait times for several years, but the limitations of regression become more apparent as we gather more and more data. Subscribers to our mobile application “Lines” can see our wait-time estimates and submit updates when they are in the park. The sporadic nature of the wait times that we gather makes them difficult to use in a traditional regression environment. The ups and downs of wait times throughout the day are difficult to model using regression but well suited to a data mining tool.

    Using TreeNet
    Some auxiliary variables, such as park hours, parade schedules, historic wait times and school schedules, are available in advance for each wait time record. These can be used in a traditional regression model to analyse the past and predict the future. But the true value of the data we gather lies in its dynamic nature. Variables like current weather, attraction status (open or broken down), recent wait times and recent wait times for other attractions have a great impact on how long you will wait in line. These variables are not available in advance for predictions, and their values are not available for all records in the database. For example, not every wait time record in the database will have a recent wait time submission. TreeNet can easily handle missing data, whereas regression cannot.
    In a traditional regression model, the burden of determining variable interactions is placed on the statistician, and interactions are usually discovered by trial and error. Wait time data must plausibly contain plenty of high-impact interactions, such as relationships between wait times at different attractions or between park hours and parade schedules. With dozens of variables, identifying interactions (and transformations) is prohibitive in a traditional regression environment. In TreeNet, the search for interactions and transformations is inherent, exhaustive and automatic: a refreshing saver of time and energy that frees resources for other tasks.

    Fred Hazelton
    Master Statistician


    Constance Jiang, Data Analyst, Tencent, Inc.

    As a data analyst in the risk management field, I find it important to distinguish quality consumers in order to recognize and limit low-ROI transactions. We use TreeNet to build classification models and to work on regression problems. This software not only provides great choices of powerful algorithms for model training but also shows outstanding accuracy (10% better under the same circumstances). It can also process huge datasets, for example over 100,000 records with 50 complicated variables. TreeNet is also highly productive and user-friendly; several minutes are quite enough for model training. Now we can spend more time on results analysis and decision making.
    TreeNet's performance is impressive and satisfying, and it can really be adapted to real scenarios to reduce the related risks.

     Constance Jiang, Data Analyst, Tencent, Inc.


    Xu Jie from Nanjing University of Information Science & Technology

    I am conducting a GIS project that requires a great deal of data analysis. Lacking useful tools, our project made slow progress. By chance I got a TreeNet trial version, and it amazed me with its powerful data analysis capabilities, friendly user interface and, most important of all, accuracy. Since we started using it, TreeNet has benefited our research considerably, and our project has moved much faster.
    Of TreeNet's many features, the one we like most is the plots, which display graphs after a model is built. This feature is especially useful to us because it provides the most visual and easy way to find shortcomings and improve the model. We also like the multiple ways of setting up models; they are flexible and cover most aspects of our research.
    What attracts me most is the amazing speed of TreeNet. We used other software before TreeNet, and none of it could build a model in such a short time. What’s more, TreeNet spares us painstaking data preprocessing, which has greatly accelerated our research. Since our data contains over 100 million variables, common software takes weeks to produce an analysis result; with TreeNet, it takes just a couple of days.
    Most important of all is accuracy. Due to defects in the sampling stage, our data contains some noisy variables, which bring instability to our model. With TreeNet, we obtained a more stable model than with other tools. What’s more, the models we built with TreeNet could be reproduced and verified.
    As to ROI, I cannot say how much money we have saved by using this tool, but we had considered buying a high-performance computer (about 5,000 US dollars) to assist our research. After using the software, we decided to postpone that purchase.
    We have been using TreeNet for only a few months, but we are really impressed by its power. We know we are using just a few common features; many more powerful features are still waiting for us to learn.

     Xu Jie
    Nanjing University of Information Science & Technology

