
TreeNet®

  • TreeNet Introduction

    Predictive Power:
    TreeNet is Salford's most flexible and powerful data mining tool, capable of consistently generating extremely accurate models. TreeNet’s level of accuracy is usually not attainable by single models or by ensembles such as bagging or conventional boosting. TreeNet demonstrates remarkable performance for both regression and classification. The algorithm typically generates thousands of small decision trees built in a sequential error-correcting process to converge to an accurate model. TreeNet has been responsible for the majority of Salford’s modeling competition awards.
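The sequential error-correcting idea described above can be illustrated with a minimal sketch of generic stochastic gradient boosting for least-squares regression. This is not Salford's TreeNet implementation: the function names, the one-split "stump" trees, and the parameter values (`n_trees`, `learn_rate`, `subsample`) are all assumptions chosen for illustration. Each small tree is fitted to the residual errors of the model built so far, on a random subsample of the rows.

```python
import random


def fit_stump(xs, residuals):
    """Fit a one-split tree (a stump) to the residuals by least squares."""
    mean_all = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean_all, mean_all)  # fallback: no usable split
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=200, learn_rate=0.1, subsample=0.5, seed=0):
    """Build stumps one after another, each correcting the current errors."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)          # start from the constant model
    preds = [base] * len(xs)
    stumps = []
    n_sub = max(2, int(subsample * len(xs)))
    for _ in range(n_trees):
        rows = rng.sample(range(len(xs)), n_sub)   # random subset of rows
        stump = fit_stump([xs[i] for i in rows],
                          [ys[i] - preds[i] for i in rows])
        stumps.append(stump)
        # Add only a small fraction of each tree's correction (shrinkage).
        preds = [p + learn_rate * stump_value(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learn_rate, stumps


def predict(model, x):
    base, learn_rate, stumps = model
    return base + learn_rate * sum(stump_value(s, x) for s in stumps)


def mean_squared_error(predict_fn, xs, ys):
    return sum((predict_fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (hypothetical): y = x^2 on 100 evenly spaced points.
xs = [i / 50 for i in range(100)]
ys = [x * x for x in xs]
model = boost(xs, ys)
baseline = sum(ys) / len(ys)
print("constant-model MSE:", mean_squared_error(lambda x: baseline, xs, ys))
print("boosted-model MSE: ", mean_squared_error(lambda x: predict(model, x), xs, ys))
```

No single stump fits the curve well, but the sum of many small, shrunken corrections does, which is the behaviour the paragraph above attributes to a sequence of small trees.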
    Supreme Accuracy:
    TreeNet's robustness extends to data contaminated with erroneous target labels. This type of data error can be very challenging for conventional data mining methods and will be catastrophic for conventional boosting. In contrast, TreeNet is generally immune to such errors as it dynamically rejects training data points too much at variance with the existing model. In addition, TreeNet adds the advantage of a degree of accuracy usually not attainable by a single model or by ensembles such as bagging or conventional boosting. As opposed to neural networks, TreeNet is not sensitive to data errors and needs no time-consuming data preparation, preprocessing or imputation of missing values.
    Advanced Features:
    Interaction detection establishes whether interactions of any kind are needed in a predictive model, and is a search engine to discover specifically which interactions are required. The interaction detection system not only helps improve model performance (sometimes dramatically) but also assists in the discovery of valuable new segments and previously unrecognized patterns.

    Technical Articles by Jerome Friedman are also available for download:

     

     


  • Additional TreeNet Features are available in Pro, ProEx, and Ultra.

    | Components | Basic | Pro | ProEx | Ultra |
    |---|---|---|---|---|
    | Modeling Engine: TreeNet (Stochastic Gradient Boosting) | o | o | o | o |
    | Spline-based approximations to the TreeNet dependency plots | | o | o | o |
    | Exporting TreeNet dependency plots into XML file | | o | o | o |
    | Automation: Build a series of models changing the minimum required size on child nodes (Battery MINCHILD) | | o | o | o |
    | Flexible control over interactions in a TreeNet model | | | o | o |
    | Interaction strength reporting | | | o | o |
    | Build a CART tree utilizing the TreeNet engine to gain speed as well as alternative reporting | | | | o |
    | Build a RandomForests model utilizing the TreeNet engine to gain speed as well as alternative reporting | | | | o |
    | RandomForests inspired sampling of predictors at each node during model building | | | | o |
    | Automation: Explore the impact of influence trimming (outlier removal) for logistic and classification models (Battery INFLUENCE) | | | | o |
    | Automation: Exhaustive search and ranking for all interactions of the specified order (Battery ICL) | | | | o |


    • We suggest the following minimum and recommended system requirements:

      • 80486 processor or higher.
      • 512MB of random-access memory (RAM). This value depends on the "size" you have purchased (64MB, 128MB, 256MB, 512MB, 1GIG). While all versions may run with a minimum of 32MB of RAM, we CANNOT GUARANTEE they will. We highly recommend that you follow the recommended memory configuration that applies to the particular version you have purchased. Using less than the recommended memory configuration results in hard drive paging, reducing performance significantly, or application instability.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
      • CD-ROM or DVD drive.

      Recommended System Requirements

      Because Salford Tools are extremely CPU intensive, the faster your CPU, the faster they will run. For optimal performance, we strongly recommend they run on a machine with a system configuration equal to, or greater than, the following:

      • Pentium 4 processor running 2.0+ GHz.
      • 2 GIG of random-access memory (RAM). This value depends on the "size" you have purchased (64MB, 128MB, 256MB, 512MB, 1GIG). While all versions may run with a minimum of 32MB of RAM, we CANNOT GUARANTEE they will. We highly recommend that you follow the recommended memory configuration that applies to the particular version you have purchased. Using less than the recommended memory configuration results in hard drive paging, reducing performance significantly, or application instability.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
      • CD-ROM or DVD drive.
      • 2 GIG of additional hard disk space available for virtual memory and temporary files.

      Ensuring Proper Permissions

      If you are installing on a machine that uses security permissions, please read the following note.

      • You must belong to the Administrator group on Windows 2003 / 2008 or Windows 7 / 8 to install and license the application properly. Once the application is installed and licensed, any user with read/write/modify permissions to the application's /bin and temp directories can run it.
    • Supported Architectures

      • Alpha: DEC 3000 or AlphaServer running Tru64 UNIX 5.0 or higher
      • Linux/i386: i586 or higher processor; Linux 2.4 or higher kernel; glibc 2.3 or higher
      • Linux/AMD64: AMD64 or Intel EM64T processor; Linux 2.6 or higher kernel; glibc 2.3 or higher
      • Sun: UltraSPARC processor; Solaris 2.6 or higher
      • RS/6000: POWER or PowerPC processor; AIX 4.2 or higher
      • HP 9000: PA/RISC 1.1 or higher processor; HP/UX 11.x
      • SGI: MIPS 4 or higher processor; IRIX 6.5

      Minimum System Requirements

      • The minimum RAM requirement for all non-GUI apps is 32 MB of random-access memory (RAM). This value depends on the "size" you have purchased (64MB, 128MB, 256MB, 512MB, 1GIG).
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

      Recommended System Requirements

      • Recommended random-access memory (RAM) is 1.5 times the licensed data limit (32 MB, 64 MB, etc), up to the maximum permitted by the target architecture. On UNIX systems, it is generally recommended that there be at least twice as much swap space as there is RAM.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

      All Salford apps are very CPU intensive, so more memory and a faster CPU are always helpful.

    Licensing Application

    TreeNet is licensed through an application system ID and an associated unlock key. When installation is complete, the user will need to email the application "system ID." The system ID is displayed in the License Information window shown the first time the application is started. You can also reach this window by selecting the Help->License menu option.

    Method 1: Fixed License
    With a fixed license, each machine must have its own copy of the licensed program installed. If your license terms permit more than one copy, then the license must be activated on each machine that will be used.

    Method 2: Floating License
    Use this method if you intend the application to be used by more than one user concurrently over a network. A floating license tracks the number of copies "checked out." When a request would exceed the number permitted by your license terms, the user is informed that "all copies are checked out." The licensed program may be installed on a machine that each client machine can access. Machines that are not connected to the network must be issued a fixed license (Method 1 above).

    A floating license is particularly useful when the number of potential users exceeds the number of seats specified in your license terms.
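The checkout behaviour of a floating license can be sketched as a simple seat counter. This is only an illustration under stated assumptions, not Salford's license manager: the class name `FloatingLicensePool`, its methods, and the user names are hypothetical, and a real license server would also handle networking, timeouts, and crashed clients.

```python
class FloatingLicensePool:
    """Toy seat counter illustrating floating-license checkout (hypothetical)."""

    def __init__(self, seats):
        self.seats = seats          # number of concurrent copies the license permits
        self.checked_out = set()    # users currently holding a copy

    def check_out(self, user):
        if user in self.checked_out:
            return "ok"             # this user already holds a seat
        if len(self.checked_out) >= self.seats:
            return "all copies are checked out"
        self.checked_out.add(user)
        return "ok"

    def check_in(self, user):
        # Returning a copy frees the seat for the next user.
        self.checked_out.discard(user)


# Three potential users share a two-seat license.
pool = FloatingLicensePool(seats=2)
for user in ["ana", "ben", "cy"]:
    print(user, "->", pool.check_out(user))
pool.check_in("ana")
print("cy ->", pool.check_out("cy"))
```

The example shows why a floating license suits a group larger than the seat count: the third user is refused only while both seats are in use, and gets a copy as soon as one is returned.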



  • Salford Systems' University Program provides TreeNet at significantly reduced licensing fees to the educational community. Eligible educational institutions are colleges, universities, community colleges, technical schools, and science centers. Additionally, a 90-day free evaluation is available upon request.

     

    The University Program gives eligible educational institutions the right to distribute right-to-use licenses for TreeNet and other Salford tools to all faculty, staff, and students for personal computers, and to install UNIX versions of these tools on University workstations and servers. For more information on this special program, please contact our sales department. Salford Systems is committed to supporting education and research in universities worldwide and offers special packaging and pricing.

    We also offer academics cost-free access to our tutorial materials for classroom use.

     


  • SPM 7 Product Versions

    Ultra
    The best of the best. For the modeler who must have access to the leading-edge technology available and the fastest run times, including major advances in ensemble modeling, interaction detection, and automation. ULTRA also provides advance access to new features as they become available in frequent upgrades.
    ProEx
    For the modeler who needs cutting-edge data mining technology, including extensive automation of workflows typical for experienced data analysts and dozens of extensions to the Salford data mining engines.
    Pro
    A true predictive modeling workbench designed for the professional data miner. Includes a variety of supporting conventional statistical modeling tools, a programming language, reporting services, and a modest selection of workflow automation options.
    Basic
    Literally the basics. Salford Systems' award-winning data mining engines without extensions, automation, or the surrounding statistical services, programming language, and sophisticated reporting. Designed for small budgets while still delivering our world-famous engines.


  • Introduction to TreeNet
    by: Mikhail Golovnya, Salford-Systems

  • Five Part Videos

    Training In TreeNet Part 1
    Training In TreeNet Part 2
    Training In TreeNet Part 3
    Training In TreeNet Part 4
    Training In TreeNet Part 5

  • A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to build the model, and the limit applies to its size in rows by columns, that is, the observations and variables used to build the model. Variables not used in the model do not count. Observations reserved for testing, or excluded for other reasons, do not count. Note that there is no limit to the number of test sample data points that may be analyzed.

    For example, our 32MB version sets a learn sample limitation of 8 MB. Each data point occupies 4 bytes, so an 8MB capacity license allows up to 8 * 1024 * 1024 / 4 = 2,097,152 learn sample data points to be analyzed. A data point is one variable for one observation (1 row by 1 column).
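The arithmetic above can be sketched in a few lines. The 4 bytes per data point and 1 MB = 1,048,576 bytes come from the text; the function names and the example row and column counts are hypothetical, and the check is only a rough planning aid, not the application's own license enforcement.

```python
BYTES_PER_VALUE = 4          # each data point occupies 4 bytes (from the text)
BYTES_PER_MB = 1024 * 1024   # 1 MB = 1,048,576 bytes


def max_learn_values(data_limit_mb):
    """Number of learn sample values a given data limit (in MB) allows."""
    return data_limit_mb * BYTES_PER_MB // BYTES_PER_VALUE


def fits_license(rows, columns, data_limit_mb):
    """True if rows x columns of learn sample data fits within the limit.

    Count only the observations and variables actually used to build the model.
    """
    return rows * columns <= max_learn_values(data_limit_mb)


print(max_learn_values(8))               # the 8 MB example from the text
print(fits_license(100_000, 20, 8))      # 2,000,000 values
print(fits_license(100_000, 21, 8))      # 2,100,000 values
```

With an 8 MB limit, 100,000 observations on 20 modeled variables fit, while adding a 21st variable pushes the learn sample over the limit.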

    The following is a table that describes the current set of "sizes" available. Please note that the minimum required RAM is **not** the same as the learn sample limitation.

    | Size: minimum required physical memory (RAM) in MB | Licensed learn sample data size in MB (1 MB = 1,048,576 bytes) | Licensed # of learn sample values (rows by columns) | Notes |
    |---|---|---|---|
    | 32 | 8 | 2,097,152 | |
    | 64 | 18 | 4,718,592 | |
    | 128 | 45 | 11,796,480 | |
    | 256 | 100 | 26,214,400 | |
    | 512 | 200 | 52,428,800 | |
    | 1024 | 400 | 104,857,600 | |
    | 2048 | 800 | 209,715,200 | 64-bit only |
    | 3072 | 1200 | 314,572,800 | 64-bit only |

    Additional larger capacity is available under 64-bit operating systems, using our non-GUI (command-line) builds. The non-GUI build is very flexible and can be licensed for large data limits not currently available in the GUI product line. The current maximum is 8 GIG of data capacity for our non-GUI builds.


  • Broadband propensity project: comparing TreeNet with Enterprise Miner using logistic regression.

    We’re seeing these benefits
    1. The TreeNet (stochastic gradient boosting) method injects randomization into the selection of candidate predictors and training data, making it much more robust than traditional statistical models, especially in dealing with messy data. For example, in our dataset, some information is missing, such as customers’ portfolio and usage data. Although we replace the missing data for the statistical models, it still affects the final results because those models use the whole dataset for training. With TreeNet, only a subset of the data and predictors is used each time, and this process is repeated hundreds of times. This greatly reduces the influence of messy data and improves the robustness of the final model. In terms of the nature of modeling, growing a large number of small trees instead of a single complex tree has proved to be more accurate and robust.
    Our modeling dataset is always large. In this example, the data size is above 500,000 and the initial predictor set is about 160 predictors. TreeNet is computationally efficient and scalable for large datasets, and is much faster than Enterprise Miner. In the result analysis, the detailed relationships between predictors and the target are much easier to visualize. Battery automates the process of running multiple experiments, which saves a lot of effort in predictor selection. In this example, 16 predictors were finally selected after 5 cycles and 2 battery processes.
    2. Insights: TreeNet can dig out very granular information. For example, it helps to find the impact of a specific sector of predictors. In this example, predictors relating to product holding and usage are emphasized in the TreeNet model, while they are not prominent in the traditional logistic regression model. Information in these sectors contributed a lot to improving the prediction of customers’ propensity to buy our fixed broadband products.
    3. We’re seeing various levels of performance gains over traditional statistical models. For the best performance we’ve seen, the improvement in lift is consistently about 40%, helping to capture more than 30% of the customers who are willing to buy our product.

    Predictive Analytics Manager at Leading Telco in Singapore.
    **Her team works on scientific marketing initiatives using statistics, data science and optimization methodologies.


    David Vogel, CEO Voloridge Investment Management and Captain of the winning Heritage Health prize team

    I have multiple versions of gradient boosting I could find including popular open source versions and TreeNet outperforms them all in predictive accuracy (consistently across many different kinds of data sets) while maintaining the ability to train models quickly.

    David Vogel, CEO Voloridge Investment Management and Captain of the winning Heritage Health prize team
    Florida, USA


     

    Brad Turner, Vice President of Marketing and Business Development, Inkiru

    Everyday, the Inkiru product predicts sales for 2000 items in an e-commerce context. In addition, the product generates a customized confidence interval for each prediction. The input is dynamic and it consists of 1 year of historical data. Each record contains approximately 150 features with information about sales, products, customers, and promotions.

    The problem was very challenging from a modeling point of view. Important parts of the data were continuous, categorical, highly non-linear, sparse, missing, and noisy. We found Salford Systems adequate to deal with these characteristics of the data.

    Precision was an important goal in this project. A validation with real data reports 90% of the predictions lying within 7 units of the actual sales and 50% within 2 units. Salford Systems was definitely an important tool to reach this degree of accuracy in the product.

     Brad Turner, Vice President of Marketing and Business Development, Inkiru
    California, USA


    Andrew Russo, Vice President, Modeling and Analytics at AccuData Integrated

    As a traditional modeler, I had been primarily using regression and logistic regressions. I began to test TreeNet last fall. Since then I have built several models that are now market-tested and are performing as predicted by TreeNet. The real value of TreeNet has been the speed in which it builds data, the accuracy of its predictions and the incremental lift it is experiencing in side-by-side tests of regressions. It has also proven to be a tremendous data prep time saver in its ability to deal with outliers, missing data as well as doing a decent job distinguishing between scale and categorical data. Importantly, the ability for less-hands-on model builds has enabled us to offer new modeling products to our clients that otherwise would not have had the budget to do a modeling project. In short this new, advanced capability is giving my company a competitive advantage.

     Andrew Russo, Vice President, Modeling and Analytics at AccuData Integrated Marketing
    Florida, USA


    Tom Osborn, Adjunct Professor at University of Technology

    I've used TreeNet on commercial projects since '04. For customer and prospect targeting, it outperforms logistic family regression, neural nets and other methods in my kitbag. Key strengths: handling of missing values, robustness, general non-linearity, variable interactions. Clients like feedback on variable importance (more general than Shapley or PMVD). They also like seeing how the variables contribute to predictions. Fast and easy to use. Best - is developed on Jerry Friedman's great maths.

     Tom Osborn, Adjunct Professor (analytics/data mining) at University of Technology, Sydney
    Sydney, Australia

