
CART Classification And Regression Trees®

  • Classification and Regression Trees

    Ultimate Classification Tree:
    CART is the ultimate classification tree that has revolutionized the entire field of advanced analytics and inaugurated the current era of data mining. CART, which is continually being improved, is one of the most important tools in modern data mining. Others have tried to copy CART, but no one has succeeded, as evidenced by its unmatched accuracy, performance, feature set, built-in automation, and ease of use. Designed for both non-technical and technical users, CART can quickly reveal important data relationships that could remain hidden using other analytical tools.
    Proprietary Code:
    Technically, CART is based on landmark mathematical theory introduced in 1984 by four world-renowned statisticians at Stanford University and the University of California at Berkeley. Salford Systems' implementation of CART is the only decision tree software embodying the original proprietary code. The CART creators continue to collaborate with Salford Systems to continually enhance CART with proprietary advances.
    Fast and Versatile:
    Patented extensions to CART are specifically designed to enhance results for market research and web analytics. CART supports high-speed deployment, allowing Salford models to predict and score in real time on a massive scale. Over the years CART has become known as the fastest and most versatile predictive modeling algorithm available to analysts. It also serves as the foundation for many modern data mining approaches based on bagging and boosting.

     

     


  • CART Features available in Basic, Pro, ProEx, and Ultra.


    Components | Basic | Pro | ProEx | Ultra
    Modeling Engine: CART (Decision Trees) | o | o | o | o
    Linear Combination Splits | o | o | o | o
    Optimal tree selection based on area under ROC curve | o | o | o | o
    User defined splits for the root node and its children |   | o | o | o
    Automation: Generate models with alternative handling of missing values (Battery MVI) |   | o | o | o
    Automation: Build a series of models using all available splitting strategies (six for classification, two for regression) (Battery RULES) |   | o | o | o
    Automation: Build a series of models varying the depth of the tree (Battery DEPTH) |   | o | o | o
    Automation: Build a series of models changing the minimum required size on parent nodes (Battery ATOM) |   | o | o | o
    Automation: Build a series of models changing the minimum required size on child nodes (Battery MINCHILD) |   | o | o | o
    Automation: Explore accuracy versus speed trade-off due to potential sampling of records at each node in a tree (Battery SUBSAMPLE) |   | o | o | o
    Multiple user defined lists for linear combinations |   |   | o | o
    Constrained trees |   |   | o | o
    Ability to create and save dummy variables for every node in the tree during scoring |   |   | o | o
    Report basic stats on any variable of user choice at every node in the tree |   |   | o | o
    Comparison of learn vs. test performance at every node of every tree in the sequence |   |   | o | o
    Hot-Spot detection to identify the richest nodes across multiple trees |   |   | o | o
    Automation: Vary the priors for the specified class (Battery PRIORS) |   |   | o | o
    Automation: Build a series of models limiting the number of nodes in a tree (Battery NODES) |   |   | o | o
    Automation: Build a series of models trying each available predictor as the root node splitter (Battery ROOT) |   |   | o | o
    Automation: Explore the impact of favoring equal sized child nodes (Battery POWER) |   |   | o | o
    Automation: Build a series of models by progressively removing misclassified records, thus increasing the robustness of trees and possibly reducing model complexity (Battery REFINE) |   |   | o | o
    Automation: Bagging and ARCing using the legacy code (COMBINE) |   |   | o | o
    Build a CART tree utilizing the TreeNet engine to gain speed as well as alternative reporting |   |   |   | o
    Build a Random Forests model utilizing the CART engine to gain alternative handling of missing values via surrogate splits (Battery BOOTSTRAP RSPLIT) |   |   |   | o


  • Introduction to CART
    By: Mikhail Golovnya, Salford Systems.

    Training in CART
    Six Part Video Presentation: Training In CART Parts 1 through 6

    Training in Advanced CART
    Multi-part video presentation: Training In Advanced CART Parts 1 through 3


  • Salford Systems' University Program provides CART at significantly reduced licensing fees to the educational community. Eligible educational institutions are colleges, universities, community colleges, technical schools, and science centers. Additionally, a 90-day free evaluation is available upon request.

    The University Program gives eligible educational institutions the right to distribute right-to-use licenses for CART and other Salford tools to all faculty, staff, and students for personal computers, and to install UNIX versions of these tools on University workstations and servers. For more information on this special program, please contact our sales department.

     

    Salford Systems is committed to supporting education and research in universities worldwide and offers special packaging and pricing.

    We also offer academics cost-free access to our tutorial materials for classroom use.

     


  • SPM 7 Product Versions

    Ultra
    The best of the best. For the modeler who must have access to leading-edge technology and the fastest run times, including major advances in ensemble modeling, interaction detection, and automation. ULTRA also provides advance access to new features as they become available in frequent upgrades.
    ProEx
    For the modeler who needs cutting-edge data mining technology, including extensive automation of workflows typical for experienced data analysts and dozens of extensions to the Salford data mining engines.
    Pro
    A true predictive modeling workbench designed for the professional data miner. Includes a variety of supporting conventional statistical modeling tools, a programming language, reporting services, and a modest selection of workflow automation options.
    Basic
    Literally the basics. Salford Systems' award-winning data mining engines without extensions, automation, or the surrounding statistical services, programming language, and sophisticated reporting. Designed for small budgets while still delivering our world-famous engines.


    • We suggest the following minimum and recommended system requirements:

      • 80486 processor or higher.
      • 512 MB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB). While all versions may run with a minimum of 32 MB of RAM, we CANNOT GUARANTEE they will. We highly recommend that you follow the recommended memory configuration that applies to the particular version you have purchased. Using less than the recommended memory configuration results in hard drive paging, which significantly reduces performance, or in application instability.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
      • CD-ROM or DVD drive.

      Recommended System Requirements

      Because Salford Tools are extremely CPU intensive, the faster your CPU, the faster they will run. For optimal performance, we strongly recommend they run on a machine with a system configuration equal to, or greater than, the following:

      • Pentium 4 processor running at 2.0 GHz or faster.
      • 2 GB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB). While all versions may run with a minimum of 32 MB of RAM, we CANNOT GUARANTEE they will. We highly recommend that you follow the recommended memory configuration that applies to the particular version you have purchased. Using less than the recommended memory configuration results in hard drive paging, which significantly reduces performance, or in application instability.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
      • CD-ROM or DVD drive.
      • 2 GB of additional hard disk space available for virtual memory and temporary files.

      Ensuring Proper Permissions

      If you are installing on a machine that uses security permissions, please read the following note.

      • You must belong to the Administrator group on Windows 2003 / 2008 or Windows 7 / 8 to be able to properly install and license the application. Once the application is installed and licensed, any user with read/write/modify permissions to the application's /bin and temp directories can run it.
    • Supported Architectures

      • Alpha: DEC 3000 or AlphaServer running Tru64 UNIX 5.0 or higher
      • Linux/i386: i586 or higher processor; Linux 2.4 or higher kernel; glibc 2.3 or higher
      • Linux/AMD64: AMD64 or Intel EM64T processor; Linux 2.6 or higher kernel; glibc 2.3 or higher
      • Sun: UltraSPARC processor; Solaris 2.6 or higher
      • RS/6000: POWER or PowerPC processor; AIX 4.2 or higher
      • HP 9000: PA/RISC 1.1 or higher processor; HP/UX 11.x
      • SGI: MIPS 4 or higher processor; IRIX 6.5

      Minimum System Requirements

      • Minimum RAM requirement for all non-GUI apps is 32 MB of random-access memory (RAM). This value depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB).
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

      Recommended System Requirements

      • Recommended random-access memory (RAM) is 1.5 times the licensed data limit (32 MB, 64 MB, etc), up to the maximum permitted by the target architecture. On UNIX systems, it is generally recommended that there be at least twice as much swap space as there is RAM.
      • Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
      • Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).

      All Salford apps are very CPU intensive, so more memory and a faster CPU are always helpful.
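      The memory guidance above can be turned into a quick sizing calculation. The following is a minimal sketch, not a Salford utility: the function name and the `arch_max_mb` parameter are hypothetical, and it assumes the swap recommendation is applied to the recommended RAM figure.

```python
def recommended_memory_mb(licensed_limit_mb, arch_max_mb=None):
    """Return (recommended RAM, recommended swap) in MB.

    Assumptions (illustrative only):
    - RAM is 1.5 times the licensed data limit, capped at the maximum
      the target architecture permits (arch_max_mb, if given).
    - Swap is at least twice the RAM figure, per the general UNIX guidance.
    """
    ram_mb = 1.5 * licensed_limit_mb
    if arch_max_mb is not None:
        ram_mb = min(ram_mb, arch_max_mb)
    swap_mb = 2 * ram_mb
    return ram_mb, swap_mb


if __name__ == "__main__":
    ram, swap = recommended_memory_mb(64)
    print(f"64 MB license: {ram:.0f} MB RAM, {swap:.0f} MB swap")
```

      Passing `arch_max_mb` models the "up to the maximum permitted by the target architecture" clause; leave it out when no cap applies.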

    Licensing Application

    TreeNet is licensed through an application system ID and an associated unlock key. When installation is complete, the user will need to email the application "system ID." The system ID is displayed in the License Information window, which appears the first time the application is started. You can also open this window by selecting the Help->License menu option.

    Method 1: Fixed License
    With a fixed license, each machine must have its own copy of the licensed program installed. If your license terms permit more than one copy, then the license must be activated on each machine that will be used.

    Method 2: Floating License
    This method of licensing is used if you intend the application to be used by more than one user concurrently over a network. A floating license tracks the number of copies "checked out." When a request would exceed the number permitted by your license terms, the user is informed that "all copies are checked out." The licensed program may be installed on a machine that each client machine can access. Machines that are not connected to the network must be issued a fixed license (Method 1 above).

    A floating license is particularly useful when the number of potential users exceeds the number of seats specified in your license terms.
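    The check-out behavior described above can be illustrated with a small seat counter. This is a hedged sketch of the general idea only. It is not Salford's license manager, and the class and method names are hypothetical.

```python
class FloatingLicensePool:
    """Toy model of a floating license with a fixed number of seats."""

    def __init__(self, seats):
        if seats < 1:
            raise ValueError("a license needs at least one seat")
        self.seats = seats
        self._checked_out = set()  # users currently holding a copy

    @property
    def in_use(self):
        return len(self._checked_out)

    def check_out(self, user):
        """Return True if the user holds a copy after the call."""
        if user in self._checked_out:
            return True  # already holds a seat
        if self.in_use >= self.seats:
            return False  # "all copies are checked out"
        self._checked_out.add(user)
        return True

    def check_in(self, user):
        """Release the user's seat, if any."""
        self._checked_out.discard(user)


if __name__ == "__main__":
    pool = FloatingLicensePool(seats=2)
    for name in ("ana", "ben", "cy"):
        ok = pool.check_out(name)
        print(name, "granted" if ok else "all copies are checked out")
```

    With two seats and three potential users, the third request is refused until one of the first two users checks a copy back in, which is the situation where a floating license is most useful.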


  • A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to build the model. The limit is measured as rows by columns: the number of observations multiplied by the number of variables used to build the model. Variables not used in the model do not count. Observations reserved for testing, or excluded for other reasons, do not count. Note that there is no limit to the number of test sample data points that may be analyzed.

    For example, our 32 MB version sets a learn sample limitation of 8 MB. Each data point occupies 4 bytes, so an 8 MB capacity license will allow up to 8 * 1024 * 1024 / 4 = 2,097,152 learn sample data points to be analyzed. A data point is a single value: 1 variable by 1 observation (1 row by 1 column).
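    The arithmetic in this example can be written out as a short calculation. This is a minimal sketch using the 4 bytes per data point stated above. The function names are illustrative and not part of any Salford tool.

```python
BYTES_PER_VALUE = 4          # each data point occupies 4 bytes
BYTES_PER_MB = 1024 * 1024   # 1 MB = 1,048,576 bytes


def max_learn_values(limit_mb):
    """Number of learn sample data points allowed by a limit given in MB."""
    return limit_mb * BYTES_PER_MB // BYTES_PER_VALUE


def learn_values_used(rows, model_columns):
    """Data points consumed: learn sample rows times variables used in the model."""
    return rows * model_columns


if __name__ == "__main__":
    print(max_learn_values(8))  # 2097152
    # A hypothetical learn sample: 100,000 rows and 20 model variables.
    used = learn_values_used(100_000, 20)
    print(used, used <= max_learn_values(8))  # 2000000 True
```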

    The following is a table that describes the current set of "sizes" available. Please note that the minimum required RAM is **not** the same as the learn sample limitation.

    Size: minimum required physical memory (RAM) in MB | Data limit: licensed learn sample data size in MB (1 MB = 1,048,576 bytes) | Data limit: licensed # of learn sample values (rows by columns) | Notes
    32 | 8 | 2,097,152 |
    64 | 18 | 4,718,592 |
    128 | 45 | 11,796,480 |
    256 | 100 | 26,214,400 |
    512 | 200 | 52,428,800 |
    1024 | 400 | 104,857,600 |
    2048 | 800 | 209,715,200 | 64-bit only
    3072 | 1200 | 314,572,800 | 64-bit only
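    To see how the table might be used in practice, the sketch below looks up the smallest listed size whose learn sample limit covers a given data set. It is an illustration only: the function name is hypothetical, the sizes and MB limits are copied from the table, and the value counts are derived from the 4-bytes-per-value rule rather than hard-coded.

```python
# (size in MB, licensed learn sample limit in MB, 64-bit only?)
LICENSE_SIZES = [
    (32, 8, False),
    (64, 18, False),
    (128, 45, False),
    (256, 100, False),
    (512, 200, False),
    (1024, 400, False),
    (2048, 800, True),
    (3072, 1200, True),
]

VALUES_PER_MB = 1024 * 1024 // 4  # 4 bytes per learn sample value


def smallest_size_for(rows, columns, allow_64bit=True):
    """Return the smallest listed size (in MB) that fits rows x columns, or None."""
    needed = rows * columns
    for size_mb, limit_mb, needs_64bit in LICENSE_SIZES:
        if needs_64bit and not allow_64bit:
            continue
        if needed <= limit_mb * VALUES_PER_MB:
            return size_mb
    return None  # larger than any size listed in the table


if __name__ == "__main__":
    print(smallest_size_for(100_000, 20))    # 32
    print(smallest_size_for(1_000_000, 50))  # 512
```

    Only learn sample rows and variables actually used in the model should be counted when calling the function, since test observations and unused variables do not count toward the limit.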

    Additional capacity is available under 64-bit operating systems using our non-GUI (command-line) builds. The non-GUI build is very flexible and can be licensed for large data limits not currently available in the GUI product line. The current MAXIMUM is 8 GB of data capacity for our non-GUI builds.
