CART

Classification and Regression Trees

CART 6.0 ProEX Features

CART 6.0 ProEX, released in 2008, comes with a huge list of new features that will help analysts work more rapidly and guide their models to the best-performing trees. This is a dramatic upgrade of our flagship product and is drawing rave reviews from our customers. All of the new CART 6.0 ProEX features are explained in detail in our feature matrix (PDF); some highlights are listed below:

Tree Controls

  • Force splitters into nodes
  • Confine select splitters to specific regions of a tree (Structured Tree™)

HotSpot Detector™

  • Search data for ultra-high performance segments.
  • HotSpot Detector trees are specifically designed to yield extraordinarily high-lift or high-risk nodes. The process focuses on individual nodes and generally discards the remainder of the tree.

Train/Test Consistency Assessment

  • Node-by-node summaries of agreement between train and test data on both class assignment and rank ordering of the nodes.
  • Quickly identifies ideally-performing robust trees.

Modeling Automation

  • Automatically generates entire collections of trees exploring different control parameters.
  • Nineteen automated batteries cover exploration of multiple splitting rules, five alternative missing value handling strategies, random selection of alternative predictor lists, progressively smaller (or larger) training sample sizes, and much more.

Predictor Refinement

  • Includes stepwise backwards predictor elimination using any of three predictor ranking criteria (lowest variable importance rank, lowest loss of area under the ROC curve, highest variable importance rank).

Model Assessment via Monte Carlo Testing

  • Measures possible overfitting with automated Monte Carlo randomization tests.
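
The general idea behind a randomization test of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not CART's actual procedure: the target column is shuffled to break any real link with the predictor, a model is refit on each shuffle, and the share of shuffles that fit as well as the real data is reported. The one-split "stump" learner, the function names, and the toy data are all hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Return (threshold, training accuracy) of the best one-split rule on one predictor."""
    best = (None, 0.0)
    for t in sorted(set(xs)):
        # Count agreement for "predict 1 when x > t"; the reverse rule scores the complement.
        hits = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        acc = max(hits, len(xs) - hits) / len(xs)
        if acc > best[1]:
            best = (t, acc)
    return best

def randomization_test(xs, ys, n_rounds=200, seed=0):
    """Fraction of label shuffles whose training accuracy matches or beats the real fit."""
    rng = random.Random(seed)
    _, real_acc = fit_stump(xs, ys)
    shuffled = list(ys)
    as_good = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)          # destroy any real x-y relationship
        _, acc = fit_stump(xs, shuffled)
        if acc >= real_acc:
            as_good += 1
    return real_acc, as_good / n_rounds

# Toy data (hypothetical): the target is 1 exactly when x exceeds 10.
xs = list(range(20))
ys = [1 if x > 10 else 0 for x in xs]
real_acc, p_value = randomization_test(xs, ys)
```

A small fraction suggests the fit reflects real structure; a large fraction suggests the model may be fitting noise.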

Constructed Features

  • New tools for automatic construction of new features (as linear combinations of predictors).
  • Identification of multiple lists of candidates allows precise control over which predictors may be combined into a single new feature.

Unsupervised Learning Mode

  • Uses Breiman's column scrambler to automatically detect potential clusters with no need to scale data, address missing values, or select variables for clustering.
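
A column scrambler of the kind Breiman described can be sketched roughly as below, assuming the usual formulation: each column is shuffled independently to make a synthetic copy of the data, the original rows are labeled 1 and the scrambled rows 0, and a classifier is then trained to tell them apart. The function names and toy rows are hypothetical, and this is not Salford's implementation.

```python
import random

def scramble_columns(rows, seed=0):
    """Return a copy of rows with each column shuffled independently.

    Every column keeps its own values (its marginal distribution), but the
    pairing of values across columns is broken.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(r) for r in zip(*columns)]

def build_two_class_set(rows, seed=0):
    """Stack original rows (label 1) on top of scrambled rows (label 0)."""
    scrambled = scramble_columns(rows, seed)
    data = [list(r) for r in rows] + scrambled
    labels = [1] * len(rows) + [0] * len(scrambled)
    return data, labels

# Toy data (hypothetical) with two tight groups: (low, low) and (high, high).
rows = [[1, 1], [2, 2], [1, 2], [9, 9], [8, 9], [9, 8]]
data, labels = build_two_class_set(rows)
```

Because the scrambling only reorders values, no scaling or imputation is needed at this step; regions that a tree assigns confidently to label 1 are candidate clusters.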

CART Download

The SPM Salford Predictive Modeler® software suite is a highly accurate and ultra-fast platform for creating predictive, descriptive, and analytical models from databases of any size, complexity, or organization. The SPM® software suite has automation that accelerates the process of model building by conducting substantial portions of the model exploration and refinement process for the analyst. While the analyst is always in full control, we optionally anticipate the analyst's next best steps and package a complete set of results from alternative modeling strategies for easy review. Do in one day what normally requires a week or more using other systems.

The Salford Predictive Modeler® software suite includes:

CART®:
This definitive classification tree was developed by world-renowned statisticians, including Doctors Jerome Friedman and Leo Breiman. CART is one of the most well-known data mining algorithms and is designed for both non-technical and technical users.
MARS®:
Ideal for users who prefer results in a form similar to traditional regression while capturing essential non-linearities and interactions.
TreeNet®:
TreeNet is Salford's most flexible and powerful data mining tool, capable of consistently generating extremely accurate models. It has been responsible for the majority of modeling competition awards and demonstrates remarkable performance. The regression and classification algorithm typically generates thousands of small decision trees, built in a sequential error-correcting process, to converge on a model.
Random Forests®:
Random Forests' features include prediction, cluster and segment discovery, anomaly detection and tagging, and multivariate class description. The method was developed by Leo Breiman of the University of California, Berkeley, and Adele Cutler of Utah State University.


New Components & Features available in version 8.0!

GPS:
Generalized Path Seeker is Jerry Friedman's approach to regularized regression. This technology offers high-speed lasso for extreme data set configurations with upwards of 100,000 predictors and possibly very few rows. Such sets are commonplace in gene research and text mining. This is both supremely fast and efficient.
RuleLearner:
RuleLearner is a powerful post-processing technique that selects the most influential subset of nodes, thus reducing model complexity. RuleLearner allows the modeler to take advantage of the increased accuracy of very complicated TreeNet and Random Forests models, while still yielding the simplicity of CART models.

CART Supported File Types

The CART® data-translation engine supports data conversions for more than 80 file formats, including popular statistical-analysis packages such as SAS® and SPSS®, databases such as Oracle and Informix, and spreadsheets such as Microsoft Excel and Lotus 1-2-3.

CART Testimonials

Adrian Gepp, Australia

Bond University:

The failure of businesses is an enduring and costly concern. Business failure prediction models attempt to provide early warnings to mitigate some of the costs of future failure, if not avoid it altogether. Research has shown that CART (by Salford Systems) is a good choice for building such models.

In research published in a top academic journal in 2010, empirical evidence was presented to suggest that decision-tree techniques are superior predictors of business failure. On the hold-out data, the CART decision trees were found to outperform See5 decision trees and discriminant analysis at predicting business failure.

In peer-reviewed research presented at a 2012 academic conference, CART decision trees were compared with a semi-parametric Cox survival analysis model for predicting corporate financial distress over a variety of misclassification costs and prediction intervals. The results from the hold-out data suggest that CART decision trees are the superior predictors of financial distress. Using a weighted error cost metric, CART models had a lower cost of prediction for all misclassification costs and prediction intervals.
References
* Gepp, A., Kumar, K. & Bhattacharya, S. (2010). Business failure prediction using decision trees. Journal of Forecasting, 29(6), pp. 536-555.
* Gepp, A. & Kumar, K. (2012). Financial Distress Prediction using Decision Trees and Survival Analysis. Presented at 7th Annual London Business Research Conference, 9-10 July, London.


Dr. Martin Kidd, IMT, South Africa

Government:

As a statistician in the Naval environment, I have been involved in the field of data mining for the past four years. Classification trees have become one of the primary tools with which I extract useful information from large databases. I have used various classification tree software packages, and have found CART to be the superior product. What I find particularly useful are the following:
* The colour codes of the nodes which one can use to pick the most important branches (or rules).
* The relative cost vs number of nodes graph which I always use to select the 'least complicated' with 'low' relative cost.
* The Gains chart, which provides a good graphical view for assessing tree performance.


Steven Li, Senior Manager, Risk Technology, Sears, Roebuck and Co

CART is an important statistical analysis tool that we use to segment our databases and predict risk factors for the Sears Card. The advantage of the decision tree format is that our results are easy to interpret; especially with CART, we are able to see a great deal of detail about each of the nodes, such as the node's misclassification costs, the count of data assigned to that node, and a display of the surrogate values substituted for the node.


Andrea S. Laliberte, Remote Sensing Scientist at Earthmetrics

I have used CART in conjunction with remote sensing and digital image processing for producing vegetation classifications. CART is an excellent approach for determining the most suitable features (image bands, image ratios, elevation, slope, etc.) for image classification, and for reducing the number of input features to a reasonable number. In comparison with other feature reduction and selection methods, the CART approach has always worked superior for my applications. I really like the intuitive approach, easy to use manual, and the visual interface which makes it easy to interpret the data. In addition, all my interactions with the people at Salford Systems have been wonderful. I highly recommend the software.

 Andrea S. Laliberte, Remote Sensing Scientist at Earthmetrics
Oregon, USA


Anneli Anglund, PhD student at University College Cork

I am a PhD student in the field of marine bioacoustics and while I was looking into analysis methods for my thesis I came across CART. I thought it seemed like an interesting approach and when I tried it I was immediately impressed by the easy to use manual. Even though the examples were not necessarily within my field of study, they made sense and I found it easy to apply the methods to my own data. I would very much like to recommend this software and the very helpful staff of Salford Systems.

 Anneli Anglund, PhD student at University College Cork
Ireland


Chris Gooley, Founder and President at eTs Marketing Science

I've used Salford Systems software products ever since 1991 when Dan Steinberg and his team were first developing Salford tools in conjunction with the pioneering data mining scientists at Stanford and Berkeley.

I am an extensive user of SAS and SPSS software products. However, when it comes to decision trees and highly predictive models, I always turn to CART and other Salford Systems software products. Not only is the user interface simple to use, but writing your own syntax is easy to do as well.

The reasons I like Salford Systems tools and CART specifically include:

  1. The large number of options for tuning the algorithm, including statistical methods, tree depth, minimum node size, and cross validation procedures
  2. Easy to use facilities for building ensemble models via bagging, boosting, and arcing methods
  3. Intuitive, easy to understand metrics such as variable importance that are useful for checking if a model makes “business sense”
  4. Scoring and translating models is very fast and easy
  5. Ease of integration with SAS and SPSS

I can guarantee any analyst that invests a modest amount of their time with Salford tools will never regret the experience nor go back to using less powerful alternatives!


Dean Abbott, Founder and President at Abbott Analytics/Abbott Consulting

I've used Salford Systems tools for years and have recommended purchase of the suite to many companies I've worked with. Reasons I like it so much include:
* The trees build super fast, even with large numbers of rows and columns
* CART shows you the entire sequence of trees that have been built; you can customize the depth you find most appropriate or let CART decide the optimum depth
* Default settings are great but you can still customize
* Battery options let you loop over key settings

Dean Abbott, Founder and President at Abbott Analytics/Abbott Consulting
San Diego, CA USA


Eric Weiss, Ph.D., Consultant; Arid Lands Resource Sciences, University of Arizona

Academic:
As a research scientist in both academic and professional environments, I work with databases too large and complex to process manually. CART, unlike multiple linear programming and other methods that are constrained by functional forms, shows me truer characterizations of interrelationships between the data. CART is also a robust program that can support a diverse set of applications ranging, in my case, from food security analyses to pattern recognition and remote sensing problems.


Feng Xu, Senior Manager, AT&T Universal Card Services

Telecommunications:
When we purchased CART, it was the only comprehensive classification and segmentation software available that could handle the large data sets we use for credit card risk management. In addition, CART provides us with a great deal of flexibility by allowing us, for example, to specify a higher penalty for misclassifying a certain data value.


Marsha Wilcox, Ed.D., Vice President, PreVision Marketing

Marketing:
PreVision Marketing's clients include Fortune 500 companies from telecommunications, automotive, retail and packaged goods industries. We apply our database marketing and analysis expertise to turn our clients' usual wealth of customer information into beneficial marketing information and customer relationship programs. At PreVision, this typically includes developing models of customer and prospective customer behavior. CART's recursive partitioning abilities give us a proven statistical method for generating marketing models in an easy-to-understand decision tree format. This format is accessible to all of our clients, even those with limited statistical backgrounds, and the clarity of the decision tree display gives our clients added confidence in the validity and utility of the models we create.


Terence Mak, VP, Lead Analytic Consultant, Fleet Financial Group

Banking/Finance:
CART offers two distinctive advantages that other database segmentation tools do not. First, it allows the analyst to identify the smallest target segment possible, such as ten out of tens of thousands, with exceptional precision. In addition, CART allows us to specify a higher penalty for misclassifying a potentially poor prospect than for rejecting a good one; this makes us more confident that, for products with very thin margins, our segmentation models avoid prospects who would likely be non-profitable. CART is an invaluable data mining and modeling tool for Fleet Financial Group.


Wesley Johnston, Chevron Information Technology Co.

Industrial:
At Chevron, we conduct a lot of exploratory work for oil well drilling. Instead of taking many expensive core samples, we can use monitoring tools to characterize geographic areas; data capture generates small data sets with variables that are complex and interrelated rather than independent. CART, with its v-fold cross-validation capability, is our tool of choice for analyzing these small, complex data sets.


William Burrows, Meteorological Research Scientist, Atmospheric Environment Service

Government:
I use CART to provide Canadian meteorologists with dynamic statistical models for predicting lake effect snowfall, ozone levels and other weather issues that affect Canada. The optimal tree models I create in CART have proven their accuracy many times over when the tree is used with independent data.


Varun Aggarwal, Assistant Vice President at EXL Service

We at EXL have been using CART and MARS since time immemorial. CART, in particular, has become an integral part of the suite of tools we trust for delivering high-end analytical solutions to our clients. I have been using CART extensively for the last 8 years and I personally like it because of its pervasive applicability to all three phases of business analytics: descriptive, predictive and prescriptive. Firstly, CART has always helped me search for hidden or not-so-obvious patterns by slicing and dicing data based on its complex and robust backend algorithms. I strongly recommend it as a data mining tool for descriptive analytics. Secondly, CART-based non-parametric predictive models bring a lot of diversity to ensemble models. This Salford Systems product has incredibly helped us win accolades in several world-class model building competitions, including the Heritage Health Prize competition 2011-13, KDD Cup 2010 (Educational Data Mining Challenge) and Pacific Asia KDD Cup 2004 (Physics Task). Most importantly, CART is unambiguously one of the most powerful tools in the analytics industry today for identifying action segments, thereby helping us make very useful recommendations to our clients who need partners for providing feasible solutions in the field of prescriptive analytics.
Given the universal application of CART for solving complex business problems, we have developed and introduced a mandatory training module for all new hires from campus.

Varun Aggarwal
Assistant Vice President
EXL Service


Model Deployment

Any CART model can be easily deployed when translated into one of the supported languages (SAS®-compatible, C, Java, and PMML) or into the classic text output. This is critical for using your CART trees in large-scale production work.

The decision logic of a CART tree, including the surrogate rules utilized if primary splitting values are missing, is automatically implemented. The resulting source code can be dropped into external applications, thus eliminating errors due to hand coding of decision rules and enabling fast and accurate model deployment.
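
To give a sense of what translated decision logic with a surrogate rule might look like, here is a hand-written sketch in Python. It is not output from the CART translator (Python is not among the listed target languages), and the variable names, thresholds, class labels, and the default direction when both values are missing are all hypothetical.

```python
def score(record):
    """Hypothetical one-split tree: primary splitter on income, surrogate on age."""
    income = record.get("income")
    age = record.get("age")

    if income is not None:
        go_left = income <= 40000   # primary splitter
    elif age is not None:
        go_left = age <= 30         # surrogate, used only when income is missing
    else:
        go_left = True              # assumed default direction when both are missing

    # Terminal nodes: each branch ends in a class assignment.
    if go_left:
        return "high_risk"
    return "low_risk"
```

For example, `score({"income": None, "age": 50})` falls through to the surrogate and returns `"low_risk"`, while a record with `income` of 30000 is routed by the primary splitter regardless of age.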

Get In Touch With Us

Contact Us

9685 Via Excelencia, Suite 208, San Diego, CA 92126
Ph: 619-543-8880
Fax: 619-543-8888
info (at) salford-systems (dot) com