Dan Steinberg

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University and has given full-day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.

You can use CART itself to do this via the built-in SCORE facility. If you use the GUI, you can reach the SCORE dialog through the toolbar icon at the extreme right or from the Model menu.

Scoring any data set produces one output record for each input record, containing the CART prediction (RESPONSE) and the number of the terminal node into which that record falls (NODE). You can then SELECT the relevant records from the saved data set in subsequent analyses. The built-in BASIC can also be used to delete data for NODE values you are not interested in, but this requires that you first SAVE the scored data set.
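If you prefer to do the selection outside of CART, a rough sketch in Python follows. It assumes the scored data set has been saved as a CSV file containing the RESPONSE and NODE columns described above; the ID column, the inline sample rows and the node numbers of interest are all hypothetical.

```python
import csv
import io

def select_nodes(rows, keep_nodes):
    """Keep only scored records whose terminal node (NODE) is in keep_nodes."""
    keep = {str(n) for n in keep_nodes}  # csv values are read as strings
    return [row for row in rows if row["NODE"] in keep]

# Hypothetical stand-in for a saved scored data set; in practice you would
# open the CSV file written by the SCORE run instead of this inline sample.
scored_csv = """ID,RESPONSE,NODE
1,1,3
2,0,5
3,1,3
4,0,7
"""

rows = list(csv.DictReader(io.StringIO(scored_csv)))
selected = select_nodes(rows, keep_nodes=[3, 7])
for row in selected:
    print(row["ID"], row["RESPONSE"], row["NODE"])
```

Comparing NODE as strings avoids converting every column; if you need arithmetic on RESPONSE, convert it explicitly after selection.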

Thursday, March 22 2012 11:49

Are Interactions Relevant To Your Data?

There are several stages to interaction detection using TreeNet models. The first stage is a simple comparison of test sample performance for TreeNet models built with trees of different sizes. The baseline model is a TreeNet using 2-node trees (sometimes known as "stumps"). The core idea is that a tree grown with a single split cannot reflect any kind of interaction: the entire story for that tree involves a single variable, and by definition an interaction requires at least two different variables. The 2-node baseline model thus represents the best possible model TreeNet can grow when interactions are prevented. We then grow at least one more TreeNet model using trees with more than 2 nodes, which allows interactions. The simple story is that if the 2-node TreeNet is as good, or almost as good, as the larger-tree model, then we have compelling data-based evidence that interactions are irrelevant to the data generation process (how the real world actually operates to produce this data).
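Why a sum of stumps cannot express an interaction can be checked numerically. The toy sketch below is not TreeNet itself; the stump values and the small 3-node tree are made-up numbers. It computes the interaction contrast f(a1,b1) - f(a1,b2) - f(a2,b1) + f(a2,b2), which is zero for any function of the form g(x0) + h(x1). A sum of stumps is always of that form, while a tree that splits on two different variables along one path generally is not.

```python
def stump_predict(stump, x):
    """A 2-node tree: one split on one variable, two terminal values."""
    var, threshold, left, right = stump
    return left if x[var] <= threshold else right

# Made-up stumps on x[0] and x[1]; a stump ensemble is just their sum.
STUMPS = [(0, 0.5, -1.0, 2.0), (1, 0.5, 0.5, -0.5),
          (0, 0.75, 0.25, 0.75), (1, 0.25, -0.25, 0.5)]

def stump_model(x):
    return sum(stump_predict(s, x) for s in STUMPS)

def three_node_tree(x):
    """Splits on x[0], then on x[1] inside the right branch only."""
    if x[0] <= 0.5:
        return 0.0
    return 1.0 if x[1] <= 0.5 else 3.0

def interaction_contrast(f, a1, a2, b1, b2):
    """f(a1,b1) - f(a1,b2) - f(a2,b1) + f(a2,b2); always 0 for an additive f."""
    return f((a1, b1)) - f((a1, b2)) - f((a2, b1)) + f((a2, b2))

print(interaction_contrast(stump_model, 0.25, 1.0, 0.0, 1.0))      # 0.0
print(interaction_contrast(three_node_tree, 0.25, 1.0, 0.0, 1.0))  # 2.0
```

The leaf values are chosen as exact binary fractions so the stump result is exactly 0.0 in floating point; with arbitrary values you would compare against a small tolerance instead.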

Once you have built an SPM model (CART, MARS, TreeNet, RandomForests) and saved the grove (.GRV) file, you are in a position to make predictions for any other data set containing the relevant predictors. Thus, if you trained your model on file A using variables X1, X2, ..., X50, for example, you can now make predictions for file B, provided that file B contains at least some of the same variables (and preferably all of the variables actually used in the model).

This process of generating predictions is called SCORING in our software, and most models are built specifically so that they can be put into production to generate predictions. The process can also be used for SIMULATION. In this case you prepare a data set that also contains the columns X1, X2, ..., X50, but the values need not be real data. Instead, the file could contain hypothesized, imagined or forecasted values, as when you want to make predictions for certain possible future scenarios.
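A minimal sketch of preparing such a simulation file, assuming a CSV layout and a made-up baseline record that shows only three of the predictors (X1 to X3): one copy of the baseline is made per hypothesized value of X2, holding the other predictors fixed. The resulting file would then be scored against the saved grove in the usual way; nothing in this sketch is specific to SPM.

```python
import csv
import io

def make_scenarios(baseline, variable, values):
    """Copy a baseline record once per hypothesized value of one predictor."""
    scenarios = []
    for value in values:
        row = dict(baseline)    # shallow copy; the baseline stays unchanged
        row[variable] = value
        scenarios.append(row)
    return scenarios

# Hypothetical baseline record; only X1..X3 shown for brevity.
baseline = {"X1": 1.0, "X2": 5.0, "X3": 3.0}
scenarios = make_scenarios(baseline, "X2", [0.0, 5.0, 15.0, 25.0])

# Write the scenario file as CSV; in practice this would go to disk
# and then be scored against the saved grove.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(baseline))
writer.writeheader()
writer.writerows(scenarios)
print(buffer.getvalue())
```

Varying one predictor at a time keeps the scored results easy to read; a full grid over several predictors works the same way but grows quickly.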

Wednesday, March 07 2012 11:52

Using Battery PRIORS in CART

Understand the value of PRIORS EQUAL and PRIORS DATA in common classification problems in CART.
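PRIORS EQUAL and PRIORS DATA differ in the prior probabilities used when CART computes within-node class probabilities. The sketch below follows the standard CART definition, in which p(j|t) is proportional to prior_j * N_j(t) / N_j; it is our own illustration rather than SPM's internal code, it assumes equal misclassification costs, and the class counts are made up. With PRIORS DATA the priors are the sample class shares and the rule reduces to the raw class fractions in the node; with PRIORS EQUAL each class is judged by the share of its own total that lands in the node.

```python
def priors_data(class_totals):
    """PRIORS DATA: priors equal to the class shares of the learning sample."""
    n = sum(class_totals.values())
    return {j: count / n for j, count in class_totals.items()}

def priors_equal(class_totals):
    """PRIORS EQUAL: every class gets the same prior probability."""
    return {j: 1.0 / len(class_totals) for j in class_totals}

def node_class_probabilities(node_counts, class_totals, priors):
    """Within-node class probabilities p(j|t) under the given priors.

    p(j|t) is proportional to priors[j] * N_j(t) / N_j, where N_j(t) is the
    number of class-j records in node t and N_j is the number of class-j
    records in the whole learning sample.
    """
    weights = {j: priors[j] * node_counts[j] / class_totals[j] for j in node_counts}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Made-up example: class 1 is rare overall (100 of 1000 records),
# but relatively concentrated in this node (30 of its 120 records).
class_totals = {0: 900, 1: 100}
node_counts = {0: 90, 1: 30}

for name, priors in [("DATA", priors_data(class_totals)),
                     ("EQUAL", priors_equal(class_totals))]:
    probs = node_class_probabilities(node_counts, class_totals, priors)
    label = max(probs, key=probs.get)   # class assignment with equal costs
    print(name, {j: round(p, 3) for j, p in probs.items()}, "-> class", label)
```

In this example the same node is assigned to class 0 under PRIORS DATA but to the rare class 1 under PRIORS EQUAL, which is why the choice matters most for unbalanced classification problems.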

If you open a saved grove for any Salford Systems data mining engine (CART, MARS, TreeNet, RandomForests), you will notice a “Commands” button in the row of controls along the bottom of the display. The Commands button opens a plain text window displaying all the commands entered in your session up to the run that generated the grove.

Wednesday, February 22 2012 16:35

Train/Test Consistency in CART

Take a close look at CART to see the advantages of using train and test data when building your predictive models.

Monday, February 20 2012 15:02

A Reminder About Missing Values

Our tech support department receives a steady stream of interesting questions about how to use our products, whether about specific features or how to accomplish a given task. We also receive questions about data mining (and predictive analytics generally), modeling strategy and a variety of other topics. One type of query that comes up periodically is what to do with missing values. We have discussed missing values before in a variety of contexts, but usually at a fairly technical and advanced level. Today’s post is quite basic and responds to a user’s question about what to do with special values of variables that are intended to represent missing values.

Data entry practice dating back to at least the 1970s has used ‘missing value codes’ for unknown data fields; favorite values have included strings of 9’s such as 9999 or -9999. There are a number of variations on this theme. For example, survey research firms have wanted to distinguish between different reasons for a missing value, using 9999 for values missing for no known reason, 9998 for ‘unknown’ and 9997 for ‘refused.’ Data entry clerks have also been known to fill in missing birthdays with values such as January 1, 1960.
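A minimal sketch of recoding such sentinel values before modeling, assuming the survey-style codebook described above (the codebook dictionary, function name and sample ages are hypothetical). The sentinel is replaced by a true missing value, and the reason is kept in a parallel list in case the distinction between ‘refused’ and ‘unknown’ turns out to be informative.

```python
# Hypothetical codebook following the survey example above.
MISSING_CODES = {9999: "no known reason", 9998: "unknown", 9997: "refused"}

def recode_missing(values, codes=MISSING_CODES):
    """Replace sentinel codes with None and record the reason separately."""
    cleaned, reasons = [], []
    for v in values:
        if v in codes:
            cleaned.append(None)       # a genuine missing value
            reasons.append(codes[v])   # why it is missing
        else:
            cleaned.append(v)
            reasons.append(None)       # value was observed
    return cleaned, reasons

ages = [34, 9999, 51, 9997, 9998, 28]
cleaned, reasons = recode_missing(ages)
print(cleaned)   # [34, None, 51, None, None, 28]
print(reasons)
```

Apply a codebook like this only to fields where the code cannot be a legitimate value. Placeholder dates such as January 1, 1960 cannot be caught this way, because they look like real data.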

Wednesday, February 15 2012 10:52

An Introduction To Cross-Validation

Learn how to prepare for and utilize cross-validation to test the accuracy of your results.
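The partitioning step of k-fold cross-validation can be sketched in a few lines of Python. This shows only how records rotate between training and test roles, with each record tested exactly once; it is not a description of how CART uses the cross-validated results internally, and the function names and fixed seed are our own choices.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle record indices and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n, k, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in cross_validation_splits(n=10, k=5):
    print(len(train), len(test))   # 8 2 on each of the 5 lines
```

Fixing the seed makes the partition reproducible; changing it gives a different random partition of the same data.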
How to get started with Salford's Online Training Series
Learn to address the challenge of testing small training data sets and improve the reliability of results using Battery Cross-Validation (CVR).