Download Now! Free 30-Day Trial of Salford Systems' Predictive Modeling Suite

Upcoming Tradeshows

  • JSM
    July 28, 2012 - August 02, 2012
    San Diego, CA, Booth TBA
  • KDD
    August 12, 2012 - August 16, 2012
    Beijing, China, Booth TBA
  • Statistical Learning and Data Mining III
    October 01, 2012
    Boston, MA
  • DMA
    October 13, 2012 - October 19, 2012
    Las Vegas, NV
  • INFORMS
    October 14, 2012 - October 16, 2012
    Phoenix, AZ
Thursday, October 08 2009 11:10

CART and the Best-Seller Blink

Blink, a New York Times best-selling non-fiction book by Malcolm Gladwell, is a fascinating investigation into a person's ability to make instant judgments and decisions, as well as into the factors that help or hurt the judgment process. In Chapter 4, Gladwell discusses the problem of diagnosing individuals who arrive at a hospital emergency department with chest pain. Doctors need to quickly determine whether the patient is having a heart attack or is at risk of having one soon. The author describes a computer-driven protocol, based on research by Lee Goldman, that in a controlled experiment at Cook County Hospital in 2000-2001 proved dramatically more accurate than the judgment of physicians.

The footnotes and references at the back of the book show that this computer-driven diagnostic protocol was first developed in a classic 1982 paper by Lee Goldman and CART co-author Richard Olshen; the paper is now widely assigned reading in medical schools. Goldman updated this landmark work several times between 1982 and 1996, applying CART to larger data sets drawn from several hospitals. The results of the trial at Cook County Hospital were published in the Journal of the American Medical Association in 2002.

This example shows that CART, used in a real-world setting, can outperform the judgment of experts. Part of the appeal of CART in this circumstance is the ease with which non-statisticians can follow the diagnostic logic. This study is also an example of how CART is having a behind-the-scenes impact on processes that touch many people in their day-to-day activities.

You can request the latest version of CART by contacting us by e-mail.

Published in Company
Monday, October 05 2009 14:54

Black Boxes and Data Mining Systems

The concept of a "black box" describes a situation in which a scientist endeavors to learn as much as possible about an entity or physical system but is limited in the type of information that can be obtained. Typically, only behavior is observed, with no way of knowing the mechanisms that produce that behavior. For most of its history, for example, psychology was limited to studying the brain as a black box because it was virtually impossible to peer inside and learn how the brain actually functions.

In the world of data mining and predictive modeling, the concept of the black box often comes up in the context of proprietary prediction systems in which the vendor does not disclose details of the algorithm by which the predictions are made. In the 1990s, many financial institutions paid hefty fees to use a proprietary system for predicting interest rates; the vendor succeeded in persuading banks that the predictions were accurate enough to warrant subscribing to the service, even though the banks did not know how the predictions were generated.

Today, in the field of data mining and predictive modeling software, there are new black box vendors who offer only minimal descriptions of their algorithms. Instead of describing their own algorithms in detail, they offer general discussions of data mining principles and pepper their white papers with formulas for well-known procedures such as logistic regression and ROC calculation.

The question addressed in this post is: should you seriously consider such a black box system? In general, we think not, for the following reasons:

  1. One plausible justification for using a black box predictive system is that it outperforms non-black-box systems. To the best of our knowledge, no black box system has succeeded in outperforming TreeNet or other Salford Systems technologies.
  2. Another justification that has been offered for black box systems is that they provide a high degree of automation coupled with good, if not the best, performance. In other words, you get an "easy button": just point, click, and wait for the models to appear automatically. We will discuss this topic in detail in another series of blog entries, but our current view is that such total "lights out" automation is largely marketing hype. In contrast, the Salford suite of tools offers considerable sensible automation and superior performance, and it also comes with detailed explanations of its inner workings. So why go with mystery tools that offer far less in substance?
  3. We have often suspected that black box systems for data mining are actually rather simple mechanisms. The vendors may endeavor to keep the details secret because they would find it impossible to obtain their high licensing fees from people who understood what the system was actually doing. By creating an aura of mystery around their simple mechanisms, these vendors hope to persuade wishful thinkers that a "silver bullet" solution to their modeling needs is at hand.
  4. Many circumstances exist in which it is vital to be able to explain in detail how certain predictive models were developed from the training data. For example, regulators such as the FDA (Food and Drug Administration) are not going to accept the results of a data analysis if the method of analysis is not disclosed. Marketers are generally keenly interested in understanding how data is used to extract insight into customer behavior, and banking regulators insist on total transparency of any credit risk model. For such consumers of models, adequate explanations of the workings of the modeling mechanism must be provided.
  5. Modeling systems frequently require tweaking as the nature of the data, as well as its quality, volume, or breadth, changes over time. Using a system that is both understood and understandable puts the user in a position to modify control parameters intelligently and obtain better results over time. With black box technology, on the other hand, the user is always dependent on the vendor to make these adjustments, if they are even possible.

In conclusion, we believe consumers must come down on the side of knowing, at a minimum, the key concepts behind the modeling system, as well as enough technical detail to understand how and why the control parameters will affect modeling results.

If you would like to take a closer look at Salford predictive modeling tools, evaluation versions are available at http://salford-systems.com/products.php.

Published in Company
Friday, September 11 2009 13:25

2009 Data Mining Conference

Salford Systems' 6th International Applied Data Mining Conference, a user-oriented data mining and predictive analytics conference, was held in San Diego from August 23 through August 25, 2009, drawing more than 100 attendees and offering 32 presentations across multiple tracks. Topics included what went wrong in the financial markets, best-practice analytics in banking and insurance underwriting, fraud detection, discovering unexploded ordnance in minefields, various topics in healthcare and bioinformatics, predictive analytics for optimal placement of web advertisements in an ad network, genetics research, and techniques for building better models.

We were also honored to have scientific thought leaders Jerome Friedman and Richard Olshen present summaries of their most recent research. Jerry Friedman spoke about his Generalized PathSeeker (GPS) approach to regularized regression; this technology offers high-speed LASSO-style regression for extreme data set configurations with upwards of 100,000 predictors and possibly very few rows. Such data sets are commonplace in gene research and text mining, and the new technology is supremely fast and efficient. (GPS is currently available in limited-release versions of Salford predictive analytics software.)
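As a point of reference for the term "LASSO-style," the textbook LASSO estimate solves the penalized least-squares problem below, where x_ij is the value of predictor j for row i, y_i is the response, n is the number of rows, p is the number of predictors, and λ ≥ 0 is a tuning parameter. This is a sketch of the standard formulation only, not a description of how Generalized PathSeeker computes its solutions, which this post does not detail.

```latex
\bigl(\hat{\beta}_0, \hat{\beta}\bigr) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The absolute-value penalty drives many coefficients exactly to zero as λ grows, which is why this family of methods suits data sets with far more predictors (p) than rows (n). Solving for a sequence of λ values traces a regularization path of increasingly complex models.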

The agenda for the conference can be viewed at: http://www.salforddatamining.com/agenda.php. If you are interested in attending an online replay of the conference, please contact us by e-mail. We will be offering video recordings of the conference sometime in October.

Published in Company
Thursday, September 03 2009 14:11

Welcome

With the launch of our new web site in September 2009, we were also pleased to launch our new blog. Here you can benefit from the accumulated experience of the Salford analytics team, reflecting what we have learned over the past 20 years of consulting and predictive analytics software development.

The Salford Predictive Analytics, Data Mining and Business Intelligence NewsLetter/Blog contains company news, analytics advice, real-world examples using our technology, tech support topics, technical discussions and summaries of interesting papers (especially those presented at our user conferences). In addition, we will provide information of interest to followers of CART®, MARS®, TreeNet®, PRIM™, RandomForests®, PATHSEEKER™ and other new technologies we are developing.

We will be focused on practical matters and hope to help predictive analysts, data miners and decision makers in day-to-day tasks. We will occasionally include editorial material reflecting our views on current developments in the market and the practice of data mining.


Published in Company
Tuesday, December 14 2010 12:23

SPM Suite Unveiled at NCDM 2010

MIAMI – Salford Systems, the authority in data mining and predictive analytics software, unveiled its new Salford Predictive Modeler™ (SPM) software suite at NCDM 2010 here today. SPM provides businesses, institutions and government agencies with a highly accurate, ultra-fast platform for developing predictive, descriptive and analytical models from large and complex databases. SPM technology dramatically accelerates accurate, robust model generation by automatically sifting through such databases to isolate significant patterns and relationships. Yet the program is easy to use for both technical and nontechnical users.

Published in News


Salford Predictive Modeling Suite (SPM) includes CART, MARS, TreeNet, and RandomForests, as well as powerful new automation and modeling capabilities not found elsewhere.

POWERFUL analytics you can trust

  • SPM introduces unique automation techniques.
  • SPM allows the modeler to customize the automation throughout the model-building process.
  • SPM facilitates the use of data mining and predictive analytics by non-experts and gives experts unprecedented power.

Find out how you can use SPM technology to address the analytics challenges that are core and critical to your work.
Published in News

SAN DIEGO – Salford Systems, a pioneer in developing data mining and predictive analytics software, has once again provided the winning technology in a major competitive analytics event, this time at the 2010 Direct Marketing Association (DMA) Analytic Challenge, sponsored by the CAC Group, Inc.

Published in News

SAN DIEGO – Data mining technology allows sports teams to find new indicators to measure player performance while helping them gain insight into athletes’ future success, asserted Mikhail Golovnya, Salford Systems’ senior scientist, during his presentation at the MIT Sloan Sports Analytics Conference in Boston last week.

Published in News

SAN DIEGO – Dr. Falk Huettmann, a wildlife ecologist and professor at the University of Alaska-Fairbanks, has written a report entitled Future of Alaska in which he forecasts how climate change, human activities, natural disasters and cataclysmic events might affect Alaska’s ecosystem over the next 100 years.

Published in News

SAN DIEGO – Salford Systems announces its 2012 Analytics and Data Mining Conference with the launch of its new conference website. The conference will be held in San Diego, Calif., May 24-25, 2012.

Published in News