Blink, a New York Times best-selling non-fiction book by Malcolm Gladwell, is a fascinating investigation into a person's ability to make instant judgments and decisions, as well as into the factors that help or hurt the judgment process. In Chapter 4, Gladwell discusses the problem of diagnosing individuals who arrive at a hospital emergency department with chest pain. Doctors need to quickly determine whether the patient is having a heart attack or is at risk of having a heart attack soon. The author discusses a computer-driven protocol based on research by Lee Goldman that, in a controlled experiment at Cook County Hospital in 2000-2001, proved to be dramatically more accurate than the judgment of physicians.
The footnotes and references at the back of the book show that such a computer-driven diagnostic protocol was first developed in a classic paper written in 1982 by Lee Goldman and CART author Richard Olshen, now widely assigned reading in medical schools. This landmark paper was updated by Goldman several times between 1982 and 1996 using CART on larger data sets drawn from several hospitals. The results of the trial at Cook County Hospital were published in the Journal of the American Medical Association in 2002.
This example clearly shows that CART, used in a real-world setting, may easily outperform the judgment of experts. Part of the appeal of CART in this circumstance is the ease with which the diagnostic logic can be followed by non-statisticians. This study is also an example of how CART is having a behind-the-scenes impact on processes that touch many people in their day-to-day activities.
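To give a feel for why tree logic is easy to follow, here is a minimal sketch of the core step of a CART-style tree: choosing the single split that minimizes weighted Gini impurity. It is an illustration only, not Salford's CART implementation and not the Goldman protocol. The records, the feature names (`age`, `systolic_bp`) and the helper names `gini` and `best_split` are all invented for the example.

```python
# Toy illustration only: the records and feature names below are invented.
# They are NOT taken from the Goldman studies or from Salford's CART software.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels, feature_names):
    """Return (weighted_impurity, feature, threshold) for the best single split."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t)
    return best

# Hypothetical records: (age, systolic_bp) with a made-up 0/1 outcome.
feature_names = ["age", "systolic_bp"]
rows = [(35, 130), (42, 125), (50, 140), (61, 95),
        (66, 90), (70, 100), (48, 98), (73, 92)]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

score, feature, threshold = best_split(rows, labels, feature_names)
print(f"if {feature} <= {threshold}: go left, else go right (impurity {score:.3f})")
```

A full tree repeats this search inside each resulting group. The output is a rule a reader can check by hand, which is the property the paragraph above describes.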
You can request the latest version of CART from Salford Systems by e-mail.
The concept of a "black box" is used to describe a situation in which a scientist endeavors to learn as much as possible about an entity or physical system, but is limited in the type of information that can be obtained. Traditionally, only behavior is observed, with no way of knowing the probable mechanisms that determine the behavior. For most of its history, for example, psychology was limited to studying the brain as a black box because it was virtually impossible to peer inside to learn how the brain actually functions.
In the world of data mining and predictive modeling, the concept of the black box often comes up in the context of proprietary prediction systems in which the vendor does not disclose details of the algorithm by which the predictions are being made. In the 1990s, many financial institutions paid hefty fees to use a proprietary system for predicting interest rates; the vendor was successful in persuading banks that the predictions were accurate enough to warrant subscribing to the service even though the banks did not know how the predictions were generated.
Today, in the field of data mining and predictive modeling software, there are new black box vendors who prefer to offer the most minimal descriptions of their algorithms. Instead of describing their own algorithms in detail, they offer general discussions of data mining principles and pepper their white papers with formulas for well-known procedures such as logistic regression and ROC calculation.
The topic addressed in this blog is: Should you seriously consider such a black box system? In general, we think not for the following reasons:
So, in conclusion, we believe consumers must come down on the side of knowing, at a minimum, the key concepts behind the modeling system, as well as sufficient technical detail to be able to understand how and why the control parameters will affect modeling results.
If you would like to take a closer look at Salford predictive modeling tools, evaluation versions are available at http://salford-systems.com/products.php.
Salford Systems' 6th International Applied Data Mining Conference, a user-oriented data mining and predictive analytics conference, was held in San Diego on August 23rd through August 25th, 2009, hosting over 100 people and offering 32 presentations across multiple tracks. Topics included what went wrong in the financial markets, best practice analytics in banking and insurance underwriting, fraud detection, discovering unexploded ordnance in minefields, various topics in healthcare and bioinformatics, predictive analytics for optimal placement of web advertisements in an ad network, genetics research, and techniques for building better models.
We were also honored to have scientific thought leaders Jerome Friedman and Richard Olshen presenting summaries of their most recent research. Jerry Friedman spoke about his Generalized PathSeeker approach to regularized regression; this technology offers high speed LASSO-style regression for extreme data set configurations with upwards of 100,000 predictors and possibly very few rows. Such data sets are commonplace in gene research and text mining and the new technology is both supremely fast and efficient. (GPS is currently available in limited release versions of Salford predictive analytics software.)
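For readers unfamiliar with LASSO-style regression on data with more predictors than rows, the sketch below may help. It uses plain cyclic coordinate descent with soft-thresholding, a standard textbook method, and is not Generalized PathSeeker or any Salford code. The data are synthetic, and the sizes (20 rows, 50 predictors), the penalty `lam` and all function names are assumptions chosen for illustration.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def objective(X, y, beta, lam):
    """(1/(2n)) * squared error plus lam times the L1 norm of beta."""
    n = len(X)
    sq = sum((y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n))
    return sq / (2.0 * n) + lam * sum(abs(b) for b in beta)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Lasso fit by cyclic coordinate descent (no intercept, for brevity)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

# Synthetic data: more predictors than rows, only the first two matter.
random.seed(0)
n_rows, n_cols = 20, 50
X = [[random.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.1) for row in X]

lam = 0.5
beta = lasso_cd(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(f"{len(selected)} of {n_cols} coefficients are nonzero:", selected)
```

The L1 penalty sets many coefficients exactly to zero, which is what makes this family of methods usable when predictors outnumber rows. Pure-Python loops like these would not scale to 100,000 predictors; the sketch only shows the idea.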
The agenda for the conference can be viewed at: http://www.salforddatamining.com/agenda.php. If you are interested in attending an online replay of the conference, please contact us. We will be offering video recordings of the conference sometime in October.
With the launch of our new web site in September 2009, we were also pleased to launch our new blog. Here you can benefit from the accumulated experience of the Salford analytics team, reflecting what we have learned over the past 20 years of consulting and predictive analytics software development.
The Salford Predictive Analytics, Data Mining and Business Intelligence NewsLetter/Blog contains company news, analytics advice, real world examples using our technology, tech support topics, technical discussions and summaries of interesting papers (especially those presented at our user conferences). In addition, we will provide information of interest to followers of CART®, MARS®, TreeNet®, PRIM™, RandomForests®, PATHSEEKER™ and other new technology we are in the process of developing.
We will be focused on practical matters and hope to help predictive analysts, data miners and decision makers in day-to-day tasks. We will occasionally include editorial material reflecting our views on current developments in the market and the practice of data mining.
MIAMI -- Salford Systems, the authority in data mining and predictive analytics software, unveiled its new Salford Predictive Modeler (SPM)™ software suite at NCDM 2010 here today. SPM provides businesses, institutions and government agencies with a highly accurate, ultra-fast platform for developing predictive, descriptive and analytical models from large and complex databases. SPM technology dramatically accelerates accurate, robust model generation by automatically sifting through such databases to isolate significant patterns and relationships. Yet the program is easy to use for both technical and nontechnical users.
Salford Predictive Modeling Suite (SPM) includes CART, MARS, TreeNet, and RandomForests, along with powerful new automation and modeling capabilities not found elsewhere.
POWERFUL analytics you can trust
Find out how you can use SPM technology in ways that are core and critical to your analytics challenges.
SAN DIEGO – Salford Systems, a pioneer in developing data mining and predictive analytics software, has once again provided the winning technology in a major competitive analytics event, this time at the 2010 Direct Marketing Association (DMA) Analytic Challenge, sponsored by the CAC Group, Inc.
SAN DIEGO – Data mining technology allows sports teams to find new indicators to measure player performance while helping them gain insight into athletes’ future success, asserted Mikhail Golovnya, Salford Systems’ senior scientist, during his presentation at the MIT Sloan Sports Analytics Conference in Boston last week.
SAN DIEGO - Dr. Falk Huettmann, a wildlife ecologist and professor at the University of Alaska-Fairbanks, has written a report entitled Future of Alaska in which he forecasts how climate change, human activities, natural disasters and cataclysmic events might affect Alaska’s ecosystem over the next 100 years.
SAN DIEGO – Salford Systems announces its 2012 Analytics and Data Mining Conference with the launch of its new conference website. The conference will be held in San Diego, Calif., May 24-25, 2012.