Download Now! Free 30 Day Trial of Salford System's Predictive Modeling Suite

Upcoming Tradeshows

  • JSM
    July 28, 2012 - August 02, 2012
    San Diego, CA, Booth TBA
  • KDD
    August 12, 2012 - August 16, 2012
    Beijing, China, Booth TBA
  • Statistical Learning and Data Mining III
    October 01, 2012
    Boston, MA
  • DMA
    October 13, 2012 - October 19, 2012
    Las Vegas, NV
  • INFORMS
    October 14, 2012 - October 16, 2012
    Phoenix, AZ
View full calendar

In 1995 Leo Breiman was actively experimenting with his first version of the bagger, and that at time I was in constant contact with him via email. In some cases at Salford Systems we implemented ideas of Leo's as we were discussing them with him. At other times we debated certain details and exchanged ideas in a lively give and take. Leo's initial ideas always took as a given that the bagged trees needed to be pruned and he was using 10–fold cross validation to do so. Because this added a substantial computational burden to the process I suggested that he use the OOB (out of bag) data to test and prune each bagged tree. In response, Leo began experimenting with this idea and eventually concluded that the entire training sample (both in–bag and out of bag) should be used to prune each bagged tree. Of course, subsequent research showed that unpruned trees were in fact ideal and thus the topic of using OOB data for pruning trees fell by the wayside. OOB data became very important in Leo”s subsequent work on RandomForests four years later.

The emails here are a selection of messages I received from Leo in mid–1995 on the topic. Unfortunately, we do not appear to have any copies of my side of the conversation. We hope to post other messages from Leo here from time to time as his remarks covered a very broad range of topics pertaining to trees and data mining.

Published in Dan Steinberg