From
This e-mail address is being protected from spambots. You need JavaScript enabled to view it.
Tue Jun 6 10:07:15 1995
Date: Tue, 6 Jun 1995 10:07:10&ndah;0700
To: dstein
Hi, Dan
I've been doing revisions on my bagging paper (its been accepted
in Machine Learning). What I've noticed is that on some data
sets the setting of the minimum node size can have an appreciable effect on the
error rate. I suggest that the default for minimum node size be
put at one.
Leo
From
This e-mail address is being protected from spambots. You need JavaScript enabled to view it.
Tue Jun 6 22:56:50 1995
To: dstein
Subject: Re: BAGGER
Hi, Dan
The idea of using the unused portion of the bootstrap as a
test sample--maybe to select a pruned subtree is intriguing.
Bagging is pretty compute intensive, and that idea, if it
works, would save a lot of time. I'll give it a spin soon.
The reason I suggested not emphasizing class probability trees is that I haven't seen any convincing evidence that it produces better estimates of class probabilities that straight mis– classification pruning. A good example would convince me other– wise. If you want accurate estimates of class probabilities, go to gagging!
(I mean bagging!)
Rememebr my run in the bagging paper which shows an very large increase in aaccuracy of class proability estimnates if bagging is used. I've got a revision of the bagging paper coming out in the next few days and will send you a copy when it comes of of the press.
Leo
From
This e-mail address is being protected from spambots. You need JavaScript enabled to view it.
Wed Jun 7 20:04:22 1995
To: dstein
Hi, Dan
I implemented your idea of using the cases left out in the bootstrap sample to use as a test set in trimming the tree and ran it on a bunch of data sets. So here is the story:
It works pretty well, but in almost every case the error using it is slightly higher that using cv or using a fixed proportion of the bootstrap sample as a test set (for large data sets). The idea is really promising but I think that we don't know yet what the rigfht way is to implement it. The advantage is that is reduces computation by an order of magnitude—but I don't think thats enough to offset the (small) lossa in accuracy in todays market of cheap cycles.
Leo
From
This e-mail address is being protected from spambots. You need JavaScript enabled to view it.
Thu Jun 8 10:04:14 1995
To: dstein
Subject: bagging
Hi, Dan
Here is what works like a charm in bagging. Construct the tree using the bootstrap sample and then select the pruned subtree using the original data as a test sample. Not only does it work in practice, but there are also good theoretical reasons why it works.
The good thing about this is that it cuts the amount of computing needed in bagging by a factor of ten.
Leo
From
This e-mail address is being protected from spambots. You need JavaScript enabled to view it.
Mon Jun 12 08:42:23 1995
To: dstein
Subject: Re: bagging and testing
Hi, Dan
Glad the tutorial went well. Nothing like thousands of happy
CART users.
The reason using the original data works well is this: The bagged sample used to grow the tree is drawn from the underlying bootstrap probability distribution that assigns probability 1/(sample size) to each case in the data set.
To prune the tree grtown on the bootstrap sample, you wasnt a test set drawn from the same underlying distribution—i.e. another bootstrap sample. But the test set should be as large as possible. So consider drawing an almost infinite bootstrap sample from the bootstrap probability distribution. In this latter case every case in the original sample appears a number of times nb(case id.) in the bootsample. All of the nb are large and a little argument based on the binomial (or using intuition) shows that the ratio of the nb for different cases goes to one. Ergo, using an “infinite” bootstrap sample is equivalent to using the original data set.
Pretty neat, huh? I thought of it at night last week sitting in our hot tub.
Leo


