
Repeated Cross Validation In The SPM® Software Suite

Learn to address the challenge of testing small training data sets and improve the reliability of results using Repeated Cross-Validation (Battery CVR).

Video Tutorial


Greetings and welcome to another in the series of Salford Systems online training videos; this is Dan Steinberg. Please review the Getting Started slides if you don't already have your software downloaded and sample data sets available. In the current series of videos, we're going through the battery mechanism that is available inside the Salford Predictive Modeler. The battery feature in Salford data mining tools is a powerful mechanism for automatically running modeling experiments. Each battery automatically explores the consequences of changing something important in your modeling strategy. By automating this exploration and experimentation, investigations are systematic and complete. Your investigations are consistent and follow the same pattern if you want them to. Large numbers of models are queued up automatically, summaries visually display key results, and this whole mechanism allows you to accelerate the process of arriving at a well-chosen model. In the version you have downloaded, you should have all the batteries that we're going to discuss available. If you're working with an already licensed version, whether you have the battery feature enabled, and how many batteries are available, will depend on your version. This video series will work through all of the battery options available in the Pro EX release of our tools. SPM Pro EX and specific tools, such as CART and TreeNet Pro EX, are available for evaluation and for use while we work through these videos. If you don't see a battery that we're discussing in one of these videos, please contact Salford to get access today.

Battery CVR

Today we're going to be speaking about one battery in particular, called CVR, for Repeated Cross-Validation. We're going to assume that you already know a little bit about cross-validation, or you can refer to the video that we have on the topic. Cross-validation is a well established method for testing the predictive reliability of models. Although originally developed to address the challenge of testing small training data sets, cross-validation is widely used even with larger data sets. Although any one cross-validation model may be tested on a relatively small subset of the data, the fact that we rotate through several test partitions and retest on different partitions increases the reliability of results. Cross-validation is thus well regarded in machine learning and widely relied upon to measure a model's performance. Almost all smaller training samples are tested by cross-validation, and you will encounter cross-validation results being reported in a great many articles in the scientific literature. The thing to remember is that cross-validation is a random experiment. When we test a model by cross-validation, we run an experiment one time. The experiment is somewhat complex and has multiple parts: tenfold cross-validation has 10 parts, with 10 different CV models and one main model built. The specific results that we get from this experiment are the results of a random assignment. We randomly divide the data into the K partitions making up the K folds of the cross-validation, and then we eventually assemble our cross-validation results. A different random division of the data into a different set of K folds is going to yield different results; usually, and hopefully, the results will be only slightly different, but they will nevertheless be different.
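The point that a single cross-validation is one random experiment can be seen outside SPM as well. The sketch below is purely illustrative (it is not SPM or CART): it uses scikit-learn, a synthetic data set standing in for GOODBAD.CSV, and a generic decision tree, and shows that two different random partitions into 10 folds give two different performance estimates.

```python
# Illustrative sketch (not SPM): a single 10-fold cross-validation is one
# random experiment -- a different random partition of the data into folds
# yields a different estimate of the area under the ROC curve.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the GOODBAD.CSV data: 684 records, 14 predictors.
X, y = make_classification(n_samples=684, n_features=14, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0)

for seed in (1, 2):
    # The seed controls the random assignment of records to the 10 folds.
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    auc = cross_val_score(tree, X, y, cv=folds, scoring="roc_auc").mean()
    print(f"partition seed {seed}: cross-validated ROC area = {auc:.4f}")
```

Each seed produces a slightly different ROC area, even though the model settings and the data never change.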

The cross-validation illusion

Cross-validation is sometimes thought to be far more reliable than the users who make use of this technology understand. We'll call this the cross-validation illusion. Modelers sometimes have the illusion that by running CV once, they have allowed for randomness and averaged out its effects. It is important to understand that a single CV run is just a single random experiment, like tossing a coin once to decide whether it's fair. To truly understand the variability inherent in the experiment, we need to repeat it many times. The best estimate of the performance of the main model is thus an average across multiple CV runs. In the past this was rarely practiced because of the computational cost. Today, our machines are so much more powerful that it is far more reasonable to recommend that the cross-validation experiment be run not just once but multiple times, and you can always run your own random experiments by changing the random number seed every time you request cross-validation. On the Edit/Options dialog, you can alter how the random number seed for each data mining engine is managed, and you can also allow the random number seed to change between one run and the next. The default is to reset the random number seed to exactly the same value at the start of every run, which allows you to reproduce runs when you run the same setup more than once; we therefore recommend that you leave the defaults in place. The battery CVR makes it much easier for you to run the kind of experiment that we're talking about and then also to summarize the results, so in this session we're going to work with this battery.
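Conceptually, what Battery CVR automates can be sketched as a simple loop. The code below is an illustrative stand-in (scikit-learn on synthetic data, not SPM's implementation): it repeats 10-fold cross-validation 30 times with a different random seed each time and then summarizes the spread of the estimates.

```python
# Illustrative sketch of the idea behind Battery CVR (not SPM itself):
# repeat 10-fold cross-validation with a different seed each time, then
# summarize the minimum, maximum, and mean of the ROC-area estimates.
import statistics

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the GOODBAD.CSV data: 684 records, 14 predictors.
X, y = make_classification(n_samples=684, n_features=14, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0)

aucs = []
for seed in range(30):  # CVR=30: thirty repetitions of the CV experiment
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    aucs.append(cross_val_score(tree, X, y, cv=folds, scoring="roc_auc").mean())

# The mean across repetitions is the better estimate of model performance;
# the min-max range shows how much a single CV run could have misled us.
print(f"min {min(aucs):.4f}  max {max(aucs):.4f}  mean {statistics.mean(aucs):.4f}")
```

The spread between the minimum and maximum is exactly the uncertainty that a single CV run hides.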

An example CVR run

So here, our screen is showing the Salford Predictive Modeler version 6.8, up and running, and we have accessed the file GOODBAD.CSV, which is available from our website and has also been included in the installation package that you may have downloaded and installed on your machine. You can see here we have 684 records, a relatively small data set. We've only got 14 variables, and we're going to go ahead and set up our modeling with this. Recall that the dependent variable is called TARGET, and it happens to be an indicator of whether the person in this file had a good outcome or a bad outcome with respect to a loan that was made available to that person. Here we'll select all the other variables as predictors; we don't have to do that, but the reason that I'm doing it here is just to make sure that if we wanted to make any selections, we would be able to do so. The only change I'm going to make to the automatic settings concerns the variable POSTBIN, which is coded as a number but is actually to be treated as categorical.
This is actually a grouped version of a postal code, so it represents the region that an individual lives in. We're working with the CART engine. On testing, we are working with tenfold cross-validation, so everything is set to the defaults. We just click the START button and away we go. Because this is such a small data set, it should complete in a few seconds even on a relatively modest computer.
So what do we see here? I only want to focus on the test results that we get: the cross-validation process is used to assemble an estimate of the area under the ROC curve that we would get if we could score this model on a larger data set with real test data. So our estimate of what we should expect to get on future data is a ROC of .81, which, if you're familiar with this measure, is an excellent score. If you're not, another way to look at it is through the misclassification matrix, or classification matrix. And over here we see that we're getting an average of about 78% correct, and the overall percent correct is almost identical at 78% on the test data set, so a number that is not that different from the ROC score in this particular example. All very well and good; the question is, how reliable is this .81? As we said, a lot of people believe that this .81 is highly reliable just because a multiple testing mechanism was used to get the results.
Let's go ahead and set this up for battery CVR. So I've clicked on the battery tab, and I now click on CVR. You can see here the description: repeat cross-validation with different random number seeds.
I add it, and over here what I want to do is edit this particular field so that instead of being run 200 times, which is a little too much for our experiment, it runs 30 times.
Let's go ahead and hit the start button. I'm not going to let the recorder run while this runs. It will take a couple of minutes, and I'll restart this video as soon as the results have come up. Okay, so it took a couple of minutes for this to run, and this is what the battery summary looks like. It shows us what the relative error is for each of the 30 runs. The 30 runs don't all display in this area here, so what we can do is scroll over to see the results going all the way from the first run, CVR_1, to the last run, CVR_30. If you want, you could instead stretch this window so you get enough space to see everything.
Instead of looking at the relative error, let's click on ROC here and see what happens. The graph makes it appear that things are bouncing around quite a bit, but if we look at the summary results here, we can see that the lowest value that we got is .7913. The highest value that we got is .8373, and the average, .8191, is almost exactly equal to what we got when we ran a single run.

CVR results

So what happened when we ran that single run? Well, in that particular case we were lucky: what we got was pretty much dead on the average that we would have obtained if we'd run it 30 times. However, we didn't know that before we ran this experiment. We could have ended up with the .7913, which would have been a little pessimistic, or with the .8373, which would have been a little optimistic. So by looking at the results here, we can get a better idea of, first of all, whether the results of our first run were representative, and, second, what the level of uncertainty around those results is. If you go to the Battery Summary and look at the other tabs, you can see that we have an option to look at the error profiles.
These are the profiles that show the performance of that tree, as it is allowed to grow within one of the cross-validation experiments. What you can see here is that when the trees are small, all of the runs give pretty much exactly the same results all the way through to about 4 or 5 nodes. But then as the trees get larger, the cross-validation process estimates vary quite a bit more, and so there is a certain amount of uncertainty here as to what the results are but not a great deal. On this particular display, if you want, you can request that we only show the average and so that's what the particular curve over here is. The purpose of this curve is, in part, to allow you to decide whether you want to be going to a larger or smaller tree. In this particular case, that maximum performance, when it comes to ROC is pretty well defined and there is pretty much unanimity among the models as to what is the right size tree.
Let's click 'All' again and you can see that the maximums are occurring pretty much together here. If you prefer to see the 'Misclass' curves, these are the more traditional curves that you see underneath any one of the CART navigators. All 30 examples are shown there, and again, if you want, you can request the average curve be displayed. You can also ask that the best performance, which in this case is the least error curve, is shown and also the worst performer. Again, giving you an idea of approximately how much uncertainty there is, and this is, of course, looking from one extreme of the outcomes to the other and where the average sits.
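The error-profile idea, performance as a function of tree size, averaged over repeated cross-validation runs, can also be sketched outside SPM. The code below is illustrative only: scikit-learn's max_leaf_nodes limit stands in for CART's pruning sequence, and the data set is synthetic.

```python
# Illustrative sketch of an error profile (not SPM/CART's pruning machinery):
# for each tree size, average the cross-validated ROC area over several
# repetitions of the CV experiment, each with a different fold assignment.
import statistics

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the GOODBAD.CSV data: 684 records, 14 predictors.
X, y = make_classification(n_samples=684, n_features=14, random_state=0)

for n_nodes in (2, 4, 8, 16, 32):  # terminal-node counts along the profile
    scores = []
    for seed in range(5):  # a few CV repetitions per tree size
        tree = DecisionTreeClassifier(max_leaf_nodes=n_nodes, random_state=0)
        folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        scores.append(cross_val_score(tree, X, y, cv=folds,
                                      scoring="roc_auc").mean())
    print(f"{n_nodes:2d} terminal nodes: mean ROC {statistics.mean(scores):.4f}")
```

Plotting these averages against tree size reproduces, in miniature, the averaged profile curve shown in the battery summary: small trees agree closely across runs, while larger trees show more run-to-run spread.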

CVR in command line

There is one other interesting thing you can do with Battery CVR: using the command line, you can give the command BATTERY CVR=30. That's the number of repetitions we decided to run here.
Then, also on the command line, we add the SAVE command, which is going to produce an output data set. In this case, I simply modified the name of the original file a little to include CVR. Running this command will give me exactly the same results, but it will also give me a saved data set. Let's go look at that saved data set and see what's inside it. So we are now looking at this data set: the same number of records, 684, but now we've got 211 variables. What exactly is going on here? Well, first of all, for every record we indicate the terminal node that the record was assigned to in each of the different experiments. There are 30 of these experiments, so every record has an opportunity of ending up in a slightly different terminal node depending on how the main tree was pruned due to the cross-validation results. But here's the part that is interesting: the OOB prediction, and there are going to be 30 of these as well. The OOB prediction is the prediction made for a particular record when it was in the test partition for that replication of the CV (cross-validation) modeling process. So this is a prediction made for the record when it is not part of the training process, and this could be very interesting for a number of possible studies of the actual inherent variability in the data set. Besides the prediction, which is a yes/no outcome, we also have the predicted probability, which comes from the terminal node, also OOB. The final variables are the predictions made by the main tree, and those, of course, are not going to vary very much from run to run; they'll vary only due to the pruning of the main tree.
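As one example of the kind of variability study the saved data set enables, you could ask, for each record, how consistent its 30 OOB predictions are. The sketch below uses only the Python standard library; the column names (OOB_1 through OOB_30) are hypothetical, so check the actual variable names written into your saved file before using it.

```python
# Illustrative sketch: per-record agreement among the repeated OOB
# predictions in the saved CVR data set. The OOB_1..OOB_30 column names
# are assumptions -- substitute the names SPM actually writes.
import csv
from collections import Counter


def oob_agreement(path, n_reps=30):
    """For each record, return the share of its n_reps OOB predictions
    that match that record's majority prediction (1.0 = perfectly stable,
    values near 0.5 = the record is ambiguous for the model)."""
    rates = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            preds = [row[f"OOB_{i}"] for i in range(1, n_reps + 1)]
            majority_count = Counter(preds).most_common(1)[0][1]
            rates.append(majority_count / n_reps)
    return rates
```

Records with low agreement rates are the ones whose classification flips depending on the random fold assignment, a direct, record-level view of the inherent variability discussed above.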


So, to summarize where we are now: battery CVR runs this experiment as many times as you care to run it, and then not only summarizes the results but also gives you some interesting details of the process, which can be used for your own subsequent sensitivity analyses. Now, exactly how those sensitivity analyses are to be conducted is going to be the subject of another video. So again, thank you for spending your time with us and investigating more technical details of the SPM product. We look forward to seeing you again soon.
(Video Transcript)


