Data Mining Automation with SPM Batteries
This series focuses on what Salford Systems calls “batteries,” pre-packaged scenarios inspired by the way leading analysts structure their modeling work. These batteries, or experiments, build multiple models automatically so that the user can easily compare the alternatives and make decisions. The videos highlight how batteries can be applied, as well as best practices for model building.
This introduction explains what SPM automation (Batteries) is, how it can be used, and why it is important to an analyst. The Salford Predictive Modeler software suite offers many types of batteries, and we introduce a few of them here. We also review the controls and tabs you should be aware of in the software, so don't skip this introduction if you are not familiar with SPM Batteries.
Battery ONE OFF
We review the conventional approach to univariate analysis, which measures the correlation of each predictor in the dataset with the target variable. Our example uses the Boston Housing dataset, and we show how strongly or weakly each predictor is correlated with the target. Following this overview of the problem, we discuss how to automate the process with Battery ONE OFF to discover linear and nonlinear relationships among the available variables.
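The conventional univariate screen can be sketched in a few lines of plain Python. This is a minimal illustration, not SPM's implementation: it ranks predictors by the absolute Pearson correlation with the target, so it only captures linear relationships, whereas Battery ONE OFF is described as surfacing nonlinear ones as well. The helper names `pearson` and `rank_by_correlation` and the toy columns are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(predictors: dict[str, list[float]],
                        target: list[float]) -> list[tuple[str, float]]:
    """Hypothetical helper: one correlation per predictor, strongest (by |r|) first."""
    scores = [(name, pearson(values, target)) for name, values in predictors.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only (not the Boston Housing values).
    target = [2.0, 4.0, 6.0, 8.0, 10.0]
    predictors = {
        "x_up": [1.0, 2.0, 3.0, 4.0, 5.0],
        "x_noise": [1.0, 0.0, 1.0, 0.0, 1.0],
        "x_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    }
    for name, r in rank_by_correlation(predictors, target):
        print(f"{name:8s} r = {r:+.2f}")
```

Ranking by |r| keeps strongly negative predictors (like `x_down`) near the top, which matches the goal of seeing how strongly or weakly each predictor relates to the target regardless of sign.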
Battery PRIORS Part 1
In this video, we discuss prior probabilities in CART (Classification and Regression Trees). In a binary classification problem, we are trying to separate records into two classes (e.g., fraud vs. no fraud, or spam vs. not spam). While building the decision tree, the algorithm must make certain decisions in order to classify each record properly. Prior probabilities let the analyst directly influence these critical internal decisions as the algorithm balances the tree it builds and tries to avoid false positive classifications. We review how this works in CART and how you can improve your classification results using Battery PRIORS.
Battery PRIORS Part 2
We build on the prior-probability machinery discussed in the previous video and explore automation features that help build an even more powerful and productive classification model in CART. By switching to Battery PRIORS, we can choose a range of prior probabilities and automatically generate multiple trees in a single run. Finally, we review how to compare the resulting trees in the Battery Results tab.
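How priors shift a node's class assignment can be illustrated with the within-node class probability from the CART literature: p(j | t) is proportional to π_j · N_j(t) / N_j, the prior for class j times the fraction of class j's learn records that reach node t. The sketch below is an illustration, not SPM code; it assumes equal misclassification costs, and the counts and function names (`node_class_probs`, `assigned_class`, `sweep_priors`) are hypothetical. The sweep loosely mimics what Battery PRIORS does across a range of priors, but only for a single fixed node rather than for whole trees.

```python
def node_class_probs(node_counts, class_totals, priors):
    """p(j | t) proportional to prior_j * N_j(t) / N_j, normalized over classes."""
    weights = {
        cls: priors[cls] * node_counts.get(cls, 0) / class_totals[cls]
        for cls in class_totals
    }
    total = sum(weights.values())  # assumes the node is not empty
    return {cls: w / total for cls, w in weights.items()}


def assigned_class(node_counts, class_totals, priors):
    """With equal misclassification costs, the node takes its most probable class."""
    probs = node_class_probs(node_counts, class_totals, priors)
    return max(probs, key=probs.get)


def sweep_priors(node_counts, class_totals, positive, negative, grid):
    """Hypothetical battery-style sweep: vary the prior on `positive`, record the node label."""
    results = []
    for p in grid:
        priors = {positive: p, negative: 1.0 - p}
        results.append((p, assigned_class(node_counts, class_totals, priors)))
    return results


if __name__ == "__main__":
    class_totals = {"no_fraud": 900, "fraud": 100}  # learn-sample class sizes (toy)
    node_counts = {"no_fraud": 60, "fraud": 20}     # records reaching one node (toy)
    for p, label in sweep_priors(node_counts, class_totals, "fraud", "no_fraud",
                                 [0.1, 0.2, 0.3, 0.5, 0.7]):
        print(f"prior(fraud) = {p:.1f} -> node labeled {label}")
```

With these toy counts, data-proportional priors (0.9 / 0.1) label the node `no_fraud`, while equal priors (0.5 / 0.5) give p(fraud | t) = 0.75 and flip the label to `fraud`; the switch happens once the fraud prior exceeds 0.25. That flip is the kind of internal decision the analyst steers with priors.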
Battery PRIORS Part 3
In this final video on Battery PRIORS, we provide a few extra tips and tricks for getting the full benefit of this automation technology. We show you how to dig into the Battery Results tab and extract an overall summary using Hotspot detection.
Battery SHAVING Part 1
In this video, we begin with the importance of variable selection and the features within the SPM software suite that help with this process. At some point you, the analyst, arrive at a collection of potential predictors that you need to explore. You will likely need to reduce the number of predictors, making your model simpler without reducing its predictive accuracy. In this video, we show how Battery SHAVING automates this process and makes it easier and more efficient. The first example we use is called 'shaving from the top.'
Battery SHAVING Part 2
In this second example of Battery SHAVING, we reduce our list of predictors to an even more manageable size by 'shaving variables from the bottom.' In this approach, the battery removes the least important variable one at a time and automatically reports model performance at each step. You will also learn how to save the most important variables in a keep list for future model building.
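Both shaving directions follow the same loop: rank the current variables by importance, drop one from the chosen end, rebuild, and record the error. The minimal sketch below assumes two hypothetical callables standing in for SPM's model building: `importance(vars)` returns a score per variable for a model built on `vars`, and `evaluate(vars)` returns that model's error. Recomputing importance on the current subset at every step is an assumption about how the battery behaves, not a documented detail.

```python
def shave(variables, importance, evaluate, from_top=False, min_vars=1):
    """Drop one variable per step and record the model error of each subset.

    importance(vars) -> {var: score} for a model built on `vars` (hypothetical callable)
    evaluate(vars)   -> error of a model built on `vars` (hypothetical callable)
    from_top=False removes the least important variable ("shaving from the bottom");
    from_top=True removes the most important one ("shaving from the top").
    """
    current = list(variables)
    path = [(tuple(current), evaluate(current))]
    pick = max if from_top else min
    while len(current) > min_vars:
        scores = importance(current)
        victim = pick(current, key=lambda v: scores[v])
        current = [v for v in current if v != victim]
        path.append((tuple(current), evaluate(current)))
    return path


def best_subset(path):
    """Lowest error wins; ties go to the subset with fewer variables."""
    return min(path, key=lambda step: (step[1], len(step[0])))


if __name__ == "__main__":
    # Toy stand-ins for "build a model and report importance / error".
    weights = {"a": 0.5, "b": 0.3, "c": 0.02, "d": 0.01}

    def importance(vars_):
        return {v: weights[v] for v in vars_}

    def evaluate(vars_):
        # Toy rule: error grows with the signal left out and with model size.
        dropped = sum(w for v, w in weights.items() if v not in vars_)
        return dropped + 0.05 * len(vars_)

    path = shave(["a", "b", "c", "d"], importance, evaluate)
    for subset, err in path:
        print(subset, round(err, 3))
    print("best:", best_subset(path)[0])
```

Keeping the whole path, rather than stopping at the first error increase, mirrors the idea of letting the battery run and then choosing a subset from the summary. The variables in the chosen subset are what you would save as a keep list.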
Battery SHAVING Part 3
Now that you have a general background on how to use Battery SHAVING to reduce your list of variables, we will fine-tune this list of predictors with Battery SHAVING Error.
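The text does not spell out how Battery SHAVING Error picks the variable to drop. One common reading of error-based shaving, assumed for this sketch, is greedy backward elimination: at each step, try removing each remaining variable and drop the one whose removal gives the lowest error. `evaluate(vars)` is a hypothetical callable that builds a model on `vars` and returns its error. Note that this costs one model per remaining variable at every step, which is the kind of work a battery automates.

```python
def shave_by_error(variables, evaluate, min_vars=1):
    """Greedy backward elimination driven by model error (assumed reading of SHAVING Error).

    evaluate(vars) -> error of a model built on `vars` (hypothetical callable).
    Returns the path of (subset, error) pairs, starting from the full set.
    """
    current = list(variables)
    path = [(tuple(current), evaluate(current))]
    while len(current) > min_vars:
        # Build one trial model per remaining variable, each with that variable removed.
        candidates = []
        for v in current:
            trial = [u for u in current if u != v]
            candidates.append((evaluate(trial), trial))
        err, current = min(candidates, key=lambda c: c[0])
        path.append((tuple(current), err))
    return path


if __name__ == "__main__":
    weights = {"a": 0.5, "b": 0.3, "c": 0.02, "d": 0.01}  # toy signal per variable

    def evaluate(vars_):
        # Toy rule: error grows with the signal left out and with model size.
        dropped = sum(w for v, w in weights.items() if v not in vars_)
        return dropped + 0.05 * len(vars_)

    path = shave_by_error(["a", "b", "c", "d"], evaluate)
    for subset, err in path:
        print(subset, round(err, 3))
    print("lowest error:", min(path, key=lambda step: step[1])[0])
```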
Battery TARGET Part 1
Battery TARGET is one of the special automation features available on the Advanced Battery Tab in the SPM software. It can be used to explore multivariate relationships among predictors: the battery takes one variable at a time as the target and builds a model using the remaining variables as predictors. In this first video, we start with an example of Battery TARGET on a wine quality dataset, where we build a model to predict the wine's quality from a variety of chemical components. Battery TARGET also provides the mechanism for missing value imputation, which is arguably the most impressive capability of this automation technique.
Battery TARGET Part 2
In this next video, we use the output from the models produced in the previous video to impute missing values.
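The TARGET-then-impute workflow can be sketched as two steps: fit one model per column, using the other columns as predictors, and then fill each missing cell with the prediction from that column's model. This is an illustration under loud assumptions. SPM builds CART models, while this sketch swaps in a hypothetical one-nearest-neighbor learner (`nearest_neighbor_model`) so that it stays short and dependency-free. The column names and values are made-up toy data, not the wine dataset.

```python
import math
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Model = Callable[[Row], Optional[float]]


def nearest_neighbor_model(train: List[Row], target: str, predictors: List[str]) -> Model:
    """Stand-in learner (not CART): predict `target` from the closest training row.

    Distance is the mean squared difference over predictors present in both rows,
    so rows with other missing predictors can still be scored.
    """
    def predict(row: Row) -> Optional[float]:
        best_dist, best_value = math.inf, None
        for cand in train:
            shared = [p for p in predictors
                      if row.get(p) is not None and cand.get(p) is not None]
            if not shared:
                continue
            dist = sum((row[p] - cand[p]) ** 2 for p in shared) / len(shared)
            if dist < best_dist:
                best_dist, best_value = dist, cand[target]
        return best_value
    return predict


def battery_target(rows: List[Row], fit=nearest_neighbor_model) -> Dict[str, Model]:
    """Each column in turn becomes the target; the remaining columns are predictors."""
    columns = list(rows[0])
    models = {}
    for target in columns:
        train = [r for r in rows if r.get(target) is not None]
        predictors = [c for c in columns if c != target]
        models[target] = fit(train, target, predictors)
    return models


def impute(rows: List[Row], models: Dict[str, Model]) -> List[Row]:
    """Fill each missing cell with its column model's prediction; inputs are not modified."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col, value in row.items():
            if value is None:
                new_row[col] = models[col](row)
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    # Toy rows with wine-style column names; values are made up for illustration.
    rows = [
        {"acidity": 1.0, "sugar": 2.0, "quality": 5.0},
        {"acidity": 3.0, "sugar": 4.0, "quality": 7.0},
        {"acidity": 1.1, "sugar": None, "quality": 5.0},
    ]
    print(impute(rows, battery_target(rows)))
```

Passing the learner in as `fit` keeps the per-column loop separate from the modeling method, so a tree-based learner could replace the nearest-neighbor stand-in without changing `battery_target` or `impute`.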
Battery TRAIN TEST Part 1
We discuss the stability of CART trees in a binary classification problem. The algorithm builds a tree on the learn sample, and at times the results do not hold for the test sample. The challenge, therefore, is to find the tree with the best balance of performance and stability, one whose results generalize from the learn sample to the test sample. In this example, we use an email spam dataset and try to classify whether each email is spam or not spam. With Battery TRAIN TEST, CART automatically scans every available tree and summarizes the consistency between learn and test performance in an easy-to-read table. For each tree, the table indicates whether the learn and test results agree or disagree in direction and in rank order.
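The two consistency checks can be sketched on a tree's terminal nodes, given each node's response rate (for example, the share of spam) in the learn and test samples. The definitions are assumptions for illustration; the battery's exact criteria, including any tolerance it allows, may differ. Here "direction" means every node falls on the same side of the overall response rate in both samples, and "rank" means sorting nodes by learn rate gives the same order as sorting by test rate. The function names and rates are hypothetical toy values.

```python
def direction_agrees(learn_rates, test_rates, learn_base, test_base):
    """Every terminal node sits on the same side of the overall rate in both samples."""
    return all((l > learn_base) == (t > test_base)
               for l, t in zip(learn_rates, test_rates))


def rank_agrees(learn_rates, test_rates):
    """Nodes sorted by learn response rate appear in the same order when sorted by test rate."""
    nodes = range(len(learn_rates))
    learn_order = sorted(nodes, key=lambda i: learn_rates[i], reverse=True)
    test_order = sorted(nodes, key=lambda i: test_rates[i], reverse=True)
    return learn_order == test_order


if __name__ == "__main__":
    # Hypothetical per-node spam rates for two trees: (learn, test, learn base, test base).
    trees = {
        "tree_A": ([0.9, 0.2, 0.5], [0.8, 0.1, 0.6], 0.4, 0.4),
        "tree_B": ([0.9, 0.2, 0.5], [0.8, 0.5, 0.3], 0.4, 0.4),
    }
    for name, (lr, tr, lb, tb) in trees.items():
        print(name, "direction:", direction_agrees(lr, tr, lb, tb),
              "rank:", rank_agrees(lr, tr))
    # tree_A direction: True rank: True
    # tree_B direction: False rank: False
```

Running both checks on every tree in turn yields an agree/disagree table of the kind described above, from which a consistent tree can be picked.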
Battery TRAIN TEST Part 2
We now introduce the class probability splitting rule and show how it can help find trees whose rank order matches between the train and test samples with Battery TRAIN TEST. For this experiment, we select class probability as the splitting method, and we compare the consistency results of the Gini tree sequence with those of the class probability sequence. To conclude, we offer a few final comments about other experiments you can run when you want to analyze the consistency between train (learn) and test results.