Once you have built an SPM model (CART, MARS, TreeNet, RandomForests) and saved the grove (.GRV) file, you are in a position to make predictions for any other data set containing the relevant predictors. Thus, if you trained your model on file A using variables X1, X2, ..., X50, for example, you can now make predictions for file B, provided that file B contains at least some of the same variables (and preferably all of the variables actually used in the model).
This process of generating predictions is called SCORING in our software, and most models are built specifically so that they can be put into production to generate predictions. The process can also be used for SIMULATION. In this case you prepare a data set that also contains the columns X1, X2, ..., X50, but the values need not be real data. Instead, the file could contain hypothesized or forecasted values, as when you want to make predictions for possible future scenarios.
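As a rough illustration of the two ideas above, the following Python sketch (not SPM code) checks which of a model's predictors are present in a new file before scoring, and then builds a small grid of hypothesized values for a simulation. The helper name check_predictors, the column names, and the scenario values are all assumptions made up for this example; SPM does its own variable matching when a grove is scored.

```python
import itertools


def check_predictors(model_predictors, file_columns):
    """Split the model's predictors into those present in / absent from a file.

    Hypothetical helper for illustration only.
    """
    available = set(file_columns)
    present = [name for name in model_predictors if name in available]
    missing = [name for name in model_predictors if name not in available]
    return present, missing


# Predictors used by a model trained on file A (illustrative names).
model_predictors = ["X1", "X2", "X3"]
# Columns found in file B, the data set to be scored.
file_b_columns = ["ID", "X1", "X3", "X7"]

present, missing = check_predictors(model_predictors, file_b_columns)
print("present:", present)
print("missing:", missing)

# Simulation: rows of hypothesized values rather than observed data.
# Every combination of the candidate values becomes one scenario row.
candidate_values = {"X1": [0, 1], "X3": [10.0, 20.0]}
names = list(candidate_values)
scenarios = [
    dict(zip(names, combo))
    for combo in itertools.product(*(candidate_values[n] for n in names))
]
for row in scenarios:
    print(row)
```

A check like this is only a convenience before scoring: it tells you up front whether file B carries all of the variables the model actually used.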
If you open a saved grove for any Salford Systems data mining engine (CART, MARS, TreeNet, RandomForests) you will notice a “Commands” button among a row of controls along the bottom of the display. The Commands button will open a plain text window displaying all the commands entered in your session up until the run that generated the grove.
Salford Predictive Modeler™ and its component data mining engines CART®, MARS®, TreeNet®, and RandomForests® contain a variety of tools to help modelers work quickly and efficiently. One of the most effective tools for rapid model development is found in the BATTERY tab of the MODEL Set Up dialog. Because there are so many tools embedded in that dialog we are going to start a series of posts going through the principal BATTERY choices, one at a time.
Let’s start with the idea of the BATTERY. The BATTERY mechanism is an automated system for running experiments and trying out different modeling ideas. Instead of your having to think about how to tweak your model to make it better, the BATTERY does it for you. Each BATTERY is a planned experiment in which we take some useful modeling control and run a series of models in which we systematically change that control. The best part of this is the SUMMARY, which provides you with an executive summary of the results and points you to the best performing model. We recommend that you use the BATTERY often; some modelers don’t do anything without setting up pre-packaged or user-customized batteries.
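To make the idea concrete, here is a minimal Python sketch of a battery-style experiment. It is not SPM code and does not reflect how SPM implements the BATTERY: the names run_battery and knn_test_error are hypothetical, and a toy k-nearest-neighbour average stands in for a real modeling engine. The control being varied is k, and the "summary" is simply the list of results sorted from best to worst.

```python
def run_battery(control_values, build_and_score):
    """Run one model per control value and summarise the results.

    build_and_score(value) must return a test-sample error (lower is better).
    Returns (summary, best): summary is a list of (value, error) pairs
    sorted from best to worst, and best is its first entry.
    """
    results = [(value, build_and_score(value)) for value in control_values]
    summary = sorted(results, key=lambda pair: pair[1])
    return summary, summary[0]


def knn_test_error(k, train, test):
    """Mean squared test error of a k-nearest-neighbour average (toy model)."""
    total = 0.0
    for x, y in test:
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        prediction = sum(p[1] for p in nearest) / len(nearest)
        total += (y - prediction) ** 2
    return total / len(test)


# Toy data: y = 2x with a deterministic +1 / -1 wiggle on the training set.
train = [(float(x), 2.0 * x + (1.0 if x % 2 else -1.0)) for x in range(10)]
test = [(0.5, 1.0), (4.5, 9.0), (8.5, 17.0)]

summary, best = run_battery(
    [1, 2, 4, 8], lambda k: knn_test_error(k, train, test)
)
for value, error in summary:
    print(f"k={value}: test MSE={error:.3f}")
print("best setting:", best[0])
```

The point of the design is that the experiment is planned in advance: one control, a fixed list of settings, the same test data for every run, and a single table at the end that ranks the runs.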
Most users of Salford Systems’ data mining tools (CART®, MARS®, TreeNet®, RandomForests® or the more recent integrated SPM™ package) rely on the GUI (Graphical User Interface) to do their work. The GUI makes life easy, as you do not need to remember any command syntax, and of course the GUI has many useful visual displays of important results. But there are some good reasons to learn how to work with command scripts, which is the topic of the current posting. We will refer to our software as SPM (Salford Predictive Modeler), which includes all of our individual data mining engines.
It is useful to remember that almost everything you do during a GUI session using SPM has a “command equivalent.” That means that you could accomplish the identical model and results simply by submitting a set of commands to SPM instead of pointing and clicking. Even more useful to remember is that SPM automatically creates the equivalent set of commands for you as you work, saving the results to a text file. We will return to how to locate that text file a bit later.
SAN DIEGO—A new, free download method of Salford Systems’ data mining software has been designed and implemented, making it easier than ever for data miners to download Salford’s ultra–fast tools with just a few clicks of the mouse.
The new process works like this:
Step 1: Choose the product(s) you are interested in evaluating.
Step 2: Provide your name and contact information.
Step 3: Download!
It’s as easy as that, and Salford Systems couldn’t be happier to finally launch this new method!
Salford Systems Predictive Modeler, including CART®, MARS®, TreeNet®, and RandomForests®, can handle any number of variables you care to work with. By default your software will launch prepared to work with up to 32,768 variables, which is sufficient for many users. However, if you need to work with a larger number, you just need to let the software know at the time the application is launched.
If you are working with the non-GUI version, you use command line arguments to inform the application of your preferences. The command line syntax is:
SPM.EXE -v<N>    Specifies max N variables for the session.
With the GUI version you do essentially the same, adding the command line arguments by modifying the properties of the application.
For example, follow these steps to inform SPM that you expect to work with up to 50,000 variables:
The value of this parameter sets the maximum number of variables the application will allow. For example, if you need to use 75,000 variables, set this parameter to -V75000.
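If you launch SPM from a script, the argument can be assembled programmatically. The Python sketch below is only an illustration: the helper name spm_launch_args is hypothetical, the only detail taken from the text is the -v<N> form (assumed here to be written with no space between the flag and the number, as in the 75,000 example), and nothing is actually executed.

```python
def spm_launch_args(max_variables, executable="SPM.EXE"):
    """Build the argument list for launching SPM with a larger variable limit.

    Hypothetical helper: it only formats the -v<N> argument described in
    the text and does not start the application.
    """
    if not isinstance(max_variables, int) or max_variables <= 0:
        raise ValueError("max_variables must be a positive integer")
    return [executable, f"-v{max_variables}"]


# Command line for a session that allows up to 50,000 variables.
print(" ".join(spm_launch_args(50000)))
```

The same argument string is what you would add to the application's properties when using the GUI version.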
The SPM™ software must be downloaded with Administrator rights, and read/write and modify permissions MUST be applied to the /bin directory PRIOR to proceeding. If you need help with SPM installation (Administrator rights and ensuring proper permissions), please contact Salford Systems.
Once the above instructions have been completed, you can now request your Unlock Key.
To unlock the SPM™ software for your 30-day free evaluation, please e-mail the following information to Salford Systems.
The most recent versions of Salford Predictive Modeler™ SPM PRO EX include a new BATTERY to invoke bootstrapped replication of most model types available in SPM. One of our reasons for adding this BATTERY was to provide access to the full CART engine when generating RandomForests® (RF) models. The principal advantages of this are:
Breiman’s original RF uses a stripped down and simplified tree growing algorithm designed for speed. It lacks tree growing options and missing value handling, and for many users Breiman’s RF is confined to classification problems. By accessing the full CART engine with all of its Salford extensions and customized controls, modelers can accomplish far more sophisticated analyses, handle missing values with surrogates, and apply penalties and constraints. Most importantly for those interested in continuous dependent variables, BATTERY BOOTSTRAP gives access to both Least Squares (LS) and Least Absolute Deviation (LAD) regression trees.
The principal drawback of BATTERY BOOTSTRAP is that the extra machinery comes with a computational price: RF runs under BATTERY BOOTSTRAP are much slower than under Breiman-RF. The extra robustness, ability to handle huge problems, and added controls should often make the slower runs worthwhile. Also observe that at the moment the RF post-model visualization machinery is not available.
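For readers who want to see the general technique, here is a minimal Python sketch of bootstrapped replication with LS and LAD regression trees. It is a generic illustration, not the SPM implementation: the names fit_stump and bootstrap_ensemble are hypothetical, a one-split tree on a single predictor stands in for the full CART engine, and the random predictor selection that RF performs at each split is omitted. The LS version uses leaf means and squared error; the LAD version uses leaf medians and absolute error.

```python
import random
import statistics


def fit_stump(rows, loss="ls"):
    """Fit a one-split regression tree on (x, y) rows.

    loss="ls": leaf means, squared error. loss="lad": leaf medians,
    absolute error.
    """
    centre = statistics.mean if loss == "ls" else statistics.median

    def cost(ys, c):
        if loss == "ls":
            return sum((y - c) ** 2 for y in ys)
        return sum(abs(y - c) for y in ys)

    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        if rows[i - 1][0] == rows[i][0]:
            continue  # no split point between identical x values
        left = [y for _, y in rows[:i]]
        right = [y for _, y in rows[i:]]
        cl, cr = centre(left), centre(right)
        total = cost(left, cl) + cost(right, cr)
        if best is None or total < best[0]:
            best = (total, (rows[i - 1][0] + rows[i][0]) / 2, cl, cr)
    if best is None:  # all x identical: fall back to a single leaf
        c = centre([y for _, y in rows])
        return lambda x: c
    _, threshold, cl, cr = best
    return lambda x: cl if x <= threshold else cr


def bootstrap_ensemble(rows, n_replicates=25, loss="ls", seed=0):
    """Fit one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_replicates):
        sample = rng.choices(rows, k=len(rows))  # draw with replacement
        models.append(fit_stump(sample, loss))
    return lambda x: statistics.mean(m(x) for m in models)


# Toy data: a step from 0 to 10 at x = 6, plus one outlier at x = 2.
rows = [(float(x), 0.0 if x < 6 else 10.0) for x in range(12)]
rows[2] = (2.0, 100.0)

for loss in ("ls", "lad"):
    model = bootstrap_ensemble(rows, loss=loss)
    print(loss, [round(model(x), 2) for x in (1.0, 9.0)])
```

Running both variants on the same data is a simple way to see how the choice of loss changes the sensitivity of the averaged prediction to an outlier. Even in this toy form, each replicate refits a tree from scratch, which hints at why the fuller machinery costs more time.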