Finding The Automatically Stored Command Log
Most users of Salford Systems' data mining tools (CART®, MARS®, TreeNet®, RandomForests®, or the more recent integrated SPM™ package) rely on the GUI (Graphical User Interface) to do their work. The GUI makes life easy: you do not need to remember any command syntax, and it offers many useful visual displays of important results. But there are some good reasons to learn how to work with command scripts, which is the topic of this posting. We will refer to our software as SPM (Salford Predictive Modeler), which includes all of our individual data mining engines.
It is useful to remember that almost everything you do during a GUI session using SPM has a "command equivalent." That means that you could accomplish the identical model and results simply by submitting a set of commands to SPM instead of pointing and clicking. Even more useful to remember is that SPM automatically creates the equivalent set of commands for you as you work, saving the results to a text file. We will return to how to locate that text file a bit later.
Here is an example. The SPM installation comes with a sample file named GOODBAD.CSV. To open that file from the command line we would do this:
Use the key combination CTRL+N (the Control key together with the N key) to bring up an SPM notepad window. This window is a mini text editor intended for constructing command scripts, and you can bring up as many SPM notepad windows as you like (every time you press CTRL+N you get another one).
Type the following command in the new window:
USE "GOODBAD.CSV"
You might need to refer to this file by its fully qualified pathname; that is, you might need to type something like:
USE "E:\Applications\Data Miner\Docs\Examples\GOODBAD.CSV"
To see exactly how to refer to the file, look to the end of this post for details.
End the USE command by pressing ENTER. Then, from the File menu, select "Submit Window." This will open the file for you and bring up the activity window.
The benefits from using the command line or scripting language become a little more evident if you include several commands such as:
KEEP AGE, CREDIT_LIMIT, EDUCATION$, GENDER, HH_SIZE, INCOME, MARITAL$,
NUMCARDS, OCCUP_BLANK, OWNRENT$, TIME_EMPLOYED
METHOD GINI POWER = 0.0000
Here we have used commands to set up and run the entire model. The benefits include:
Confidence that you are running exactly the model you intended to run. If you are working with a larger data set and have several hundred variables, you might not want to have to point and click on each of the hundreds of variables. Of course you can highlight blocks of variables and this can go very smoothly in the GUI. But after doing this once it can be very convenient just to have the memorized list in a command line to reproduce your exact selection. Besides predictors there are actually quite a few choices among optional model building controls. Even though you can get very far using the defaults you just might want to change a control from its default value. Having this recorded in a command file ensures that you won't forget about your modification of the control when you want to either reproduce your results or run something very similar on another data set.
Time saving. Even a long command file usually executes in less than a second. From command files you can construct and manage large numbers of models.
The command language is richer and more powerful than the GUI. There are quite a few commands that can only be accessed from scripting.
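With several hundred variables, typing a KEEP list by hand is error-prone. The following is a minimal Python sketch, run outside SPM, that formats a list of variable names as a KEEP command. The function name `build_keep_command` and the 72-character width are illustrative assumptions; ending wrapped lines with a comma simply copies the layout of the KEEP example above.

```python
def build_keep_command(names, max_width=72):
    """Format variable names as a KEEP command, wrapping long lists.

    Wrapped lines end with a comma, copying the layout of the KEEP
    example in this post (an assumption about the continuation rule).
    """
    if not names:
        raise ValueError("at least one variable name is required")
    lines = []
    current = "KEEP " + names[0]
    for name in names[1:]:
        candidate = current + ", " + name
        if len(candidate) + 1 > max_width:  # +1 leaves room for the trailing comma
            lines.append(current + ",")
            current = name
        else:
            current = candidate
    lines.append(current)
    return "\n".join(lines)


if __name__ == "__main__":
    # Variable names taken from the KEEP example in this post.
    variables = ["AGE", "CREDIT_LIMIT", "EDUCATION$", "GENDER", "HH_SIZE",
                 "INCOME", "MARITAL$", "NUMCARDS", "OCCUP_BLANK", "OWNRENT$",
                 "TIME_EMPLOYED"]
    print(build_keep_command(variables))
```

The printed text can be pasted into an SPM notepad window and submitted like any other command.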
How should you use the command line?
Understand that you never need to choose. You can enter one command from an SPM notepad and then enter the next from the GUI and then return to the Notepad to enter another command via scripting. In other words, you can mix and match freely as each interface is fully aware of the other.
We happen to do all of our consulting and production work via the command line, but we start by experimenting and trying to understand the data and the problem by using the GUI.
Finding the Automatically Stored Command Log
In the GUI, go to the Edit/Options dialog. The toolbar offers an icon that looks like a check mark inside a check box, to the right of the Paste icon, which you can use as a shortcut to this dialog. Select the Directories tab and go down to the last item on this menu, "Temporary Directories." When you install SPM (or CART, MARS, TreeNet, RandomForests), the software automatically sets this to a directory that Windows prefers, written in the very hard-to-read short directory name convention. This location is where all SPM temporary files are written, including copies of the command logs for all of your SPM sessions.
Our recommendation is that you change this default temp file location: create a new directory dedicated to just SPM on a drive that has abundant free space. Whenever you run SPM, especially if you run some complex multi-stage models and make use of some of SPM's newest features, you could be creating several files much larger than your original training data. The last thing you want is to have a model construction process fail because there was not enough disk space available for the temporary files.
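Before settling on a new temporary directory, it can help to confirm how much room the drive actually has. The following is a minimal Python sketch that runs outside SPM; the function names and the 50 GB threshold are illustrative assumptions, not SPM settings.

```python
import shutil


def free_space_gb(path):
    """Return the free space, in gigabytes, on the drive that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / 1024**3


def has_room(path, required_gb):
    """Return True if the drive holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    # "." stands in for the directory you plan to dedicate to SPM temp files.
    candidate = "."
    print(f"{candidate}: {free_space_gb(candidate):.1f} GB free")
    print("Room for 50 GB of temporary files:", has_room(candidate, 50))
```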
But it is also important to be able to find your command logs quickly, and you will be able to do so if you have created a dedicated directory for them. These are always plain text files with a ".txt" extension, and their names begin with letters like "CART," "CTRX," "TN" etc., followed by the date the session was started and some random characters to keep the names unique. These files can prove very valuable in the future if you ever need an audit trail of your work or want to reproduce an interesting result.
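Once the logs live in a dedicated directory, a short script can list them newest first. This is a minimal Python sketch based only on the naming pattern described above (a ".txt" extension and a prefix such as CART, CTRX, or TN); the function name `find_command_logs` is hypothetical, and the prefix list is incomplete because the "etc." above covers prefixes not named in this post.

```python
from pathlib import Path

# Prefixes named in this post; the "etc." there means this tuple may be incomplete.
LOG_PREFIXES = ("CART", "CTRX", "TN")


def find_command_logs(temp_dir, prefixes=LOG_PREFIXES):
    """Return candidate command logs in `temp_dir`, newest first."""
    upper = tuple(p.upper() for p in prefixes)
    logs = [
        p for p in Path(temp_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".txt"
        and p.name.upper().startswith(upper)
    ]
    # Sort by last-modified time so the most recent session comes first.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # "." is a stand-in; replace it with your dedicated SPM temp directory.
    for log in find_command_logs("."):
        print(log.name)
```

Sorting by modification time rather than parsing the date in the file name avoids guessing at the exact date format SPM uses.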
Using the Command Log
The Command Log is always saved automatically for you, as described above. But it is also available inside your session: from the View menu, select Command Log. For day-to-day purposes you can ignore most of the commands listed and focus on the key commands that set up your run, such as the USE, KEEP, and METHOD commands shown earlier.
You can always copy these commands into a new SPM Notepad and then edit them to reflect changes that you want to try, including of course the possibility of setting up a large number of runs for execution while you are away from your desk or busy working on other matters.
Learning more about commands
The PDF manuals contain detailed descriptions of every command available within SPM. We also have a brief text-based help system you can access from the command line. From a Notepad submit the command:
HELP
or
HELP keyword
where keyword is a specific command you want to know about, such as KEEP.
The results show up in the Classic Output window and of course you can save the information to a plain text file.
A great way to learn the basics is to run a session using the GUI and then study the resulting command log file. Just remember that the log contains a number of automatically generated commands having to do with setting up defaults and you can start by just ignoring them (erase them). Focus on the parts specific to your run and you will quickly get the hang of simple commands.
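If you would rather not erase the automatically generated lines by hand, a small filter can pull out just the commands you care about. This is a minimal Python sketch, not an SPM feature: the function name `key_command_lines` is hypothetical, the keyword set contains only the commands shown in this post, and treating a trailing comma as a line continuation is an assumption drawn from the KEEP example.

```python
# Commands that appear in this post's examples; extend the set for your own runs.
KEY_COMMANDS = {"USE", "KEEP", "METHOD"}


def key_command_lines(log_text, keywords=KEY_COMMANDS):
    """Return the lines of `log_text` that belong to commands named in `keywords`.

    A line ending in a comma is treated as continued on the next line
    (an assumption based on the KEEP example in this post).
    """
    wanted = {k.upper() for k in keywords}
    kept = []
    continued = False  # the previous line ended with a comma
    keeping = False    # the command currently being read is a wanted one
    for line in log_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continued = False
            continue
        if not continued:
            # A new command starts here; decide by its first word.
            keeping = stripped.split()[0].upper() in wanted
        if keeping:
            kept.append(stripped)
        continued = stripped.endswith(",")
    return kept


if __name__ == "__main__":
    # Made-up sample; the PLACEHOLDER lines stand in for auto-generated defaults.
    sample = "\n".join([
        "PLACEHOLDER_DEFAULT_A",
        'USE "GOODBAD.CSV"',
        "KEEP AGE, INCOME,",
        "NUMCARDS",
        "PLACEHOLDER_DEFAULT_B",
        "METHOD GINI POWER = 0.0000",
    ])
    for line in key_command_lines(sample):
        print(line)
```

The filtered lines can then be copied into a new SPM Notepad and edited for the next run.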