MARS Walkabout
Step into the next generation of data mining and predictive modeling...

If you are already familiar with CART®, you should feel right at home using MARS. Like CART, MARS™ can be run using the intuitive Windows interface or by using commands submitted at a command prompt or in batch files.

In this example, the objective is to build a regression model that predicts housing values using various quality of life attributes such as air pollution, crime, taxes, school quality, and distance to employment centers and highways.

Let's get started by opening the data.

MARS provides easy data access, directly importing over 80 different file formats. (Similarly, MARS predictions can be written out to over 80 formats.) To use this feature, make sure the "Use DBMS/COPY" option is activated in the "File" menu.




To open the Boston housing data file, select "Open"->"Data File..." from the "File" Menu, specify the type of file (e.g., Systat, Excel, SAS, ASCII, etc.) and then select the file itself. After opening the file, the Model Setup dialog automatically opens.

 

The Model Setup dialog is the primary control center for building your MARS models, with the most commonly used model setup and refinement options conveniently located in its five tabbed dialogs.




A MARS model can be constructed with as few as two actions: selecting the target variable (MV, median housing values) and selecting the set of predictors (all remaining variables in this example).

That's it! Click on "Best Model" to estimate the MARS model.

Using intelligent default settings and a fast intensive search procedure, MARS selects which variables to use and automatically transforms and combines them into the best-fitting model.

When MARS has found the Best Model it will show you a screen like this:




MARS produces several summaries and reports, and this display lets you choose which to look at. This panel tells us that the final model uses eight of the 13 available predictors, which MARS converts into 11 predictor terms. With a naive R-squared of 0.862, the model performs better than a linear regression would. The GCV R-squared is the MARS estimate of how well this model would perform on new data. The GCV measures are always more conservative and more trustworthy than the naive measures.
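To see why a GCV measure comes out more conservative than the naive one, here is a minimal sketch of the generalized cross-validation criterion in the form commonly given for MARS. The per-basis-function penalty of 3 and the function names are assumptions for illustration; the exact settings MARS uses internally are not stated in this walkabout.

```python
def gcv(rss, n_obs, n_basis, penalty=3.0):
    """Generalized cross-validation criterion (one common form for MARS).

    rss     -- residual sum of squares of the fitted model
    n_obs   -- number of training records
    n_basis -- number of non-constant basis functions in the model
    penalty -- assumed charge per basis function for the knot search
    """
    # Effective parameters: intercept + one coefficient per basis function
    # + an extra charge per basis function for having searched for its knot.
    effective_params = 1 + n_basis + penalty * n_basis
    if effective_params >= n_obs:
        raise ValueError("model is too large for this many observations")
    return (rss / n_obs) / (1.0 - effective_params / n_obs) ** 2


def naive_r2(rss, tss):
    """Ordinary R-squared: share of the total sum of squares explained."""
    return 1.0 - rss / tss


def gcv_r2(rss, tss, n_obs, n_basis, penalty=3.0):
    """R-squared computed from GCV values instead of raw sums of squares."""
    # The reference model is intercept-only, whose RSS equals the TSS.
    null_gcv = gcv(tss, n_obs, 0, penalty)
    return 1.0 - gcv(rss, n_obs, n_basis, penalty) / null_gcv
```

Because the denominator shrinks as basis functions are added, the GCV R-squared under this formula never exceeds the naive R-squared for the same fit.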

Let's look at the "Variable Importance" tab first.




The most important variable always receives a score of 100 and variables with a score of 0 are not used at all in the MARS model. Here RM (number of rooms) and DIS (distance from employment centers) are also quite important. An important variable is one which contributes substantially to the predictive accuracy of the model.
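The 0-to-100 scale can be pictured as a simple rescaling of raw importance scores. The sketch below assumes each variable already has some raw score (for instance, the loss in fit when the variable is removed); how MARS computes those raw scores is not described here, and the variable names in the usage note are placeholders.

```python
def relative_importance(raw_scores):
    """Rescale raw importance scores so the largest becomes 100.

    raw_scores -- dict mapping variable name to a non-negative raw score
                  (hypothetical input; unused variables have a score of 0).
    """
    top = max(raw_scores.values())
    if top <= 0:
        # No variable contributes anything, so every score is 0.
        return {name: 0.0 for name in raw_scores}
    return {name: 100.0 * max(score, 0.0) / top
            for name, score in raw_scores.items()}
```

For example, `relative_importance({"X1": 4.0, "X2": 1.0, "X3": 0.0})` gives `X1` a score of 100, `X2` a score of 25, and leaves the unused `X3` at 0.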

The essence of the MARS model is best seen in the graphs displayed when you click on "Curves and Surfaces." These graphs show how the relevant predictors affect the target and reveal where the relationship changes.

In the plot on the left, for example, we see that as the percentage of lower socio-economic status residents in a neighborhood (LSTAT) increases, the average value of homes decreases rapidly. However, once LSTAT reaches about 6%, further increases in LSTAT depress home values much more moderately.

Similarly, MARS has discovered that RM (number of rooms) has a substantial positive effect on the value of homes, provided that we are looking at neighborhoods with homes larger than 6.5 rooms on average. For neighborhoods with very small homes (below 6.5 rooms), making houses a little larger adds nothing to their market values.
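Curves with a bend like these are what you get from hinge-shaped basis functions. The sketch below is only an illustration: the knots (6.5 rooms, 6% LSTAT) come from the text above, while every slope is a made-up number, not a coefficient from the actual model.

```python
def hinge(x, knot):
    """max(0, x - knot): zero up to the knot, rising linearly above it."""
    return max(0.0, x - knot)


def mirror_hinge(x, knot):
    """max(0, knot - x): falling linearly up to the knot, zero above it."""
    return max(0.0, knot - x)


RM_KNOT = 6.5      # from the text
LSTAT_KNOT = 6.0   # from the text (about 6%)
RM_SLOPE = 8.0           # hypothetical
LSTAT_STEEP = 3.0        # hypothetical slope size below the knot
LSTAT_MODERATE = -0.5    # hypothetical slope above the knot


def rm_contribution(rm):
    """Flat below 6.5 rooms, then increasing with each extra room."""
    return RM_SLOPE * hinge(rm, RM_KNOT)


def lstat_contribution(lstat):
    """Steep decline up to the knot, gentler decline beyond it."""
    return (LSTAT_STEEP * mirror_hinge(lstat, LSTAT_KNOT)
            + LSTAT_MODERATE * hinge(lstat, LSTAT_KNOT))
```

With these placeholder slopes, `rm_contribution(5.0)` is 0 while `rm_contribution(7.5)` is 8, and moving LSTAT from 4 to 6 costs six units while moving it from 6 to 8 costs only one.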

 

Our first example developed the simplest type of MARS model -- one without any interactions. MARS will also detect significant two-way, three-way, and higher-order interactions between predictor variables, upon request.

Let's see if allowing two-way interactions improves the model. To change the number of interactions searched, we revisit the Model Setup dialog. In the Options tab, we increase the Maximum interactions to 2 and click "Best Model."
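A two-way interaction term can be thought of as the product of two hinge functions, so it is nonzero only in the region where both hinges are active. In this sketch the pairing of RM with LSTAT and the knot values are assumptions chosen for illustration; the walkabout does not say which variables interact in the fitted model.

```python
def hinge(x, knot):
    """max(0, x - knot): zero up to the knot, rising linearly above it."""
    return max(0.0, x - knot)


def interaction_term(rm, lstat, rm_knot=6.5, lstat_knot=6.0):
    """Hypothetical two-way term: product of one hinge per variable."""
    return hinge(rm, rm_knot) * hinge(lstat, lstat_knot)
```

Here `interaction_term(7.5, 10.0)` evaluates to 4.0, while `interaction_term(6.0, 10.0)` is 0 because RM is below its knot.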

 

The Model Summary now reports that the best model contains 10 terms instead of 11 and that the R-squared has increased by about three percent. So this model looks like it might be an improvement. Before drawing any conclusions, however, we should also look at the curves and surfaces for this new model.

 

MARS detected four significant two-way interactions, as shown below. You can rotate any of these graphs by clicking the curved arrows at the bottom right of the window. The controls at the lower left let you show the graphs as contour plots and make other changes (mesh, zones).




You can export any graph by highlighting a plot and selecting "Export Graph" from the "File" Menu. Even easier, just right click the mouse and then select "Export..." from the pop-up menu. You can export in any of the best known graphic formats (.BMP, .JPG, .WMF, and .PNG) for easy import into documents and presentations or publication on the Web.

Once you have a model you like you will want to deploy it. One way to use a MARS model is to score a database from inside of MARS. First, you have to Save the model from the File menu. Then, choose the database to score, specify where the results should be stored, and go. The output can be saved in over 80 different formats; in this case we select Excel 97. The window below shows you how to apply a MARS model to any database.

 

The MARS model can also be deployed using other software packages, including programming languages and database managers. We produce the needed source code in the Basis Functions tab; just save it to a text file by right clicking in the window. The text can be edited, manipulated via Perl scripts or otherwise modified to suit your needs. We also offer an add-on tool to export an XML rendition of the model. If you are going to deploy a MARS model using the basis functions directly, be sure to set "Edit"->"Options"->"Decimal Places" to a value such as 7 so you retain sufficient precision to capture the model faithfully.
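To illustrate why the number of decimal places matters, the sketch below scores one record with a small hand-written model and compares the result when the coefficients are rounded. The coefficients, the two basis functions and the function names are all hypothetical; this is not the text that the Basis Functions tab produces.

```python
def hinge(x, knot):
    """max(0, x - knot): zero up to the knot, rising linearly above it."""
    return max(0.0, x - knot)


# Hypothetical coefficients, written with 7 decimal places.
INTERCEPT = 22.5321987
COEF_RM = 7.1234567
COEF_LSTAT = -0.5432198


def predict(rm, lstat, decimals=None):
    """Score one record; optionally round each coefficient first."""
    def keep(value):
        return value if decimals is None else round(value, decimals)

    bf1 = hinge(rm, 6.5)      # basis function on number of rooms
    bf2 = hinge(lstat, 6.0)   # basis function on LSTAT
    return keep(INTERCEPT) + keep(COEF_RM) * bf1 + keep(COEF_LSTAT) * bf2
```

Comparing `predict(8.0, 20.0, decimals=1)` with `predict(8.0, 20.0)` shows a gap of roughly half a unit for this made-up model, while `decimals=7` reproduces the full-precision score.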

 

So far we have asked MARS to find the best model using its own built-in methods of model selection. There will be times when you want to look at the "also ran" models. Perhaps a much smaller model yields good enough performance; sometimes a slightly different model matches expectations better.

To have ready access to the entire sequence of models developed by MARS, click on "All Models" in the Model Setup dialog. You should see results something like this:




The table contains one line per model and reports the number of basis functions, the number of variables, and a penalized R-squared (GCV R-Squared). The largest, most overfit model appears at the top; as you move down the table you see progressively smaller models obtained by deleting the least useful basis function(s) from the line above. Usually only one basis function at a time is dropped in this backwards deletion sequence.

The model MARS identifies as best is marked by the double asterisks; all MARS reports and summaries start off by describing this model. To look at another model, just highlight it in the table and then click "Select." All results will now reflect the selected model.
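If you want to look for a smaller "also ran" model systematically, one possible rule is to take the smallest model whose GCV R-squared is within some tolerance of the best one. The sketch below assumes the table has been copied into a list of dictionaries; the field names and the tolerance rule are assumptions for illustration, not features of MARS.

```python
def best_by_gcv(models):
    """Return the row with the highest GCV R-squared."""
    return max(models, key=lambda m: m["gcv_r2"])


def smallest_within(models, tolerance):
    """Smallest model whose GCV R-squared is within `tolerance` of the best."""
    best = best_by_gcv(models)
    candidates = [m for m in models
                  if best["gcv_r2"] - m["gcv_r2"] <= tolerance]
    return min(candidates, key=lambda m: m["basis_functions"])


# Made-up sequence, largest model first, as in the All Models table.
sequence = [
    {"basis_functions": 15, "gcv_r2": 0.80},
    {"basis_functions": 14, "gcv_r2": 0.81},
    {"basis_functions": 11, "gcv_r2": 0.83},
    {"basis_functions": 8, "gcv_r2": 0.82},
    {"basis_functions": 5, "gcv_r2": 0.79},
    {"basis_functions": 2, "gcv_r2": 0.60},
]
```

With this invented sequence, the best model has 11 basis functions, and a tolerance of 0.02 would point you to the 8-term model instead.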

You can repeat the selection as often as you want. Also, the entire model selection window can be saved for later review. To save your results, choose "Save Selector..." in the "File"->"Save" menu. To retrieve these results later, choose "Selector..." in the "File"->"Open" menu during subsequent MARS sessions.

MARS will give you several additional reports if you specify that your target variable is binary. For example, let's model the CHAS variable, which is coded 0 or 1. Besides selecting the target variable and the predictors we also check the "Binary" box below.




We have also set the "Threshold" to 0.40. This is used by MARS to decide which category (0 or 1) each prediction should fall into, as most predictions will fall between 0 and 1. The "Table" check box is used to obtain the prediction success matrix for every threshold setting from 0 to 1 in steps of 0.01 to provide ROC information.
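The threshold logic and the table of prediction success matrices can be sketched as follows. This is a minimal illustration, assuming a prediction exactly equal to the threshold is classified as 1; how MARS treats that boundary case is not stated, and the function names are hypothetical.

```python
def classify(prediction, threshold=0.40):
    """Turn a continuous prediction into class 0 or 1 (>= is an assumption)."""
    return 1 if prediction >= threshold else 0


def success_matrix(actuals, predictions, threshold):
    """Count records by (actual class, predicted class)."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for actual, prediction in zip(actuals, predictions):
        counts[(actual, classify(prediction, threshold))] += 1
    return counts


def threshold_table(actuals, predictions, steps=100):
    """One row per threshold from 0 to 1: (threshold, TPR, FPR)."""
    rows = []
    for i in range(steps + 1):
        threshold = i / steps
        m = success_matrix(actuals, predictions, threshold)
        positives = m[(1, 0)] + m[(1, 1)]
        negatives = m[(0, 0)] + m[(0, 1)]
        tpr = m[(1, 1)] / positives if positives else 0.0
        fpr = m[(0, 1)] / negatives if negatives else 0.0
        rows.append((threshold, tpr, fpr))
    return rows
```

Plotting the true positive rate against the false positive rate from `threshold_table` traces out an ROC curve.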

Running this model by clicking "Best Model" gives the results we will look at next. A binary target gets a "Prediction Success" tab on the model summary.




Of the 35 Census tracts located along the Charles River (CHAS=1), we get eight classified correctly and 27 classified incorrectly. We do much better for the other Census tracts, getting 98.51% correct. You can experiment with the threshold setting; if you lower it you will increase the number of tracts classified as 1. This will increase your performance in the bottom row but decrease it in the top row and you will have to decide on the best setting for your purposes.
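The row percentages are simple to check by hand. In the sketch below, the counts 8 and 27 come from the text; the text gives only a percentage for the other tracts, so the counts 464 and 7 are an assumption that reproduces 98.51% if the file holds the usual 506 tracts.

```python
def percent_correct(correct, incorrect):
    """Share of a row of the prediction success matrix classified correctly."""
    return 100.0 * correct / (correct + incorrect)


# CHAS=1 row: counts taken from the text.
chas_one = percent_correct(8, 27)

# CHAS=0 row: counts are assumed (464 correct, 7 incorrect), not from the text.
chas_zero = percent_correct(464, 7)
```

Rounded to two decimals, `chas_one` is 22.86 and `chas_zero` is 98.51, which makes the imbalance between the two rows easy to see.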

Once you have completed several MARS runs you can obtain helpful summaries and comparison charts and tables.

You can obtain several different models either by running MARS several times with different control settings or by selecting several models from the Selector after requesting an "All Models" run.

For this example let's assume that you have two selector windows: one for a run without interactions and another for the run with two-way interactions allowed. In this example, each selected model just happens to have five basis functions. From the Report menu select "Set Report Options" to reach the control panel shown below.

In this window the right hand pane allows you to select the parts of the MARS output to send to the report. The left hand pane lets you check off which Selectors to report on. In this case we decided to include only basis functions.




"Report Now" gets you to the next interesting screen.

After you click "Report Now", an Organizer window like the one here will appear:




This text window can be saved in a Microsoft Word-compatible Rich Text Format (.rtf) document and can be edited from within MARS. During any session you can add other reports to this window by visiting a Selector window and then choosing "Report Current" in the "Report" menu.

You can also add almost any graph, chart, or the contents of any MARS window by right clicking your mouse in the window and then selecting "Add to Report."