MARS® (Multivariate Adaptive Regression Splines), introduced by Stanford University statistician Professor Jerome H. Friedman in 1991, is one of the landmarks in the evolution of regression methods. For the first time, analysts could leverage a search mechanism designed to automatically discover nonlinearity and interactions in the context of classical regression.
Frequently Asked Questions for MARS
MARS® is ideal for users who prefer results in a form similar to traditional regression while capturing essential non-linearities and interactions. The MARS approach to regression modeling effectively uncovers important data patterns and relationships that are difficult, if not impossible, for other regression methods to reveal.
Conventional regression models typically fit straight lines to data. MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight-line methods. MARS builds its model by piecing together a series of straight lines with each allowed its own slope. This permits MARS to trace out any pattern detected in the data.
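In other words, each bend in a MARS model comes from a pair of "hinge" functions that are zero on one side of a knot and linear on the other, each with its own slope. A minimal sketch of the idea (the knot location and coefficients below are made up purely for illustration):

```python
# Hinge basis functions: zero on one side of the knot c, linear on the other.
def hinge_pos(x, c):
    return max(0.0, x - c)

def hinge_neg(x, c):
    return max(0.0, c - x)

# A hypothetical fitted MARS model with a single knot at x = 10:
#   y = 5 + 2 * max(0, x - 10) - 0.5 * max(0, 10 - x)
# The curve is flat-ish below the knot and rises steeply above it.
def predict(x):
    return 5.0 + 2.0 * hinge_pos(x, 10.0) - 0.5 * hinge_neg(x, 10.0)
```

Because each side of the knot gets its own slope, a sum of such terms can trace out an arbitrary piecewise-linear pattern.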
The MARS model is designed to predict continuous numeric outcomes such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend during a website visit. MARS is also capable of producing high-quality probability models for a yes/no outcome. MARS performs variable selection, variable transformation, interaction detection, and self-testing, all automatically and at high speed.
Multivariate Adaptive Regression Splines was developed in the early 1990s by world-renowned Stanford statistician Jerome Friedman. It is an innovative, flexible modeling tool that automates the building of accurate predictive models for continuous and binary dependent variables.
The major advantage of MARS is that it automates aspects of regression modeling that are otherwise difficult and time-consuming. These include variable selection, transformation of predictor variables, detection of interactions, and testing of the model against new data.
MARS is not a black box: its output is an explicit, regression-style equation. Compared with neural nets, MARS models are typically faster to build and far easier to interpret, and in many applications their accuracy is competitive.
Almost all modeling technologies can track training data accurately; the real question is how a model performs on data it has not seen. MARS protects users from misleading results through its two-stage modeling process: it deliberately overfits at first, then prunes away every component that would not hold up on new data. MARS assesses each candidate model with one of two built-in testing regimens, cross-validation or reference to independent test data, and uses these tests to determine the degree of accuracy that can be expected from the best predictive model.
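The two-stage process can be sketched as follows. This is not Friedman's actual algorithm (the real forward pass grows basis functions adaptively, and the real backward pass uses generalized cross-validation); it is a simplified illustration, with hypothetical knots, of overfitting first and then pruning terms that fail on held-out data:

```python
import numpy as np

def hinge(x, knot, sign):
    # MARS-style hinge basis function: max(0, sign * (x - knot))
    return np.maximum(0.0, sign * (x - knot))

def design(x, terms):
    # Design matrix: an intercept column plus one column per hinge term
    return np.column_stack([np.ones_like(x)] + [hinge(x, c, s) for c, s in terms])

def val_error(x_tr, y_tr, x_val, y_val, terms):
    # Fit by least squares on training data, return MSE on validation data
    beta, *_ = np.linalg.lstsq(design(x_tr, terms), y_tr, rcond=None)
    resid = design(x_val, terms) @ beta - y_val
    return float(np.mean(resid ** 2))

def two_stage_fit(x_tr, y_tr, x_val, y_val, knots):
    # Stage 1 (forward): deliberately overfit -- a hinge pair at every knot.
    terms = [(c, s) for c in knots for s in (+1.0, -1.0)]
    # Stage 2 (backward): prune any term whose removal does not hurt
    # performance on the held-out validation data.
    pruned = True
    while pruned and terms:
        pruned = False
        base = val_error(x_tr, y_tr, x_val, y_val, terms)
        for i in range(len(terms)):
            trial = terms[:i] + terms[i + 1:]
            if val_error(x_tr, y_tr, x_val, y_val, trial) <= base + 1e-9:
                terms, pruned = trial, True
                break
    return terms
```

On a noiseless piecewise-linear target, this sketch keeps only the handful of terms needed to reproduce the bend and discards the rest.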
MARS is capable of predicting with much higher resolution and accuracy than a regression tree, typically producing a unique score for every record in a database rather than one shared value per tree node. In this way, MARS expands on the capabilities of decision trees for regression.
A MARS predictive model can be implemented in two ways. First, new databases can be scored directly by identifying the MARS model and the data to be scored. MARS will perform all the required data transformations and calculations automatically and output the predicted scores. Second, the MARS predictive equation can be exported as ready-to-run C and SAS®-compatible code that can be deployed in the user's application framework.
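Exported MARS code is essentially a flat sequence of basis-function assignments followed by a weighted sum, with no library dependencies. A Python sketch of what such a scoring function looks like (the variable names, knots, and coefficients here are hypothetical, not output from an actual MARS run):

```python
# Hypothetical exported MARS scoring function for two predictors.
# Each BF is a hinge term; bf3 is an interaction of the first two.
def score(income, age):
    bf1 = max(0.0, income - 40.0)   # hinge on income, knot at 40
    bf2 = max(0.0, 30.0 - age)      # hinge on age, knot at 30
    bf3 = bf1 * bf2                 # interaction term
    return 12.5 + 0.8 * bf1 - 0.4 * bf2 + 0.02 * bf3
```

Because the function is self-contained arithmetic, the same structure translates directly into C or SAS code for deployment.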
MARS automatically creates a missing-value indicator, a dummy variable, that becomes one of the available predictors. These dummy variables record the presence or absence of data for each predictor variable in question.
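A sketch of the same idea built with pandas (the column names are hypothetical, and the zero fill shown is just one common companion step, since the indicator column itself carries the missingness signal):

```python
import pandas as pd

# Predictor with missing entries
df = pd.DataFrame({"income": [52.0, None, 37.5, None]})

# Indicator dummy: 1 where the value is missing, 0 where it is present
df["income_missing"] = df["income"].isna().astype(int)

# Neutral fill so the column is usable; the indicator preserves the signal
df["income"] = df["income"].fillna(0.0)
```

The indicator column can then enter the model like any other predictor, letting the model learn whether missingness itself is informative.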