Dan Steinberg's Blog

Using LAGS for Time Series-Style Modeling in SPM

If you wish to run a time series or panel data (time series cross section) style model, you will frequently want to use lagged values of variables as predictors. To include lags of variables as predictors from the command line, just enter a KEEP command such as:


This automatically creates the first lag (X lagged once) and the second lag (X lagged twice) of the predictor X. To enter a whole series of lags, use the syntax:


This will generate the first lag and five additional lags (in other words, lags 1 through 6).
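To make the meaning of "lags 1 through 6" concrete, here is a minimal sketch in plain Python (not SPM command syntax). The function name `make_lags` and the sample values are hypothetical and chosen only for illustration; positions with no earlier value available are set to `None`, standing in for a missing value.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def make_lags(values: Sequence[float], lags: Iterable[int]) -> Dict[int, List[Optional[float]]]:
    """Return {k: the series lagged k rows}, with None where no earlier row exists."""
    n = len(values)
    out: Dict[int, List[Optional[float]]] = {}
    for k in lags:
        if k < 1:
            raise ValueError("lags must be positive integers")
        # The first k rows have no value k steps back, so they are missing.
        out[k] = [None] * min(k, n) + list(values[: max(n - k, 0)])
    return out


# Hypothetical series X observed over eight consecutive periods.
x = [10, 11, 12, 13, 14, 15, 16, 17]

# Lags 1 through 6; any iterable of positive integers works, e.g. [1, 2, 7].
lagged = make_lags(x, range(1, 7))
```

Here `lagged[1]` is the series shifted down by one row and `lagged[6]` has six leading missing values, so each additional lag costs one more usable row at the start of the series.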

If you wish to enter selected ranges, you might need a command like:


In panel data we have time series for several different units of observation, and lags are meaningful only when created within a single unit of observation. For example, if we have daily sales of a specific product for every store in a chain, we would normally order the data chronologically within each store, with data for the first store at the top of the file, data for the second store below that, and so forth, ending with the chronologically ordered data for the last store at the end of the file. When creating LAGS we need to ensure that, when looking for a specific lag for one store, we do not cross over into data for another store to create that lag. If we cannot find the lagged data within the span of data for the store in question, then the lagged value must be set to missing, and SPM can look after this for you automatically.

We ensure proper handling of lagged variables by issuing the BLOCK command. Thus:


This restarts the construction of lags from the beginning whenever a new store is encountered.
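The block-aware behavior described above can be sketched in plain Python. This is only an illustration of the idea, not how SPM implements it: the function name `lag_within_blocks` and the sample data are hypothetical, and the sketch assumes the rows are already sorted so that each store forms one contiguous block in date order.

```python
from typing import Hashable, List, Optional, Sequence


def lag_within_blocks(
    block_ids: Sequence[Hashable], values: Sequence[float], k: int
) -> List[Optional[float]]:
    """Lag values by k rows without reaching into a different block.

    Assumes each block is contiguous and sorted in time order.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    lagged: List[Optional[float]] = []
    for i in range(len(values)):
        j = i - k
        # Use the earlier row only if it exists and belongs to the same block.
        if j >= 0 and block_ids[j] == block_ids[i]:
            lagged.append(values[j])
        else:
            lagged.append(None)  # missing: the lag would cross a block boundary
    return lagged


# Hypothetical data: two stores with three days of sales each.
store = [1, 1, 1, 2, 2, 2]
units = [5, 7, 6, 20, 22, 21]

lag1 = lag_within_blocks(store, units, 1)
lag2 = lag_within_blocks(store, units, 2)
```

Note that the first row of store 2 gets a missing first lag rather than the last value of store 1, which is the behavior the BLOCK variable is meant to guarantee.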

As an example we look at some data derived from the daily sales of a chain of grocery stores. Our data includes TOTAL_UNITS_SOLD of a specific product (SKU), the ID of the specific store, a label for the type of promotion, and the time elapsed in days since the last promotion ended. Obviously, any real-world data set would have many more predictors available; we have constructed a stripped-down example to illustrate the main concepts related to the use of lags.

In the GUI the model setup would look as follows:

[Screenshot: time series modeling]

Our goal here is to model TOTAL_UNITS_SOLD, which is a continuous variable, so we are going to develop a regression model. However, lags can be constructed for any kind of analysis supported by SPM.

Next we need to refer to the “Lags” tab on the model setup, which will reveal the display we show next. Observe that all the available variables are listed including the target variable, as any of these variables can have lagged values used as predictors.

The first thing we do is identify STORE as our BLOCK variable.  For this to work correctly the data must be sorted properly with all records belonging to the same store in one contiguous block. Also, within a BLOCK all records must be sorted correctly, in this example by date. (The date variables are not shown in the example but they were there in the original file!)   

[Screenshot: lags and time series modeling]

We can now indicate which variables we would like to include as predictors in lagged form. The display starts off with three columns, permitting you to indicate 3 lags, but you can always add more by clicking the “Add Column” button at the bottom right.

Here is what the command generator would add to the command log starting from the above model setup:





Normally, when running time series models, the test partition is defined either in terms of time (learn on older data, test on the recent past) or, in the case of panel data, in terms of which case histories are assigned to the test partition. (The two approaches can also be combined.) Here we reserve all stores with ID numbers greater than some threshold for testing and obtain the following results:

[Screenshot: time series analysis results in SPM]
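The store-based test partition described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not SPM syntax: the function name `split_by_store`, the threshold value, and the sample rows are hypothetical, and the post does not state which threshold was actually used.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def split_by_store(
    rows: List[Row], threshold: float, store_key: str = "STORE"
) -> Tuple[List[Row], List[Row]]:
    """Send every row of a store to the same partition, based on its store ID."""
    learn = [r for r in rows if r[store_key] <= threshold]
    test = [r for r in rows if r[store_key] > threshold]
    return learn, test


# Hypothetical rows; the threshold of 2 is an arbitrary illustrative choice.
rows = [
    {"STORE": 1, "TOTAL_UNITS_SOLD": 5},
    {"STORE": 1, "TOTAL_UNITS_SOLD": 7},
    {"STORE": 2, "TOTAL_UNITS_SOLD": 20},
    {"STORE": 3, "TOTAL_UNITS_SOLD": 9},
]
learn_rows, test_rows = split_by_store(rows, threshold=2)
```

Splitting on the store ID keeps each store's entire history in a single partition, so lagged values computed within a store never mix learn and test records.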

The purpose of this exercise is not to build a high-quality model but just to illustrate the workings of the LAG machinery. Most likely you will end up working with several lagged variables, and you should certainly consider working with engines such as TreeNet, CART, MARS, and GPS.


Tags: Blog, Time Series, SPM