Working With Date Variables
There are a variety of ways to represent dates in data files, and there is no standard, which can make life difficult if one is trying to use date variables in a predictive model. Two of the more common representations are the Microsoft date format (used in Excel and other Microsoft products), which is the number of days since December 30, 1899, and the SAS date format, which is the number of days since January 1, 1960. For the sake of consistency, the data access library used by SPM converts all date variables to Microsoft dates. The advantage of doing so is that one does not have to guess how dates are represented in the input dataset, and Microsoft products are common; the disadvantage is that the values can be confusing if you use non-Microsoft products (like SAS) to manage your data.
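Because both conventions count whole days and differ only in their starting point, converting between them is a constant shift. The following Python sketch (an illustrative adaptation, not part of SPM; the function names are hypothetical) derives that shift from the two epochs given above and converts a calendar date to each convention:

```python
from datetime import date, timedelta

# Day 0 of each convention, as described above.
MS_EPOCH = date(1899, 12, 30)   # Microsoft (Excel) dates
SAS_EPOCH = date(1960, 1, 1)    # SAS dates

# Both count whole days, so they differ by a constant number of days.
SAS_TO_MS_OFFSET = (SAS_EPOCH - MS_EPOCH).days  # 21916

def to_ms(d: date) -> int:
    """Day count of a calendar date in the Microsoft convention."""
    return (d - MS_EPOCH).days

def to_sas(d: date) -> int:
    """Day count of a calendar date in the SAS convention."""
    return (d - SAS_EPOCH).days

def sas_to_ms(sas_value: int) -> int:
    """Shift a SAS day count onto the Microsoft scale."""
    return sas_value + SAS_TO_MS_OFFSET

def ms_to_sas(ms_value: int) -> int:
    """Shift a Microsoft day count onto the SAS scale."""
    return ms_value - SAS_TO_MS_OFFSET

def ms_to_date(ms_value: int) -> date:
    """Turn a Microsoft day count back into a calendar date."""
    return MS_EPOCH + timedelta(days=ms_value)

birth = date(1974, 3, 29)
print(to_ms(birth), to_sas(birth))  # 27117 5201
```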
For example, if a SAS dataset contains a variable named BIRTHDATE that is formatted as a date, SPM will automatically convert it to a Microsoft date when reading it in, even though the actual representation in the input dataset is a SAS date. Thus, if BIRTHDATE='29Mar1974'd in the SAS dataset, SPM will read it as 27117 (the Microsoft value) instead of 5201 (the SAS value); the two values differ by 21916, the number of days from December 30, 1899 to January 1, 1960. Furthermore, no conversion back is made when SPM scores or otherwise saves the dataset, so BIRTHDATE will be a Microsoft date in any output datasets SPM creates. One consequence is that if BIRTHDATE is used in a model, any coefficients or split points will be based on its values as a Microsoft date rather than as a SAS date (important if one translates the model to SAS).
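When translating such a model to SAS by hand, the Microsoft value equals the SAS value plus 21916, so each piece of the model can be rewritten on the SAS scale. The Python sketch below is an illustration under that assumption; the function names are hypothetical. A split threshold simply shifts by 21916 days. For a linear term a + b·x, the slope b is unchanged and only the intercept absorbs b·21916:

```python
# Microsoft value = SAS value + 21916 (difference between the two epochs).
SAS_TO_MS_OFFSET = 21916

def split_point_to_sas(threshold_ms: float) -> float:
    """A split 'BIRTHDATE <= t' on Microsoft dates is equivalent to
    'BIRTHDATE <= t - 21916' on SAS dates (the shift preserves order)."""
    return threshold_ms - SAS_TO_MS_OFFSET

def linear_term_to_sas(intercept: float, coef: float) -> tuple[float, float]:
    """Rewrite a + b*x_ms as a' + b*x_sas.

    Since x_ms = x_sas + offset, a + b*x_ms = (a + b*offset) + b*x_sas:
    the slope is unchanged and only the intercept moves.
    """
    return intercept + coef * SAS_TO_MS_OFFSET, coef
```

For example, a split at 27117 on the Microsoft scale corresponds to a split at 5201 on the SAS scale.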
There are several work-arounds for this:
1. In SAS, one can strip the formats from any date variables before saving the dataset to be read by SPM, so that they are read as plain numbers, for example:
*strip the date format from BIRTHDATE;
format birthdate;
2. In SPM, one can redefine any date variables as SAS dates, like so:
In this case, it is important to make the same transformations when scoring new data with any models built using the date variable.
3. After scoring the data, one can convert the relevant date variables back to SAS dates in SAS by subtracting 21916 and reapplying a date format:
birthdate = birthdate - 21916;
format birthdate date.;
4. One can avoid the direct use of date variables entirely by using relative measures of time instead. We recommend this because the specific dates used to build a predictive model all lie in the past and will never appear again in new data, whereas a relative measure such as age or time since an event remains meaningful when the model is applied. For example (in SPM):
rem Age when account opened
The SAS code examples above can easily be adapted to whatever programming language, database manager, or statistical package one prefers.
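As one such adaptation, the Python sketch below computes a relative measure in the spirit of workaround 4: age in completed years when an account was opened. The account-opening date variable and the function names are hypothetical. It also shows why relative measures sidestep the representation problem: the difference between two day counts is the same in either convention, because the epoch offset cancels.

```python
from datetime import date, timedelta

# Microsoft day 0, as described in this section.
MS_EPOCH = date(1899, 12, 30)

def from_ms_date(ms_value: int) -> date:
    """Convert a Microsoft day count back to a calendar date."""
    return MS_EPOCH + timedelta(days=ms_value)

def days_between(start_value: int, end_value: int) -> int:
    """Elapsed days; valid for any single day-count convention,
    because the epoch offset cancels in the subtraction."""
    return end_value - start_value

def age_when_opened(birthdate_ms: int, opendate_ms: int) -> int:
    """Completed years of age on the (hypothetical) account-opening date."""
    born = from_ms_date(birthdate_ms)
    opened = from_ms_date(opendate_ms)
    # Subtract one year if the birthday had not yet occurred that year.
    before_birthday = (opened.month, opened.day) < (born.month, born.day)
    return opened.year - born.year - int(before_birthday)

birth = 27117  # 29Mar1974 as a Microsoft date
opened = (date(2004, 3, 28) - MS_EPOCH).days
print(age_when_opened(birth, opened))  # 29 (one day before the 30th birthday)
print(days_between(birth, opened) == days_between(birth - 21916, opened - 21916))  # True
```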