Predicting Customer Churn with TreeNet®
Challenge

Customer churn presents a particularly vexing problem for the wireless telecommunications industry, where 20-40% of customers leave their provider in a given year. As once-explosive subscriber growth slows, retaining existing customers becomes increasingly important to a company's overall profitability. If the customers who are likely to churn can be identified, the company can target them with retention campaigns that give them an incentive to stay, preventing loss of revenue.

Approach

The Teradata Center for CRM at Duke University set out to discover the best methods for determining which customers are most likely to churn. It posted an open challenge to data analysts and modelers: using customer records from a major wireless provider, predict which subscribers would leave the company in the next two months. Entrants were free to use whatever analysis methods they wished. When the competition ended, the submissions were compared against the actual churn outcomes for two different time periods. Two accuracy measures were used to judge the predictions, for a total of four categories (two time periods × two measures). The competition officials also conducted a "meta-analysis" to see which methods generally produced the most accurate results.

Results

Salford Systems was declared the winner in all four categories. Salford's models were built with its TreeNet® software, an innovative form of boosted decision trees known for building extremely accurate models. Across all the entries, the judges found that decision trees and logistic regression were generally the best methods for predicting churn, though they acknowledged that not all methodologies were adequately represented in the competition. Salford's TreeNet models captured the most churners across the board and identified which of the 171 candidate variables were most important for predicting churn.
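TreeNet's internals are not described here, so the following is only a minimal sketch of the general technique the text names, boosted decision trees, applied to churn probabilities. It is not Salford's implementation: it boosts one-split trees (stumps) fitted by least squares to the gradient of the log-loss, and it scores variable importance as the total squared-error reduction credited to each feature. The two features (months of service, dropped calls per month) and the synthetic data generator are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r):
    """Least-squares stump: the single-feature threshold split that best fits targets r.

    Returns (feature, threshold, left_value, right_value, sse_reduction).
    """
    mean_r = sum(r) / len(r)
    total_sse = sum((v - mean_r) ** 2 for v in r)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to a single leaf
        return 0, float("inf"), mean_r, mean_r, 0.0
    sse, j, t, lm, rm = best
    return j, t, lm, rm, total_sse - sse


def fit_boosted_stumps(X, y, n_rounds=40, learning_rate=0.1):
    """Gradient boosting for binary log-loss; returns the model and per-feature importance."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # start from the log-odds of the overall churn rate
    F = [f0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # The negative gradient of the log-loss with respect to F is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lv, rv, gain = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        importance[j] += gain  # credit this split's error reduction to its feature
        F = [fi + learning_rate * (lv if row[j] <= t else rv) for row, fi in zip(X, F)]
    return (f0, learning_rate, stumps), importance


def predict_proba(model, row):
    """Predicted churn probability for one customer."""
    f0, lr, stumps = model
    return sigmoid(f0 + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps))


def make_synthetic_customers(n, seed=0):
    """Hypothetical customers described by [months_of_service, dropped_calls_per_month]."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        tenure, dropped = rng.randint(6, 60), rng.randint(0, 10)
        # Assumed ground truth: short tenure and many dropped calls raise churn risk.
        risk = sigmoid(-1.0 - 0.05 * tenure + 0.5 * dropped)
        X.append([tenure, dropped])
        y.append(1 if rng.random() < risk else 0)
    return X, y


X, y = make_synthetic_customers(200)
model, importance = fit_boosted_stumps(X, y)
probs = [predict_proba(model, row) for row in X]
```

Each round nudges every customer's log-odds toward the observed outcome by a small step (`learning_rate`); many weak stumps added together form the final model, which is the core idea behind boosted trees in general.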
Among the 10% of customers that TreeNet ranked as most likely to churn, it found 35-45% more churners than the competition average and three times as many as would be found in a random sample of the same size. For companies with large subscriber bases, this could translate into identifying thousands more potential churners each month, and targeting these customers with an appropriate retention campaign could save a company millions of dollars each year.

The Data

The data were drawn from a major wireless telecommunications company's own customer records for the second half of 2001. Account summary data were provided for 100,000 customers who had been with the company for at least six months. To assist in the modeling process, churners were oversampled so that half of the sample consisted of churners (customers who left the company within the following 60 days) and the other half of customers who remained with the company for at least another 60 days. A broad range of 171 potential predictors was made available, spanning all the types of data a typical service provider would routinely have. Predictor data included:
Evaluation Criteria

The "training" or "calibration" data described above were provided to support predictive model development. Participants were asked to use their best models to predict the probability of churn for two groups of customers: a "current" sample of 51,306 customers drawn from the latter half of 2001 and a "future" sample of 100,462 customers drawn from the first quarter of 2002. Predicting "future" data is generally considered more difficult because external factors and behavioral patterns may change over time. Of course, in real-world settings predictive models are always applied to future data, and the tournament organizers wanted to reproduce that context.

Results

Contestants were free to develop a separate model for each time period, each evaluation criterion, or both, if they wished to optimize their models accordingly. Salford Systems submitted two models: a straightforward out-of-the-box TreeNet model and a more complex model that averaged the predictions of several different TreeNet models. The contest results are summarized below, along with an explanation of their meaning and significance.
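The text does not name its two accuracy measures, but the top-10% comparison reported in the Results section ("three times as many churners as a random sample") corresponds to what is commonly called top-decile lift. The sketch below computes it under that assumption; the function name `top_decile_lift` and the toy scoring sample are hypothetical, and this is not the tournament's official scoring code.

```python
def top_decile_lift(y_true, scores, fraction=0.10):
    """Churn rate among the top `fraction` of customers ranked by score,
    divided by the overall churn rate of the sample.

    A lift of 3.0 means the targeted group contains three times as many
    churners as a random group of the same size would.
    """
    if len(y_true) != len(scores) or not y_true:
        raise ValueError("y_true and scores must be non-empty and the same length")
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when the sample contains no churners")
    n_top = max(1, round(len(scores) * fraction))
    # Highest scores first; sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_rate = sum(y_true[i] for i in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Hypothetical scoring sample: 20 customers, 4 of whom actually churned.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95 - 0.04 * i for i in range(20)]  # the model ranks customer 0 highest
lift = top_decile_lift(y_true, scores)
print(lift)
```

Here both of the two top-ranked customers churned against a 20% base rate, so the lift is 1.0 / 0.2 = 5. Because the churn rate of the targeted group cannot exceed 1, lift is bounded by 1 / base_rate: on a 50/50 sample like the oversampled calibration data it can never exceed 2. The measure depends only on the ranking of the scores, so any monotone rescaling of the predicted probabilities leaves it unchanged.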