
How do I define penalties to make it harder for a predictor to become the primary splitter in a node?

CART supports three "improvement penalties." The "natural" improvement for a splitter is always computed according to the CART methodology. A penalty may be imposed, however, that lessens the improvement, affecting the penalized splitter's relative ranking among competitor splits. If the penalty is large enough to cause the top-ranked splitter to be replaced by a competitor, the tree is changed.

Improvement Penalties

Variable-Specific Penalty
This penalizes a given predictor (perhaps because it is expensive to collect and we do not want it serving as a splitter unless it is a really powerful predictor). If the user-defined variable-specific penalty is in the range [0,1] inclusive, then the natural improvement is adjusted as:
improve-adj = improve * (1 - variable_specific_penalty)
If the user-specified penalty falls outside of [0,1] then no penalty is imposed.
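As a minimal sketch of the rule above, written in Python purely for illustration (CART itself is driven by the PENALTY command, and the function name here is hypothetical), the adjustment and the out-of-range behavior might look like this:

```python
def apply_variable_penalty(improve, penalty):
    """Scale a splitter's natural improvement by a variable-specific penalty.

    Illustrative only: a penalty outside [0, 1] is ignored, matching the
    behavior described in the text.
    """
    if 0.0 <= penalty <= 1.0:
        return improve * (1.0 - penalty)
    # Out-of-range penalty: no penalty is imposed.
    return improve
```

Under this sketch a penalty of 0 leaves the improvement untouched, a penalty of 1 reduces it to zero, and a penalty such as 1.5 is ignored.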
Missing-Value Penalty
This penalizes the improvement of a competitor based on the proportion of missing values for the competitor in the node in question. This makes it difficult, but not impossible, for a competitor with many missing values in a node to rise to the top of the competitor list and assume the role of primary splitter. If there are missing values, the improvement is adjusted as:
improve-adj = improve * SW1 * [ (Ngood/N) ^ SW2 ]
in which SW1 and SW2 are controlled in the PENALTY command, N is the size of the node, and Ngood is the number of records with nonmissing values for the variable in question. If there are no missing values (Ngood = N), no adjustment is made.
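The missing-value adjustment can be sketched the same way. This is an illustration in Python, not CART code; the function and argument names are hypothetical, and the SW1 and SW2 values must come from whatever is set on the PENALTY command:

```python
def apply_missing_penalty(improve, n_good, n, sw1, sw2):
    """Adjust an improvement for the share of nonmissing records in a node.

    n      -- size of the node (N in the text)
    n_good -- records with nonmissing values for the variable (Ngood)
    sw1, sw2 -- the SW1 and SW2 settings from the PENALTY command
    """
    if n <= 0:
        raise ValueError("node size must be positive")
    if n_good == n:
        # No missing values: no adjustment is made.
        return improve
    return improve * sw1 * (n_good / n) ** sw2
```

For example, with SW1 = 1 and SW2 = 1, a variable that is nonmissing for half of the node's records keeps half of its natural improvement; raising SW2 to 2 leaves it a quarter.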
High-Level Categorical Penalty
This penalizes a categorical variable that has many levels relative to the size (unweighted N) of the node in question. For a categorical variable:
ratio = log_base_2(N) / (Nlevels - 1)
in which Nlevels is the number of levels for the categorical predictor and N is the number of learn sample records in the node. The improvement is then adjusted as:
improve-adj = improve * [ 1 - SW3 + SW3 * (ratio ^ SW4) ]
in which SW3 and SW4 are controlled on the PENALTY command.
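A sketch of the two-step calculation for a categorical predictor follows, again in Python and with hypothetical names. The text does not say how a ratio greater than 1 or a predictor with fewer than two levels is handled, so the sketch applies the formula literally and, as an assumption, leaves the improvement unchanged when Nlevels is below 2:

```python
import math


def apply_hlc_penalty(improve, n, n_levels, sw3, sw4):
    """Adjust an improvement for a categorical predictor with many levels.

    n        -- unweighted number of learn sample records in the node (N)
    n_levels -- number of levels of the categorical predictor (Nlevels)
    sw3, sw4 -- the SW3 and SW4 settings from the PENALTY command
    """
    if n < 1 or n_levels < 2:
        # Assumption (not stated in the source): nothing to penalize.
        return improve
    ratio = math.log2(n) / (n_levels - 1)
    return improve * (1.0 - sw3 + sw3 * ratio ** sw4)
```

With N = 1024 and 21 levels the ratio is 10 / 20 = 0.5, so SW3 = 1 and SW4 = 1 halve the improvement, while SW3 = 0 leaves it unchanged whatever the ratio.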
Note that all three penalties can be in effect, in which case they all serve to decrease the "freely computed" improvement, resulting in an "adjusted" improvement, which is what appears in the competitor table and is used to rank the competitors.
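To show how the penalties can reorder the competitor list, here is a hedged Python sketch. It assumes the three adjustments are applied multiplicatively, which the text implies but does not state outright. The predictor names, node size, improvements, and SW values are all made up for illustration and are not CART defaults:

```python
import math


def adjusted_improvement(improve, n, var_penalty=0.0, n_good=None,
                         n_levels=None, sw=(1.0, 0.0, 0.0, 0.0)):
    """Apply all three penalties to one competitor (illustrative only)."""
    sw1, sw2, sw3, sw4 = sw
    adj = improve
    # Variable-specific penalty, ignored when outside [0, 1].
    if 0.0 <= var_penalty <= 1.0:
        adj *= 1.0 - var_penalty
    # Missing-value penalty, only when some values are missing.
    if n_good is not None and n_good < n:
        adj *= sw1 * (n_good / n) ** sw2
    # High-level categorical penalty, only for categorical predictors.
    if n_levels is not None and n_levels > 1:
        ratio = math.log2(n) / (n_levels - 1)
        adj *= 1.0 - sw3 + sw3 * ratio ** sw4
    return adj


NODE_N = 1024               # hypothetical node size
SW = (1.0, 1.0, 1.0, 1.0)   # hypothetical SW1..SW4 settings

# Hypothetical competitors, listed by natural improvement.
competitors = [
    ("INCOME", dict(improve=0.30, var_penalty=0.6)),  # costly to collect
    ("CITY", dict(improve=0.25, n_levels=21)),        # many levels
    ("AGE", dict(improve=0.20, n_good=512)),          # half missing
    ("EDUC", dict(improve=0.14)),                     # unpenalized
]

ranked = sorted(
    ((name, adjusted_improvement(n=NODE_N, sw=SW, **kw))
     for name, kw in competitors),
    key=lambda item: item[1],
    reverse=True,
)

for name, adj in ranked:
    print(f"{name:8s}{adj:.3f}")
```

In this made-up node, INCOME has the largest natural improvement, but after adjustment the unpenalized EDUC ranks first, followed by CITY, INCOME, and AGE, so EDUC would become the primary splitter.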
These penalties are first used to adjust the improvements evaluated for the competitors in a node. When surrogates are generated, the penalties affect the improvements computed for the surrogates in the same way, unless PENALTY SURROGATES=NO is specified, in which case improvements are not adjusted for surrogates even if missing values or high-level categoricals are involved.
Note that the associations for surrogates are not penalized, so these penalties will not change the ordering of surrogates for a given primary splitter. They will only affect the improvement listed for a surrogate.

[J#373:1602]


