Advances in Gradient Boosting: The Power of Post-Processing


Learn how TreeNet stochastic gradient boosting can be improved by post-processing techniques such as Generalized Path Seeker (GPS), RuleLearner, and ISLE.

Course Outline:

I. Gradient Boosting and Post-Processing:

  • What is missing from Gradient Boosting?
  • Why are post-processing techniques used?

II. Applications Benefiting from Post-Processing: Examples from a variety of industries.

  • Financial Services
  • Biomedical
  • Environmental
  • Manufacturing
  • Ad Serving

III. Typical Post-Processing Steps


IV. Techniques

  • Generalized Path Seeker (GPS): Modern high-speed LASSO-style regularized regression
  • Importance Sampled Learning Ensembles (ISLE): identify and reweight the most influential trees
  • RuleLearner: ISLE on “steroids.” Identify the most influential nodes and rules
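
The reweighting idea behind ISLE can be sketched in a few lines. This is a minimal illustration, not Salford's implementation and not the GPS algorithm: a plain coordinate-descent LASSO stands in for the regularized regression, and `tree_preds`, `lasso_reweight`, and the toy numbers are all hypothetical. Each column of `tree_preds` is assumed to hold the predictions of one already-trained tree; trees whose weight shrinks to zero can be dropped from the ensemble.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_reweight(tree_preds, y, lam, n_iter=200):
    """Fit one weight per tree by coordinate-descent LASSO.

    tree_preds[i][j] is the prediction of tree j on row i (hypothetical input).
    Minimizes (1/2n) * ||y - X w||^2 + lam * ||w||_1.
    """
    n = len(y)
    n_trees = len(tree_preds[0])
    w = [0.0] * n_trees
    resid = list(y)  # residual y - X w, with w starting at zero
    for _ in range(n_iter):
        for j in range(n_trees):
            col = [row[j] for row in tree_preds]
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue  # a tree that always predicts 0 carries no signal
            # correlation of tree j with the residual that excludes tree j
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam) / norm
            delta = new_w - w[j]
            if delta != 0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new_w
    return w


# Toy data (illustrative only): tree 0 tracks the target, tree 1 is noise,
# tree 2 is constant zero.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [
    [1.0, 1.0, 0.0],
    [2.0, -1.0, 0.0],
    [3.0, 1.0, 0.0],
    [4.0, -1.0, 0.0],
]
weights = lasso_reweight(tree_preds, y, lam=0.6)
kept = [j for j, wj in enumerate(weights) if wj != 0.0]
```

In this toy case only the first tree keeps a nonzero weight, which is the compression effect the outline refers to: a smaller ensemble with reweighted members.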

V. Case Study Example

  • Output/Results without Post-Processing
  • Output/Results with Post-Processing
  • Demo


Components and Features


SPM Components and Features | What's New
CART (Classification and Regression Trees) | User-defined linear combination lists for splitting; Constraints on trees; Automatic addition of missing value indicators; Enhanced GUI reporting; User-controlled cross-validation; Out-of-bag performance stats and predictions; Profiling terminal nodes based on user-supplied variables; Comparison of train vs. test consistency across nodes; RandomForests-style variable importance
MARS (Automated Nonlinear Regression) | Updated GUI; Model performance based on independent test sample or cross-validation; Support for time series models
TreeNet (Gradient Boosting, Boosted Trees) | One-Tree TreeNet (CART alternative); RandomForests via TreeNet (RandomForests regression alternative); Interaction Control Language (ICL); Interaction strength reporting; Enhanced partial dependency plots; RandomForests-style randomized splits
RandomForests (Bagging Trees) | RandomForests regression; Saving out-of-bag scores; Speed enhancements
High-Dimensional Multivariate Pattern Discovery | Battery Target to identify mutual dependencies in the data
Unsupervised Learning (Breiman's Column Scrambler) | New
Text Mining | New
Model Compression and Rule Extraction | New: ISLE; RuleLearner; Hybrid Compression
Automation | 56 pre-packaged scenarios based on years of high-end consulting
Parallel Processing | New: Automatic support of multiple cores via multithreading
Interaction Detection |
Hotspot Detection | Segment Extraction (Battery Priors)
Missing Value Handling and Imputation |
Outlier Detection | New: GUI reports, tables, and graphs
Linear Methods for Regression, Recent Advances and Discoveries | New: OLS Regression; Regularized Regression including LAR/LASSO Regression, Ridge Regression, and Elastic Net Regression/Generalized Path Seeker
Linear Methods for Classification, Recent Advances and Discoveries | New: LOGIT; LAR/LASSO; Ridge; Elastic Net/Generalized Path Seeker
Model Assessment and Selection | Unified reporting of various performance measures across different models
Ensemble Learning | New: Battery Bootstrap; Battery Model
Time Series Modeling | New
Model Simplification Methods |
Data Preparation | New: Battery Bin for automatic binning of a user-selected set of variables, with a large number of options
Large Data Handling | 64-bit support; Large memory capacity limited only by your hardware
Model Translation (SAS, C, Java, PMML, Classic) | Java
Data Access (all popular statistical formats supported) | Updated Stat Transfer drivers, including R workspaces
Model Scoring | Score Ensemble (combines multiple models into a powerful predictive machine)


AutoDiscovery of Predictors in SPM

Autodiscovery leverages the stability advantages of multiple trees to rank variables by importance and thus select a subset of predictors for modeling. In SPM® v8.2 and earlier, Autodiscovery runs a very simple TreeNet model of 200 trees on the training data only. The variable importance ranking from this background model is then used to reduce the list of all available predictors to its top performers. Autodiscovery is fast and easy, as there are no control parameters to set, but it is only a mechanism for quickly testing whether a substantial reduction in the number of predictors can improve model performance.
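
The selection step can be pictured as a simple rank-and-truncate operation. The sketch below is only an illustration of that idea, not SPM's code: the function name, the `keep` cutoff, and the importance scores are hypothetical, and in practice the scores would come from the background TreeNet model.

```python
def select_top_predictors(importance, keep):
    """Return the `keep` predictor names with the highest importance scores.

    importance: dict mapping predictor name -> importance score (hypothetical).
    keep: number of predictors to retain (a hypothetical cutoff; the source
          does not state how many predictors Autodiscovery keeps).
    """
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


# Illustrative scores only, not output from a real model.
scores = {"age": 100.0, "income": 62.5, "region": 4.1, "tenure": 38.0}
selected = select_top_predictors(scores, keep=2)
```

A follow-up model would then be built on `selected` alone and compared against the model that uses every predictor.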

How to access data in relational databases via ODBC

SPM 6.6 (TreeNet TN 6.4) and later support data access to Microsoft SQL Server, Oracle, MySQL, and other RDBMS via the ODBC interface.

Because SQL queries cannot be entered in the standard Windows ODBC data source selection dialog, you must use the command line to open data directly from SQL Server.


How To Unlock The 30-Day Free Evaluation of Salford Predictive Modeler

The SPM® software suite must be downloaded with Administrator rights, and read/write and modify permissions MUST be applied to the /bin directory PRIOR to proceeding. If you need help with SPM installation (Administrator rights and ensuring proper permissions), please contact us or email Support (at) salford-systems (dot) com.
Once the above instructions have been completed, you can request your Unlock Key.
To unlock the SPM software for your 30-day free evaluation, please fill out the request form or e-mail the following information to Unlock (at) salford-systems (dot) com

MARS - Multivariate Adaptive Regression Splines®


Automatic Non-Linear Regression
MARS software is ideal for users who prefer results in a form similar to traditional regression while capturing essential nonlinearities and interactions. The MARS approach to regression modeling effectively uncovers important data patterns and relationships that are difficult, if not impossible, for other regression methods to reveal. MARS builds its model by piecing together a series of straight lines with each allowed its own slope. This permits MARS to trace out any pattern detected in the data.
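The "series of straight lines, each with its own slope" can be written with hinge functions of the form max(0, x - knot) and max(0, knot - x). The following is a minimal sketch of evaluating such a model for a single predictor; the function names, knot, and coefficients are hypothetical and are not taken from any fitted MARS model.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) for +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a one-variable, MARS-style piecewise-linear model.

    terms is a list of (coefficient, knot, direction) tuples (hypothetical).
    """
    return intercept + sum(
        coef * hinge(x, knot, direction) for coef, knot, direction in terms
    )


# Illustrative model: slope 0.5 to the left of the knot at 3, slope 2 to the right.
terms = [(2.0, 3.0, +1), (-0.5, 3.0, -1)]
left = mars_predict(1.0, intercept=1.0, terms=terms)
at_knot = mars_predict(3.0, intercept=1.0, terms=terms)
right = mars_predict(5.0, intercept=1.0, terms=terms)
```

Each hinge contributes only on one side of its knot, so the slope can change at the knot while the fitted curve stays continuous.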
High-Quality Regression and Classification
The MARS model is designed to predict numeric outcomes such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend during a website visit. MARS is also capable of producing high-quality classification models for a yes/no outcome. MARS performs variable selection, variable transformation, interaction detection, and self-testing, all automatically and at high speed.
High-Performance Results
Areas where MARS has exhibited very high-performance results include forecasting electricity demand for power generating companies, relating customer satisfaction scores to the engineering specifications of products, and presence/absence modeling in geographical information systems (GIS).


Product Versions

SPM® 8.2 Product Versions

The best of the best. For the modeler who must have access to the most advanced technology available and the fastest run times, including major advances in ensemble modeling, interaction detection, and automation. ULTRA also provides advance access to new features as they become available in frequent upgrades.
For the modeler who needs cutting-edge data mining technology, including extensive automation of workflows typical for experienced data analysts and dozens of extensions to the Salford data mining engines.
A true predictive modeling workbench designed for the professional data miner. Includes a variety of supporting conventional statistical modeling tools, a programming language, reporting services, and a modest selection of workflow automation options.
Literally the basics. Salford Systems' award-winning data mining engines without extensions, automation, surrounding statistical services, programming language, or sophisticated reporting. Designed for small budgets while still delivering our world-famous engines.



Memory Requirements for the Salford Predictive Modeler software suite

A user's license sets a limit on the amount of learn sample data that can be analyzed. The learn sample is the data used to build the model, and its size is measured as rows by columns: the observations and variables actually used to build the model. Variables not used in the model do not count, and neither do observations reserved for testing or excluded for other reasons. There is no limit on the number of test sample data points that may be analyzed.

Reading MySQL tables with SPM

SPM for Windows has long had the ability to read tables in relational databases through the ODBC interface. This capability was also recently added to the command line version on Windows and is planned for UNIX platforms (including MacOS X). This article describes how to access MySQL databases specifically, but the same principles apply to data stored in other relational database systems. Most likely, the only thing that will differ is the driver used.
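
The point that only the driver changes between database systems can be seen in the shape of a generic ODBC connection string. The sketch below is not SPM syntax; it only assembles the standard `KEY=value;` pairs that ODBC drivers accept. The helper name, server, database, and credentials are hypothetical, and the driver names are examples that depend on what is installed on the machine.

```python
def odbc_connection_string(driver, server, database, user, password):
    """Build a DSN-less ODBC connection string (generic ODBC, not SPM syntax)."""
    parts = {
        "DRIVER": "{" + driver + "}",  # driver name as registered with ODBC
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical connection details; only the driver differs between the two.
mysql_conn = odbc_connection_string(
    "MySQL ODBC 8.0 Unicode Driver", "localhost", "sales", "analyst", "secret"
)
mssql_conn = odbc_connection_string(
    "ODBC Driver 17 for SQL Server", "localhost", "sales", "analyst", "secret"
)
```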

Get In Touch With Us


9685 Via Excelencia, Suite 208, San Diego, CA 92126
Ph: 619-543-8880
Fax: 619-543-8888
info (at) salford-systems (dot) com