6 Tips To Optimize TreeNet Models
While TreeNet (Stochastic Gradient Boosting) can work phenomenally well out of the box, it almost always pays to tune your control parameters. Devoting time to optimizing a TreeNet model can noticeably improve its out-of-sample performance. Here are several things we recommend to all TreeNet users.
1) Make sure you grow enough trees.
TreeNet grows 200 trees by default, although you can change the default. In real-world modeling we often find that 1,000 or more trees perform better. In one case 30,000 trees gave us the best results, although such a situation is expected to be rare.
If the optimal number of trees is more than 70 percent of the total allowed, you should increase the number of trees and rebuild the model. Thus, if you grow 200 trees and the optimal number comes back as 198, rebuild with, say, 400 or more. If the optimal number continues to be close to the maximum allowed, continue to increase the number of trees.
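The rebuild rule above can be sketched as a small loop. This is only an illustration in Python, not TreeNet code: `fit` is a hypothetical callable standing in for a model build that reports the optimal tree count, and doubling the budget is an assumed policy (the text only says to increase it).

```python
def grow_until_safe(fit, start_trees=200, threshold=0.70, max_rounds=10):
    """Raise the tree budget until the optimum sits well inside it.

    `fit` is a hypothetical callable: fit(n_trees) -> optimal number of trees
    (for example, the tree count with the lowest test-sample error).
    """
    n_trees = start_trees
    optimal = fit(n_trees)
    rounds = 1
    # Rebuild while the optimum is above 70 percent of the trees allowed.
    while optimal > threshold * n_trees and rounds < max_rounds:
        n_trees *= 2  # assumed policy: double the budget on each rebuild
        optimal = fit(n_trees)
        rounds += 1
    return n_trees, optimal


def toy_fit(n_trees):
    """Toy stand-in for a model build: the true optimum is 950 trees, but a
    run can never report more trees than it was allowed to grow."""
    return min(n_trees - 2, 950)


budget, optimal = grow_until_safe(toy_fit)
print(budget, optimal)
```

With the toy function, the budgets of 200, 400 and 800 trees all report an optimum near the cap, so the loop keeps rebuilding until the optimum falls under 70 percent of the budget.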
2) Try a slower learning rate.
This one goes hand in hand with growing enough trees, because the slower your learn rate is, the more trees you will need. There is nothing wrong with using a learn rate of 0.001 if you are willing to let your machine run through all the trees you will need.
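To see why a slower learn rate calls for more trees, here is a minimal sketch of gradient boosting with 2-node trees on a toy one-variable regression, written in plain Python. It is not TreeNet's implementation: the data, the squared-error loss, the target error and the function names are all assumptions made for illustration, and it counts trees needed to reach a given training error rather than the optimal test-sample count.

```python
import math


def fit_stump(xs, residuals):
    """Best single-split (2-node) regression tree on one predictor."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n = len(xs)
    total = sum(residuals)
    best = None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between identical values
        right_sum = total - left_sum
        # Maximizing this quantity minimizes the squared error of the split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or gain > best[0]:
            cut = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (gain, cut, left_sum / k, right_sum / (n - k))
    return best[1:]  # (cut point, left-node mean, right-node mean)


def trees_needed(xs, ys, learn_rate, target_mse, max_trees=20000):
    """Count the trees grown before training MSE falls to target_mse."""
    pred = [sum(ys) / len(ys)] * len(ys)
    for n_trees in range(1, max_trees + 1):
        residuals = [y - p for y, p in zip(ys, pred)]
        cut, left, right = fit_stump(xs, residuals)
        # Each tree's contribution is shrunk by the learn rate.
        pred = [p + learn_rate * (left if x < cut else right)
                for x, p in zip(xs, pred)]
        mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
        if mse <= target_mse:
            return n_trees
    return None  # target not reached within max_trees


xs = [i / 50 for i in range(50)]
ys = [math.sin(2 * math.pi * x) for x in xs]
fast = trees_needed(xs, ys, learn_rate=0.1, target_mse=0.01)
slow = trees_needed(xs, ys, learn_rate=0.01, target_mse=0.01)
print("learn rate 0.1:", fast, "trees; learn rate 0.01:", slow, "trees")
```

Because every tree's contribution is multiplied by the learn rate, the run with the smaller rate takes many more trees to cover the same ground.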
3) For the binary logistic (0/1) model, INFLUENCE TRIMMING (the logistic residual trim fraction) can make a big difference.
This control determines whether TreeNet will ignore certain records because they are "far from the decision boundary." Allowing influence trimming is especially important if you think some of your data suffers from mislabeling (the target variable value is actually wrong). The default value of 0.10 means that 10% of the data could be ignored in each training cycle.
You ought to experiment with a value of 0.0 to see if it helps or hurts. You can also try values such as 0.02, 0.05 etc.
Note: If the data are very clean 0.0 should work best.
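As a rough picture of what the trim fraction controls, the sketch below ignores the records whose current log-odds score is farthest from the decision boundary (score 0). This is a simplification written for illustration only: the text does not spell out TreeNet's exact trimming rule, so the ranking by absolute score and the function name `trim_mask` are assumptions.

```python
def trim_mask(scores, trim_fraction):
    """Return True for records kept and False for records ignored this cycle.

    `scores` are current log-odds predictions; a large absolute value means
    the record lies far from the decision boundary at 0. Ranking by absolute
    score is an assumed stand-in for TreeNet's actual rule.
    """
    n_drop = int(trim_fraction * len(scores))
    by_distance = sorted(range(len(scores)),
                         key=lambda i: abs(scores[i]), reverse=True)
    dropped = set(by_distance[:n_drop])
    return [i not in dropped for i in range(len(scores))]


scores = [0.1, -3.2, 0.4, 5.0, -0.2, 1.1, -0.9, 0.3, 2.4, -0.05]
for fraction in (0.0, 0.10, 0.20):
    keep = trim_mask(scores, fraction)
    print(fraction, [s for s, k in zip(scores, keep) if not k])
```

A fraction of 0.0 keeps every record, which matches the advice above for very clean data.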
4) Experiment with the size of your trees.
If there are truly no interactions in the process that generates your data, then growing a model that does not allow for interactions should give better results. This means 2-node trees are a plausible tree size. Just make sure to allow for extra trees, as a 2-node tree learns much more slowly than a 6-node tree. If 500 trees are needed when you generate 6-node trees, you might need 1,500 or more when generating just 2-node trees.
Sometimes moderately large trees work best: 12-node, 15-node, even 25-node trees could do the trick. Since large trees learn more than smaller trees, you might also need to dial down the learn rate to prevent over-fitting.
5) If you have a PRO EX version of TreeNet then you have access to advanced batteries. Try battery shaving to see if you can trim down your list of predictors.
Also try battery LOVO (leave one variable out) as this might allow you to remove a variable from the middle of the pack in terms of importance; battery SHAVING usually is used to remove the least important variables (shaving from the bottom of the list).
BATTERY SHAVING TOP tests the viability of dropping the "best" variables. Sometimes this leads to a better model.
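The logic behind shaving from the bottom and LOVO can be sketched in a few lines of Python. This is not the PRO EX batteries themselves: `score` and `importance` are hypothetical callables standing in for a model build (returning a test-sample score, higher is better) and its variable importance ranking, and the toy versions below are made up for the example.

```python
def lovo(variables, score):
    """Leave one variable out: score each model that omits exactly one."""
    return {v: score([u for u in variables if u != v]) for v in variables}


def shave_bottom(variables, importance, score):
    """Repeatedly drop the least important variable, keeping the best subset."""
    current = list(variables)
    best_vars, best_score = list(current), score(current)
    while len(current) > 1:
        ranks = importance(current)  # re-rank after every removal
        current.remove(min(current, key=lambda v: ranks[v]))
        s = score(current)
        if s >= best_score:  # on ties, prefer the shorter predictor list
            best_vars, best_score = list(current), s
    return best_vars, best_score


# Toy stand-ins (assumptions): two real predictors and two noise variables.
SIGNAL = {"age", "income"}


def toy_score(variables):
    hits = sum(1 for v in variables if v in SIGNAL)
    noise = sum(1 for v in variables if v not in SIGNAL)
    return hits * 1.0 - noise * 0.1


def toy_importance(variables):
    return {v: (1.0 if v in SIGNAL else 0.0) for v in variables}


predictors = ["age", "noise1", "income", "noise2"]
kept, kept_score = shave_bottom(predictors, toy_importance, toy_score)
lovo_scores = lovo(predictors, toy_score)
print(kept, kept_score)
print(lovo_scores)
```

LOVO scores every variable regardless of its importance rank, which is why it can flag a removable variable from the middle of the list.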
6) If you have access to the ICL (Interaction Control Language) of TreeNet PRO EX then you can try an interaction refinement strategy.
First, run some completely additive models. Two-node trees can still allow interactions because of the manner in which TreeNet handles missing values. With the ICL ADDITIVE command, by contrast, you guarantee no possible interactions of any kind, including interactions between missing value indicators created by TreeNet and other variables.
Then, in the PRO EX version, you can run the BATTERY ADDITIVE procedure which will start with a fully flexible model and search for the one variable which can most readily be made additive (interact with nothing). Then it searches for a second variable to be made additive, and so on, going step by step until all variables are additive. Reviewing the performance curve of this procedure allows the discovery of the optimal balance between full free interactivity and limited interactivity. If a variable or variables really do not interact with any others then preventing chance interactions from creeping into the model will improve the model on future unseen data.
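The step-by-step search described above amounts to a greedy path, sketched here in Python. It is an illustration of the idea, not the BATTERY ADDITIVE procedure itself: `score` is a hypothetical callable that builds a model in which the listed variables may not interact and returns a test-sample score (higher is better), and the toy score is invented for the example.

```python
def additive_path(variables, score):
    """Greedy path from a fully flexible model to a fully additive one."""
    additive = []
    remaining = list(variables)
    path = [(tuple(additive), score(additive))]  # fully flexible start
    while remaining:
        # Pick the variable that can most readily be made additive.
        best = max(remaining, key=lambda v: score(additive + [v]))
        additive.append(best)
        remaining.remove(best)
        path.append((tuple(additive), score(additive)))
    return path


# Toy stand-in (assumption): x1 and x2 truly interact, x3 and x4 do not.
def toy_score(additive_vars):
    chosen = set(additive_vars)
    helped = len(chosen & {"x3", "x4"})  # chance interactions removed
    harmed = len(chosen & {"x1", "x2"})  # real interactions blocked
    return 1.0 + 0.05 * helped - 0.3 * harmed


path = additive_path(["x1", "x2", "x3", "x4"], toy_score)
best_vars, best_score = max(path, key=lambda step: step[1])
for step in path:
    print(step)
```

The printed path plays the role of the performance curve: the score rises while harmless variables are made additive and falls once truly interacting variables are constrained.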