Gradient Boosting with Honest Leaf Estimation
A minimal example showing how separating structure and leaf estimation makes training error less biased.
4/6/2026 · 1 min read


In the previous post (https://randomstate21.com/the-myth-that-a-large-train-test-gap-means-overfitting), we saw that part of the train-test gap in gradient boosting comes from how trees are trained: the same data is used both to choose splits and to estimate leaf values.
Here, we implement a simple variant where these steps are separated. Trees are grown on one subset of the data, while leaf values are estimated on another.
Below is a minimal implementation of gradient boosting with honest leaf estimation, using the California Housing dataset as a simple regression example. Model parameters are documented directly in the class docstring.
The code runs the same model in two modes, honest and non-honest, where the only difference is how leaf values are estimated. In both cases, we use a leaf fraction of 0.2, meaning that 20% of the data is reserved for leaf estimation and the remaining 80% is used to learn the tree structure. In the honest variant, leaf values are computed on this held-out 20%, while in the non-honest variant they are effectively estimated from the same data used to grow the tree.
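To make the difference between the two modes concrete, here is a rough sketch of a single boosting round. It is a simplified illustration, not the implementation discussed in this post. The function names (`fit_one_round`, `predict_one_round`), the use of scikit-learn's `DecisionTreeRegressor`, the fresh random split in every round, and the fallback for leaves that receive no held-out rows are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_one_round(X, residuals, leaf_fraction=0.2, honest=True,
                  max_depth=3, rng=None):
    """Fit one tree on the current residuals (hypothetical helper).

    Returns the fitted tree and a dict mapping leaf id -> leaf value.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    perm = rng.permutation(n)
    n_leaf = int(round(leaf_fraction * n))
    leaf_idx, struct_idx = perm[:n_leaf], perm[n_leaf:]

    # Structure step: splits are chosen using the structure subset only.
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X[struct_idx], residuals[struct_idx])

    # Non-honest leaf values: the mean residual of the structure rows in
    # each leaf, which is what the fitted tree already stores per node.
    node_values = tree.tree_.value[:, 0, 0]
    leaf_ids = np.unique(tree.apply(X[struct_idx]))
    leaf_values = {leaf: node_values[leaf] for leaf in leaf_ids}

    if honest:
        # Honest leaf values: route the held-out rows through the fixed
        # structure and average their residuals per leaf.
        held_out_leaves = tree.apply(X[leaf_idx])
        held_out_residuals = residuals[leaf_idx]
        for leaf in leaf_ids:
            mask = held_out_leaves == leaf
            if mask.any():
                leaf_values[leaf] = held_out_residuals[mask].mean()
            # Assumption: a leaf with no held-out rows keeps its
            # structure-based value.
    return tree, leaf_values


def predict_one_round(tree, leaf_values, X):
    """Look up the (possibly re-estimated) value of the leaf each row lands in."""
    return np.array([leaf_values[leaf] for leaf in tree.apply(X)])


# Small synthetic demo (not the California Housing data).
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(500, 4))
y_demo = X_demo[:, 0] ** 2 + demo_rng.normal(scale=0.5, size=500)

# Same seed, so both modes grow the same tree and differ only in leaf values.
tree_plain, values_plain = fit_one_round(X_demo, y_demo, honest=False, rng=0)
tree_honest, values_honest = fit_one_round(X_demo, y_demo, honest=True, rng=0)
```

In a full boosting loop, the residuals would be recomputed after every round and each round's prediction scaled by the learning rate. The tree structure is identical in both modes; only the numbers stored in the leaves change.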
The implementation below includes a few additional features and safeguards, but these are not essential to the example. The key idea is the separation between structure and leaf estimation:
The resulting train and test RMSE curves are shown below:


As expected, the gap between training and test RMSE is smaller in the honest variant than in the non-honest one.
The point here is not which model performs better, but to illustrate how the training procedure itself affects the train-test gap. The honest variant produces a less optimistic training error, and therefore a more meaningful comparison to test performance.
We then track how train and test RMSE evolve as trees are added:
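One way to compute such a curve is sketched below. It assumes squared-error boosting with a constant initial prediction and a fixed learning rate, and that each tree's raw predictions have been collected into an array. The names `rmse_curve` and `tree_contributions` are hypothetical and chosen only for this example.

```python
import numpy as np


def rmse_curve(y, init, tree_contributions, learning_rate=0.1):
    """RMSE after 1, 2, ..., T trees (hypothetical helper).

    tree_contributions has shape (T, n): row t holds the raw prediction
    of tree t for each of the n rows being evaluated.
    """
    y = np.asarray(y, dtype=float)
    contributions = np.asarray(tree_contributions, dtype=float)
    # The prediction after t trees is the initial value plus the scaled
    # running sum of the first t tree outputs.
    staged = init + learning_rate * np.cumsum(contributions, axis=0)
    return np.sqrt(np.mean((staged - y) ** 2, axis=1))
```

Calling it once with the training rows and once with the test rows gives the two curves; their difference at each step is the train-test gap.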