Statistical Inference for Gradient Boosting Regression
Abstract
Gradient boosting is widely popular due to its flexibility and predictive accuracy. However, statistical inference and uncertainty quantification for gradient boosting remain challenging and under-explored. We propose a unified framework for statistical inference in gradient boosting regression. Our framework integrates dropout or parallel training with a recently proposed regularization procedure called Boulevard that allows for a central limit theorem (CLT) for boosting. With these enhancements, we surprisingly find that \textit{increasing} the dropout rate and the number of trees grown in parallel at each iteration substantially enhances signal recovery and overall performance. Our resulting algorithms enjoy similar CLTs, which we use to construct built-in confidence intervals, prediction intervals, and rigorous hypothesis tests for assessing variable importance in only $O(nd^2)$ time with the Nyström method. Numerical experiments verify the asymptotic normality, confirm the validity of the built-in statistical inference procedures, and demonstrate that our algorithms perform well, do not require early stopping, and interpolate between regularized boosting and random forests.
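As a rough illustration of the kind of procedure the abstract describes, the sketch below implements a hypothetical dropout-style averaged boosting loop and a naive normal-approximation confidence interval in Python. The dropout scheme, the shrinkage constant, the function names (fit_dropout_boost, predict_with_interval), and the plug-in standard error are illustrative assumptions only; this is not the paper's Boulevard-based algorithm, its parallel-training variant, or its CLT-derived variance estimator.

# Hypothetical sketch: dropout-style averaged boosting for regression with a
# naive normal-approximation interval. The dropout rule, shrinkage, and the
# plug-in standard error below are illustrative assumptions, not the paper's
# actual Boulevard algorithm or its CLT-based variance.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)


def fit_dropout_boost(X, y, n_trees=300, dropout=0.5, shrinkage=0.8, max_depth=3):
    """Each round fits a tree to residuals against a randomly thinned
    ("dropped out") shrunken average of the trees grown so far."""
    trees = []
    for _ in range(n_trees):
        if trees:
            keep = [t for t in trees if rng.random() > dropout] or trees
            current = shrinkage * np.mean([t.predict(X) for t in keep], axis=0)
        else:
            current = np.zeros_like(y, dtype=float)
        resid = y - current
        tree = DecisionTreeRegressor(max_depth=max_depth,
                                     random_state=int(rng.integers(1 << 31)))
        tree.fit(X, resid)
        trees.append(tree)
    return trees


def predict_with_interval(trees, X, shrinkage=0.8):
    """Point prediction is the shrunken average over all trees; the 95%
    interval uses a crude plug-in normal approximation as a stand-in for
    the CLT-based intervals derived in the paper."""
    per_tree = np.stack([t.predict(X) for t in trees])  # shape (B, n)
    mean = shrinkage * per_tree.mean(axis=0)
    se = shrinkage * per_tree.std(axis=0, ddof=1) / np.sqrt(len(trees))
    z = 1.96  # approximate 97.5% standard normal quantile
    return mean, mean - z * se, mean + z * se


if __name__ == "__main__":
    X = rng.uniform(-3, 3, size=(500, 2))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
    trees = fit_dropout_boost(X, y)
    mean, lower, upper = predict_with_interval(trees, X[:5])
    print(np.round(np.c_[mean, lower, upper], 3))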
Cite
Text
Fang et al. "Statistical Inference for Gradient Boosting Regression." Advances in Neural Information Processing Systems, 2025.
Markdown
[Fang et al. "Statistical Inference for Gradient Boosting Regression." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/fang2025neurips-statistical/)
BibTeX
@inproceedings{fang2025neurips-statistical,
  title     = {{Statistical Inference for Gradient Boosting Regression}},
  author    = {Fang, Haimo and Tan, Kevin and Hooker, Giles},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/fang2025neurips-statistical/}
}