Optimal Weighted Random Forests

Abstract

The random forest (RF) algorithm has become a very popular prediction method owing to its great flexibility and promising accuracy. In RF, it is conventional to assign equal weights to all base learners (trees) when aggregating their predictions. However, the predictive performance of individual trees within the forest can vary substantially because of the randomization introduced by the embedded bootstrap sampling and feature selection. In this paper, we focus on RF for regression and propose two optimal weighting algorithms, the one-step optimal weighted RF (1step-WRF$_\mathrm{opt}$) and the two-step optimal weighted RF (2steps-WRF$_\mathrm{opt}$), which combine the base learners through weights determined by weight choice criteria. Under some regularity conditions, we show that these algorithms are asymptotically optimal in the sense that the resulting squared loss and risk are asymptotically identical to those of the infeasible but best possible weighted RF. Numerical studies on real-world and semi-synthetic data sets indicate that, in most cases, these algorithms outperform the equal-weight forest and two other weighted RFs proposed in the existing literature.
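
As a rough illustration of the general idea (not the authors' exact method), the Python sketch below fits a standard forest and then chooses tree weights on the probability simplex by minimizing squared error on a held-out validation set. The validation-based objective and all names here are illustrative assumptions; the paper's 1step-WRF$_\mathrm{opt}$ and 2steps-WRF$_\mathrm{opt}$ determine weights through dedicated weight choice criteria rather than a plain validation split.

import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy regression data; the paper evaluates on real-world and semi-synthetic data sets.
X, y = make_regression(n_samples=600, n_features=10, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# P[i, t] = prediction of tree t on validation point i.
P = np.column_stack([tree.predict(X_val) for tree in rf.estimators_])
T = P.shape[1]

# Solve min_w ||P w - y||^2 subject to w >= 0 and sum(w) = 1.
# NOTE: this squared validation loss is a stand-in for the paper's
# weight choice criteria, used here only to make the weighting concrete.
def loss(w):
    r = P @ w - y_val
    return r @ r

res = minimize(
    loss,
    x0=np.full(T, 1.0 / T),                     # start from equal weights
    jac=lambda w: 2.0 * P.T @ (P @ w - y_val),  # analytic gradient
    bounds=[(0.0, 1.0)] * T,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    method="SLSQP",
)
w_opt = res.x

# Weighted prediction vs. the conventional equal-weight average.
def predict_weighted(X_new, w):
    return np.column_stack([t.predict(X_new) for t in rf.estimators_]) @ w

mse_eq = np.mean((P.mean(axis=1) - y_val) ** 2)
mse_w = np.mean((P @ w_opt - y_val) ** 2)
print(f"equal-weight MSE: {mse_eq:.3f}  weighted MSE: {mse_w:.3f}")

Because the weights above are tuned on the same validation set used for the comparison, the weighted MSE is optimistic by construction; an honest evaluation would use a fresh test set, which is one reason the paper works with weight choice criteria with asymptotic optimality guarantees instead.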

Cite

Text

Chen et al. "Optimal Weighted Random Forests." Journal of Machine Learning Research, 2024.

Markdown

[Chen et al. "Optimal Weighted Random Forests." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/chen2024jmlr-optimal-a/)

BibTeX

@article{chen2024jmlr-optimal-a,
  title     = {{Optimal Weighted Random Forests}},
  author    = {Chen, Xinyu and Yu, Dalei and Zhang, Xinyu},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--81},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/chen2024jmlr-optimal-a/}
}