Scaling Laws of Distributed Random Forests
Abstract
Random forests are a widely used machine learning technique valued for their robust predictive performance and conceptual simplicity. They are applied in many critical applications and often combined with federated learning to collaboratively build machine learning models across multiple distributed sites. The independent decision trees make random forests inherently parallelizable and well-suited for distributed and federated settings. Despite this perfect fit, there is a lack of comprehensive scalability studies, and many existing methods show limited parallel efficiency or are tested only at smaller scales. To address this gap, we present a comprehensive analysis of the scaling capabilities of distributed random forests on up to 64 compute nodes. Using a tree-parallel approach, we demonstrate a strong scaling speedup of up to 31.98 and a weak scaling efficiency of over 0.96 without affecting predictive performance of the global model. Comparing the performance trade-offs of distributed and local inference strategies enables us to simulate various real-life scenarios in terms of distributed computing resources, data availability, and privacy considerations. We further explore how increasing model and data size improves prediction accuracy, scaling up to 51 200 trees and 7.5 million training samples. We find that while distributing the data across nodes leads to super-scalar speedup, it negates the predictive benefit of increased data. Finally, we study the impact of distributed and non-IID data and find that while global imbalance reduces performance, local distribution differences can help mitigate this effect.
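The tree-parallel approach described in the abstract can be illustrated with a small sketch: each compute node trains its own forest on a local data shard, and the global model is formed by pooling all trees into a single ensemble. This is a hypothetical simulation on one machine using scikit-learn, not the paper's actual implementation; the shard count, tree counts, and merging via the `estimators_` list are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a distributed dataset.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simulate 4 compute nodes, each holding an IID shard of the training data.
n_nodes = 4
shards = np.array_split(np.arange(len(X_train)), n_nodes)

# Tree-parallel step: every "node" fits its own local forest independently.
local_forests = []
for rank, idx in enumerate(shards):
    rf = RandomForestClassifier(n_estimators=25, random_state=rank)
    rf.fit(X_train[idx], y_train[idx])
    local_forests.append(rf)

# Global model: pool all local trees into one ensemble. Since the trees are
# independent, no retraining is needed; predictions average over all trees.
global_rf = local_forests[0]
for rf in local_forests[1:]:
    global_rf.estimators_ += rf.estimators_
global_rf.n_estimators = len(global_rf.estimators_)

acc = global_rf.score(X_test, y_test)
print(f"global forest: {global_rf.n_estimators} trees, accuracy {acc:.2f}")
```

Because each tree is built independently, this merge is embarrassingly parallel, which is the property that makes random forests well-suited for the distributed and federated settings studied in the paper.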
Cite
Text
Flügel et al. "Scaling Laws of Distributed Random Forests." Transactions on Machine Learning Research, 2025.

Markdown

[Flügel et al. "Scaling Laws of Distributed Random Forests." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/flugel2025tmlr-scaling/)

BibTeX
@article{flugel2025tmlr-scaling,
title = {{Scaling Laws of Distributed Random Forests}},
author = {Flügel, Katharina and Debus, Charlotte and Götz, Markus and Streit, Achim and Weiel, Marie},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/flugel2025tmlr-scaling/}
}