Fast Gradient Boosting Decision Trees with Bit-Level Data Structures
Abstract
A gradient boosting decision tree model is a powerful machine learning method that iteratively constructs decision trees to form an additive ensemble model. The method uses the gradient of the loss function to improve the model at each iteration step. Inspired by the database literature, we exploit bitset and bitslice data structures in order to improve the run time efficiency of learning the trees. We can use these structures in two ways. First, they can represent the input data itself. Second, they can store the discretized gradient values used by the learning algorithm to construct the trees in the boosting model. Using these bit-level data structures reduces the problem of finding the best split, which involves counting of instances and summing gradient values, to counting one-bits in bit strings. Modern CPUs can efficiently count one-bits using AVX2 SIMD instructions. Empirically, our proposed improvements can result in speed-ups of 2 to up to 10 times on datasets with a large number of categorical features without sacrificing predictive performance.
Cite
Text
Devos et al. "Fast Gradient Boosting Decision Trees with Bit-Level Data Structures." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019. doi:10.1007/978-3-030-46150-8_35Markdown
[Devos et al. "Fast Gradient Boosting Decision Trees with Bit-Level Data Structures." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019.](https://mlanthology.org/ecmlpkdd/2019/devos2019ecmlpkdd-fast/) doi:10.1007/978-3-030-46150-8_35BibTeX
@inproceedings{devos2019ecmlpkdd-fast,
title = {{Fast Gradient Boosting Decision Trees with Bit-Level Data Structures}},
author = {Devos, Laurens and Meert, Wannes and Davis, Jesse},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2019},
pages = {590-606},
doi = {10.1007/978-3-030-46150-8_35},
url = {https://mlanthology.org/ecmlpkdd/2019/devos2019ecmlpkdd-fast/}
}