Immediate Split Trees: Immediate Encoding of Floating Point Split Values in Random Forests

Abstract

Random forests and decision trees are increasingly interesting candidates for resource-constrained machine learning models. To make the execution of these models efficient under resource limitations, various optimized implementations have been proposed in the literature, usually implementing either native trees or if-else trees. While one motivation for optimizing if-else trees is to improve the behavior of dedicated instruction caches, in this work we highlight that if-else trees may also depend strongly on data caches. We identify one crucial issue of if-else tree implementations and propose an optimized implementation that keeps the logical tree structure untouched, and thus does not influence the accuracy, but eliminates the need to load comparison values from the data caches. Experimental evaluation of this implementation shows that we can reduce the number of data cache misses by up to $\approx 99\%$, while not increasing the number of instruction cache misses in comparison to the state of the art. We additionally highlight various scenarios in which the reduction of data cache misses yields an important benefit for the overall execution time.
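To make the distinction between native trees and if-else trees concrete, the following is a minimal sketch in C. It is not the authors' implementation: the toy tree, its split values, and all function names are hypothetical. The third variant rests on an assumption that the abstract itself does not state, namely that a floating point split value can be compared through its IEEE 754 binary32 bit pattern as a signed integer, so that the constant can be placed directly in the instruction stream. This shortcut is only valid here because both split values are non-negative, and it ignores NaN inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native tree: nodes live in an array and are walked in a loop. */
typedef struct {
    int feature;      /* index into the feature vector; -1 marks a leaf */
    float split;      /* split value, read from data memory on each visit */
    int left, right;  /* child indices, unused at a leaf */
    int label;        /* prediction, only meaningful at a leaf */
} Node;

/* Hypothetical toy tree with two inner nodes and three leaves. */
static const Node TOY_TREE[] = {
    { 0, 0.5f,   1,  2, -1},
    {-1, 0.0f,  -1, -1,  0},
    { 1, 2.25f,  3,  4, -1},
    {-1, 0.0f,  -1, -1,  1},
    {-1, 0.0f,  -1, -1,  2},
};

int predict_native(const float *x) {
    int i = 0;
    while (TOY_TREE[i].feature >= 0) {
        const Node *n = &TOY_TREE[i];
        i = (x[n->feature] <= n->split) ? n->left : n->right;
    }
    return TOY_TREE[i].label;
}

/* If-else tree: the same logic unrolled into nested branches. The float
   constants are typically still placed in a constant pool by the compiler
   and loaded through the data cache. */
int predict_if_else(const float *x) {
    if (x[0] <= 0.5f) return 0;
    if (x[1] <= 2.25f) return 1;
    return 2;
}

/* Reinterpret the bits of a float as a signed 32-bit integer. */
static int32_t float_bits(float v) {
    int32_t b;
    assert(sizeof(float) == sizeof(int32_t)); /* assumes binary32 floats */
    memcpy(&b, &v, sizeof b);
    return b;
}

/* Assumed variant: compare bit patterns against integer constants.
   Valid only for non-negative split values and non-NaN inputs. */
int predict_if_else_int(const float *x) {
    if (float_bits(x[0]) <= 0x3F000000) return 0;  /* bits of 0.5f  */
    if (float_bits(x[1]) <= 0x40100000) return 1;  /* bits of 2.25f */
    return 2;
}
```

All three functions encode the same logical tree and therefore return the same prediction for any non-NaN input. Whether a compiler actually emits the integer constants as immediates depends on the target instruction set and the compiler, and negative split values would need a sign-aware comparison that this sketch omits.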

Cite

Text

Hakert et al. "Immediate Split Trees: Immediate Encoding of Floating Point Split Values in Random Forests." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26419-1_32

Markdown

[Hakert et al. "Immediate Split Trees: Immediate Encoding of Floating Point Split Values in Random Forests." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/hakert2022ecmlpkdd-immediate/) doi:10.1007/978-3-031-26419-1_32

BibTeX

@inproceedings{hakert2022ecmlpkdd-immediate,
  title     = {{Immediate Split Trees: Immediate Encoding of Floating Point Split Values in Random Forests}},
  author    = {Hakert, Christian and Chen, Kuan-Hsun and Chen, Jian-Jia},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2022},
  pages     = {531--546},
  doi       = {10.1007/978-3-031-26419-1_32},
  url       = {https://mlanthology.org/ecmlpkdd/2022/hakert2022ecmlpkdd-immediate/}
}