Optimal Sparse Regression Trees

Abstract

Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably-optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering problem on one-dimensional data. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly-correlated features.
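The lower bound mentioned above relies on solving k-Means exactly on one-dimensional data, which is tractable because in 1-D the optimal clusters are contiguous intervals of the sorted points. The following is a minimal sketch of exact 1-D k-Means via dynamic programming; it is an illustration of that subproblem, not the authors' implementation, and the function name `kmeans_1d` is hypothetical.

```python
# Sketch: exact 1-D k-means via dynamic programming.
# In 1-D, an optimal k-means partition consists of contiguous
# intervals of the sorted data, so a DP over split points finds
# the exact minimum within-cluster sum of squared errors (SSE).
# Hypothetical helper for illustration; not the paper's code.

def kmeans_1d(xs, k):
    """Return the minimum SSE of partitioning xs into k clusters."""
    x = sorted(xs)
    n = len(x)
    # Prefix sums of x and x^2 allow O(1) SSE of any interval.
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        ps[i + 1] = ps[i] + v
        ps2[i + 1] = ps2[i] + v * v

    def sse(i, j):
        # SSE of the interval x[i..j], inclusive, 0-indexed.
        m = j - i + 1
        s = ps[j + 1] - ps[i]
        return (ps2[j + 1] - ps2[i]) - s * s / m

    INF = float("inf")
    # dp[c][j] = min SSE of splitting x[0..j] into c clusters.
    dp = [[INF] * n for _ in range(k + 1)]
    for j in range(n):
        dp[1][j] = sse(0, j)
    for c in range(2, k + 1):
        for j in range(c - 1, n):
            # Last cluster is x[i..j]; the rest form c-1 clusters.
            dp[c][j] = min(dp[c - 1][i - 1] + sse(i, j)
                           for i in range(c - 1, j + 1))
    return dp[k][n - 1]
```

For example, `kmeans_1d([1, 2, 10, 11], 2)` groups {1, 2} and {10, 11} for a total SSE of 1.0. This O(k·n²) formulation is the simplest exact version; faster variants exist, but any exact solution suffices to serve as a valid lower bound in the tree search.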

Cite

Text

Zhang et al. "Optimal Sparse Regression Trees." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I9.26334

Markdown

[Zhang et al. "Optimal Sparse Regression Trees." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/zhang2023aaai-optimal/) doi:10.1609/AAAI.V37I9.26334

BibTeX

@inproceedings{zhang2023aaai-optimal,
  title     = {{Optimal Sparse Regression Trees}},
  author    = {Zhang, Rui and Xin, Rui and Seltzer, Margo I. and Rudin, Cynthia},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {11270--11279},
  doi       = {10.1609/AAAI.V37I9.26334},
  url       = {https://mlanthology.org/aaai/2023/zhang2023aaai-optimal/}
}