Coresets for Decision Trees of Signals

Abstract

A $k$-decision tree $t$ (or $k$-tree) is a recursive partition of a matrix (2D-signal) into $k\geq 1$ block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix $D$ of $N$ entries (labels) is the sum of squared differences between every label in $D$ and the label assigned to it by $t$. Given an error parameter $\varepsilon\in(0,1)$, a $(k,\varepsilon)$-coreset $C$ of $D$ is a small summarization that provably approximates this loss for \emph{every} such tree, up to a multiplicative factor of $1\pm\varepsilon$. In particular, the optimal $k$-tree of $C$ is a $(1+\varepsilon)$-approximation to the optimal $k$-tree of $D$. We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for \emph{every} such matrix $D$. The size $|C|$ of the coreset is polynomial in $k\log(N)/\varepsilon$, and its construction takes $O(Nk)$ time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets to real-world datasets speeds up the training of random forests and their parameter tuning by up to $\times 10$, while keeping similar accuracy. Full open source code is provided.
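
To make the loss definition concrete, here is a minimal Python sketch (not the authors' code; the leaf representation and the `tree_loss` helper are our own illustration). It evaluates the loss of a $k$-tree given as a list of labeled axis-parallel rectangles that partition the matrix $D$:

```python
import numpy as np

def tree_loss(D, leaves):
    """Sum of squared differences between each entry of the 2D signal D
    and the label of the rectangle (leaf) that covers it.

    `leaves` is a list of ((r0, r1, c0, c1), label) pairs, where the
    half-open rectangles D[r0:r1, c0:c1] are assumed to partition D.
    """
    loss = 0.0
    for (r0, r1, c0, c1), label in leaves:
        block = D[r0:r1, c0:c1]
        loss += np.sum((block - label) ** 2)
    return loss

# Toy example: a 4x4 signal split into k=2 leaves (left/right halves).
# The optimal label of a leaf under squared loss is the mean of its block.
D = np.arange(16, dtype=float).reshape(4, 4)
leaves = [
    ((0, 4, 0, 2), D[:, :2].mean()),
    ((0, 4, 2, 4), D[:, 2:].mean()),
]
print(tree_loss(D, leaves))
```

A $(k,\varepsilon)$-coreset $C$ would let one evaluate (approximately) this loss for every such tree using far fewer than $N$ weighted entries, which is what enables the reported speedups for random-forest training and tuning.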

Cite

Text

Jubran et al. "Coresets for Decision Trees of Signals." Neural Information Processing Systems, 2021.

Markdown

[Jubran et al. "Coresets for Decision Trees of Signals." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/jubran2021neurips-coresets/)

BibTeX

@inproceedings{jubran2021neurips-coresets,
  title     = {{Coresets for Decision Trees of Signals}},
  author    = {Jubran, Ibrahim and Shayda, Ernesto Evgeniy Sanches and Newman, Ilan I and Feldman, Dan},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/jubran2021neurips-coresets/}
}