Coresets for Decision Trees of Signals

Abstract

A $k$-decision tree $t$ (or $k$-tree) is a recursive partition of a matrix (2D-signal) into $k\geq 1$ block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix $D$ of $N$ entries (labels) is the sum of squared differences between every label in $D$ and the label assigned to it by $t$. Given an error parameter $\varepsilon\in(0,1)$, a $(k,\varepsilon)$-coreset $C$ of $D$ is a small summarization that provably approximates this loss for \emph{every} such tree, up to a multiplicative factor of $1\pm\varepsilon$. In particular, the optimal $k$-tree of $C$ is a $(1+\varepsilon)$-approximation to the optimal $k$-tree of $D$. We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for \emph{every} such matrix $D$. The size $|C|$ of the coreset is polynomial in $k\log(N)/\varepsilon$, and its construction takes $O(Nk)$ time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets to real-world datasets speeds up the training of random forests and their parameter tuning by up to $\times 10$, while keeping similar accuracy. Full open source code is provided.
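
To make the loss definition concrete, here is a minimal Python sketch (not the authors' code; the leaf representation and the `tree_loss` helper are our own illustration). It evaluates the loss of a $k$-tree given as a list of labeled axis-parallel rectangles that partition the matrix $D$:

```python
import numpy as np

def tree_loss(D, leaves):
    """Sum of squared differences between each entry of the 2D signal D
    and the label of the rectangle (leaf) that covers it.

    `leaves` is a list of ((r0, r1, c0, c1), label) pairs, where the
    half-open rectangles D[r0:r1, c0:c1] are assumed to partition D.
    """
    loss = 0.0
    for (r0, r1, c0, c1), label in leaves:
        block = D[r0:r1, c0:c1]
        loss += np.sum((block - label) ** 2)
    return loss

# Toy example: a 4x4 signal split into k=2 leaves (left/right halves).
# The optimal label of a leaf under squared loss is the mean of its block.
D = np.arange(16, dtype=float).reshape(4, 4)
leaves = [
    ((0, 4, 0, 2), D[:, :2].mean()),
    ((0, 4, 2, 4), D[:, 2:].mean()),
]
print(tree_loss(D, leaves))
```

A $(k,\varepsilon)$-coreset $C$ would let one evaluate (approximately) this loss for every such tree using far fewer than $N$ weighted entries, which is what enables the reported speedups for random-forest training and tuning.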

Cite

Text

Jubran et al. "Coresets for Decision Trees of Signals." Neural Information Processing Systems, 2021.

Markdown

[Jubran et al. "Coresets for Decision Trees of Signals." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/jubran2021neurips-coresets/)

BibTeX

@inproceedings{jubran2021neurips-coresets,
  title     = {{Coresets for Decision Trees of Signals}},
  author    = {Jubran, Ibrahim and Shayda, Ernesto Evgeniy Sanches and Newman, Ilan I and Feldman, Dan},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/jubran2021neurips-coresets/}
}