Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

Abstract

We study the problem of robustly learning multidimensional histograms. A $d$-dimensional function $h: D \rightarrow \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \rightarrow \mathbb{R}$ be a $d$-dimensional probability density function, and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $(O(\mathrm{OPT}) + \epsilon)$-close to $f$ in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d=1$ case was well understood, but significant gaps in our understanding remained even for $d=2$.
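
To make the setting concrete, here is a minimal Python sketch (our own illustration, not the paper's algorithm) of the $d=2$, $k=4$ case: it draws samples from a piecewise-constant density over $[0,1]^2$ and measures the $L_1$ error of a naive fixed-grid histogram fit. All function names (`sample`, `fit_grid_histogram`, `l1_distance`) and parameter choices are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's near-linear-time algorithm):
# simulate a 2-dimensional k-histogram density and fit a fixed-grid estimate.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth density on D = [0,1]^2: a 4-piece histogram (k = 4) given by an
# axis-aligned 2x2 partition with piece probability masses p.
p = np.array([0.1, 0.2, 0.3, 0.4])            # probability mass of each rectangle
cells = [((0.0, 0.5), (0.0, 0.5)),            # (x-interval, y-interval) per piece
         ((0.5, 1.0), (0.0, 0.5)),
         ((0.0, 0.5), (0.5, 1.0)),
         ((0.5, 1.0), (0.5, 1.0))]

def sample(n):
    """Draw n i.i.d. samples: pick a rectangle by its mass, then a uniform point in it."""
    idx = rng.choice(len(p), size=n, p=p)
    xs = np.empty((n, 2))
    for j, ((x0, x1), (y0, y1)) in enumerate(cells):
        m = idx == j
        xs[m, 0] = rng.uniform(x0, x1, m.sum())
        xs[m, 1] = rng.uniform(y0, y1, m.sum())
    return xs

def fit_grid_histogram(xs, bins):
    """Empirical histogram density on a regular bins x bins grid over [0,1]^2."""
    counts, _, _ = np.histogram2d(xs[:, 0], xs[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    return counts / counts.sum() * bins * bins    # normalize counts to a density

def l1_distance(hist, bins):
    """Approximate L1 distance between the fitted grid density and the true density."""
    cell_area = 1.0 / (bins * bins)
    centers = (np.arange(bins) + 0.5) / bins
    gx, gy = np.meshgrid(centers, centers, indexing="ij")
    true_density = np.zeros_like(hist)
    for mass, ((x0, x1), (y0, y1)) in zip(p, cells):
        inside = (gx >= x0) & (gx < x1) & (gy >= y0) & (gy < y1)
        true_density[inside] = mass / ((x1 - x0) * (y1 - y0))
    return np.abs(hist - true_density).sum() * cell_area

n = 20000
xs = sample(n)
est = fit_grid_histogram(xs, bins=16)
print(f"L1 error of a naive 16x16 grid fit on n={n} samples: {l1_distance(est, 16):.3f}")
```

The point of the paper is that one does not need to fix a fine grid in advance: an adaptively chosen partition into roughly $k$ axis-aligned rectangles suffices, with sample complexity $\tilde{O}_d(k/\epsilon^2)$ and near-linear running time.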

Cite

Text

Diakonikolas et al. "Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms." Annual Conference on Computational Learning Theory, 2018.

Markdown

[Diakonikolas et al. "Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms." Annual Conference on Computational Learning Theory, 2018.](https://mlanthology.org/colt/2018/diakonikolas2018colt-fast/)

BibTeX

@inproceedings{diakonikolas2018colt-fast,
  title     = {{Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms}},
  author    = {Diakonikolas, Ilias and Li, Jerry and Schmidt, Ludwig},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2018},
  pages     = {819--842},
  url       = {https://mlanthology.org/colt/2018/diakonikolas2018colt-fast/}
}