The Sample Complexity of Multi-Distribution Learning

Abstract

Multi-distribution learning generalizes classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over the $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm with sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to sub-polynomial factors and resolves the COLT 2023 open problem of Awasthi, Haghtalab, and Zhao.
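
The guarantee described above can be stated as a minimax objective. A minimal sketch, assuming distributions $D_1, \dots, D_k$, a hypothesis class $\mathcal{H}$ of VC dimension $d$, and a bounded loss $\ell$ (the notation beyond the abstract is assumed; in this setting the learner is typically allowed to output a randomized hypothesis $\hat{h}$):

$$
\max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\big[\ell(\hat{h}(x), y)\big] \;\le\; \min_{h \in \mathcal{H}} \; \max_{i \in [k]} \; \mathbb{E}_{(x,y) \sim D_i}\big[\ell(h(x), y)\big] \;+\; \epsilon,
$$

using $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$ samples drawn adaptively across the $k$ distributions.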

Cite

Text

Peng. "The Sample Complexity of Multi-Distribution Learning." Conference on Learning Theory, 2024.

Markdown

[Peng. "The Sample Complexity of Multi-Distribution Learning." Conference on Learning Theory, 2024.](https://mlanthology.org/colt/2024/peng2024colt-sample/)

BibTeX

@inproceedings{peng2024colt-sample,
  title     = {{The Sample Complexity of Multi-Distribution Learning}},
  author    = {Peng, Binghui},
  booktitle = {Conference on Learning Theory},
  year      = {2024},
  pages     = {4185--4204},
  volume    = {247},
  url       = {https://mlanthology.org/colt/2024/peng2024colt-sample/}
}