Robust Sparse Mean Estimation via Sum of Squares

Abstract

We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample- and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb{R}^d$ with ‘certifiably bounded’ $t$-th moments and sufficiently light tails, our algorithm achieves error $O(\epsilon^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error $\tilde O(\epsilon)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively best possible.

Cite

Text

Diakonikolas et al. "Robust Sparse Mean Estimation via Sum of Squares." Conference on Learning Theory, 2022.

Markdown

[Diakonikolas et al. "Robust Sparse Mean Estimation via Sum of Squares." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/diakonikolas2022colt-robust/)

BibTeX

@inproceedings{diakonikolas2022colt-robust,
  title     = {{Robust Sparse Mean Estimation via Sum of Squares}},
  author    = {Diakonikolas, Ilias and Kane, Daniel M. and Karmalkar, Sushrut and Pensia, Ankit and Pittas, Thanasis},
  booktitle = {Conference on Learning Theory},
  year      = {2022},
  pages     = {4703--4763},
  volume    = {178},
  url       = {https://mlanthology.org/colt/2022/diakonikolas2022colt-robust/}
}