Robust Sparse Mean Estimation via Sum of Squares
Abstract
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample- and computationally-efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb{R}^d$ with ‘certifiably bounded’ $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde O(\epsilon)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively best possible.
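To make the problem setup concrete, below is a minimal NumPy sketch of the strong-contamination model the abstract refers to: an adversary replaces an $\epsilon$-fraction of samples drawn from a distribution with $k$-sparse mean. All parameter values here are hypothetical, chosen only for illustration, and the naive top-$k$ estimator shown is a strawman baseline, not the paper's SoS-based algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem parameters (not from the paper).
d, k, eps, m = 1000, 5, 0.1, 5000

# k-sparse true mean: only the first k coordinates are nonzero.
mu = np.zeros(d)
mu[:k] = 1.0

# Inliers: m samples from N(mu, I_d).
X = rng.standard_normal((m, d)) + mu

# Strong contamination: an adversary replaces an eps-fraction of the
# samples. Here it plants a spike in a coordinate outside the support.
n_out = int(eps * m)
X[:n_out] = 0.0
X[:n_out, k] = 10.0

# Naive estimator: keep the top-k coordinates of the empirical mean.
emp = X.mean(axis=0)
support = np.argsort(np.abs(emp))[-k:]
mu_hat = np.zeros(d)
mu_hat[support] = emp[support]

print("naive top-k error:", np.linalg.norm(mu_hat - mu))
```

In this sketch the planted coordinate has empirical mean about $\epsilon \cdot 10 = 1$, competing with the true support coordinates, so naive truncation of the empirical mean can pick the wrong support and incur constant error; by contrast, the paper's guarantee for the Gaussian case is error $\tilde O(\epsilon)$.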
Cite

Text:
Diakonikolas et al. "Robust Sparse Mean Estimation via Sum of Squares." Conference on Learning Theory, 2022.

Markdown:
[Diakonikolas et al. "Robust Sparse Mean Estimation via Sum of Squares." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/diakonikolas2022colt-robust/)

BibTeX:
@inproceedings{diakonikolas2022colt-robust,
title = {{Robust Sparse Mean Estimation via Sum of Squares}},
author = {Diakonikolas, Ilias and Kane, Daniel M. and Karmalkar, Sushrut and Pensia, Ankit and Pittas, Thanasis},
booktitle = {Conference on Learning Theory},
year = {2022},
pages = {4703--4763},
volume = {178},
url = {https://mlanthology.org/colt/2022/diakonikolas2022colt-robust/}
}