Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers

Abstract

We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.

Cite

Text

Hu and Reingold. "Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers." Artificial Intelligence and Statistics, 2021.

Markdown

[Hu and Reingold. "Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/hu2021aistats-robust/)

BibTeX

@inproceedings{hu2021aistats-robust,
  title     = {{Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers}},
  author    = {Hu, Lunjia and Reingold, Omer},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2021},
  pages     = {1558-1566},
  volume    = {130},
  url       = {https://mlanthology.org/aistats/2021/hu2021aistats-robust/}
}