Robust Subspace Approximation in a Stream
Abstract
We study robust subspace estimation in the streaming and distributed settings. Given a set of n data points a_i_i=1^n in R^d and an integer k, we wish to find a linear subspace S of dimension k for which sum_i M(dist(S, a_i)) is minimized, where dist(S,x) := min_y in S |x-y|_2, and M() is some loss function. When M is the identity function, S gives a subspace that is more robust to outliers than that provided by the truncated SVD. Though the problem is NP-hard, it is approximable within a (1+epsilon) factor in polynomial time when k and epsilon are constant. We give the first sublinear approximation algorithm for this problem in the turnstile streaming and arbitrary partition distributed models, achieving the same time guarantees as in the offline case. Our algorithm is the first based entirely on oblivious dimensionality reduction, and significantly simplifies prior methods for this problem, which held in neither the streaming nor distributed models.
Cite
Text
Levin et al. "Robust Subspace Approximation in a Stream." Neural Information Processing Systems, 2018.Markdown
[Levin et al. "Robust Subspace Approximation in a Stream." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/levin2018neurips-robust/)BibTeX
@inproceedings{levin2018neurips-robust,
title = {{Robust Subspace Approximation in a Stream}},
author = {Levin, Roie and Sevekari, Anish Prasad and Woodruff, David},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {10683-10693},
url = {https://mlanthology.org/neurips/2018/levin2018neurips-robust/}
}