Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Abstract

We develop two methods for the following fundamental statistical task: given an $\eps$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(\eps\log\eps^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + \eps)$-approximate solution in $O(p\log(\tfrac{nd}{\eps})\eps^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions).

PDF NeurIPS Semantic Scholar

Cite

Text

Jambulapati et al. "Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing." Neural Information Processing Systems, 2020.

Markdown

[Jambulapati et al. "Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/jambulapati2020neurips-robust/)

BibTeX

@inproceedings{jambulapati2020neurips-robust,
  title     = {{Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing}},
  author    = {Jambulapati, Arun and Li, Jerry and Tian, Kevin},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/jambulapati2020neurips-robust/}
}