Approximating the Top Eigenvector in Random Order Streams

Abstract

When rows of an $n \times d$ matrix $A$ are given in a stream, we study algorithms for approximating the top eigenvector of $A^T A$ (equivalently, the top right singular vector of $A$). We consider worst case inputs $A$ but assume that the rows are presented to the streaming algorithm in a uniformly random order. We show that when the gap parameter $R = \sigma_1(A)^2/\sigma_2(A)^2 = \Omega(1)$, then there is a randomized algorithm that uses $O(h \cdot d \cdot \text{polylog}(d))$ bits of space and outputs a unit vector $v$ that has a correlation $1 - O(1/\sqrt{R})$ with the top eigenvector $v_1$. Here $h$ denotes the number of ``heavy rows'' in the matrix, defined as the rows with Euclidean norm at least $\|{A}\|_F/\sqrt{d \cdot \text{polylog}(d)}$. We also provide a lower bound showing that any algorithm using $O(hd/R)$ bits of space can obtain at most $1 - \Omega(1/R^2)$ correlation with the top eigenvector. Thus, parameterizing the space complexity in terms of the number of heavy rows is necessary for high accuracy solutions.Our results improve upon the $R = \Omega(\log n \cdot \log d)$ requirement in a recent work of Price. We note that Price's algorithm works for arbitrary order streams whereas our algorithm requires a stronger assumption that the rows are presented in a uniformly random order. We additionally show that the gap requirements in Price's analysis can be brought down to $R = \Omega(\log^2 d)$ for arbitrary order streams and $R = \Omega(\log d)$ for random order streams. The requirement of $R = \Omega(\log d)$ for random order streams is nearly tight for Price's analysis as we obtain a simple instance with $R = \Omega(\log d/\log\log d)$ for which Price's algorithm, with any fixed learning rate, cannot output a vector approximating the top eigenvector $v_1$.

Cite

Text

Kacham and Woodruff. "Approximating the Top Eigenvector in Random Order Streams." Neural Information Processing Systems, 2024. doi:10.52202/079017-3605

Markdown

[Kacham and Woodruff. "Approximating the Top Eigenvector in Random Order Streams." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/kacham2024neurips-approximating/) doi:10.52202/079017-3605

BibTeX

@inproceedings{kacham2024neurips-approximating,
  title     = {{Approximating the Top Eigenvector in Random Order Streams}},
  author    = {Kacham, Praneeth and Woodruff, David P.},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3605},
  url       = {https://mlanthology.org/neurips/2024/kacham2024neurips-approximating/}
}