Improved Distributed Principal Component Analysis

Liang, Yingyu; Balcan, Maria-Florina F; Kanchanapally, Vandana; Woodruff, David

NeurIPS 2014 pp. 3113-3121

Abstract

We study the distributed computing setting in which there are multiple servers, each holding a set of points, that wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low-dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as $k$-means clustering and low-rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems. Our empirical study on real-world data shows a speedup of orders of magnitude while preserving communication cost, with only a negligible degradation in solution quality. Some of the techniques we develop, such as input-sparsity subspace embeddings with high correctness probability and with dimension and sparsity independent of the error probability, may be of independent interest.
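
The setting described above can be illustrated with a minimal sketch. It assumes a simple local-SVD-then-aggregate scheme of the kind common in the distributed PCA literature, and it is not claimed to be the algorithm of this paper. The function names `local_summary`, `global_pca`, and `projection_cost`, the parameter `t`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def local_summary(P, t):
    """Server side (assumed scheme): keep the top-t right singular
    directions of the local point set, scaled by their singular values."""
    _, s, Vt = np.linalg.svd(P, full_matrices=False)
    t = min(t, len(s))
    return s[:t, None] * Vt[:t]          # t x d matrix sent to the coordinator

def global_pca(summaries, k):
    """Coordinator side: stack the received summaries and return an
    orthonormal d x k basis for their top-k principal subspace."""
    stacked = np.vstack(summaries)
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    return Vt[:k].T

def projection_cost(P, V):
    """Squared Frobenius distance from the points to the subspace span(V)."""
    residual = P - P @ V @ V.T
    return float(np.sum(residual ** 2))

# Synthetic example: 4 servers, each holding 200 noisy points near a
# shared 3-dimensional subspace of R^20 (hypothetical data).
rng = np.random.default_rng(0)
d, k = 20, 3
basis = rng.standard_normal((k, d))
servers = [
    rng.standard_normal((200, k)) @ basis + 0.01 * rng.standard_normal((200, d))
    for _ in range(4)
]

# Distributed estimate: each server communicates only a (2k) x d summary.
V = global_pca([local_summary(P, t=2 * k) for P in servers], k)

# Centralized baseline computed on the union of all point sets.
union = np.vstack(servers)
_, _, Vt = np.linalg.svd(union, full_matrices=False)
V_opt = Vt[:k].T

cost_distributed = projection_cost(union, V)
cost_central = projection_cost(union, V_opt)
```

In this sketch each server sends `t * d` numbers instead of its full point set, which is the communication cost being traded against accuracy; comparing `cost_distributed` with `cost_central` shows how much solution quality is lost for a given `t`.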

Cite

Text

Liang et al. "Improved Distributed Principal Component Analysis." Neural Information Processing Systems, 2014.

Markdown

[Liang et al. "Improved Distributed Principal Component Analysis." Neural Information Processing Systems, 2014.](https://mlanthology.org/neurips/2014/liang2014neurips-improved/)

BibTeX

@inproceedings{liang2014neurips-improved,
  title     = {{Improved Distributed Principal Component Analysis}},
  author    = {Liang, Yingyu and Balcan, Maria-Florina F and Kanchanapally, Vandana and Woodruff, David},
  booktitle = {Neural Information Processing Systems},
  year      = {2014},
  pages     = {3113-3121},
  url       = {https://mlanthology.org/neurips/2014/liang2014neurips-improved/}
}