A Geometric Analysis of PCA

Abstract

What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We further obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large-sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer whose maximum rotation angle is less than $\pi/4$.
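For readers unfamiliar with the terminology, the following is a standard formulation of the objects named above; the notation ($X$, $\Sigma$, $U$, $k$) is assumed here rather than taken from the paper. Given data $X \in \mathbb{R}^d$ with second-moment matrix $\Sigma = \mathbb{E}[XX^\top]$, PCA seeks the $k$-dimensional subspace minimizing the reconstruction loss

$$R(U) = \mathbb{E}\,\bigl\|X - UU^\top X\bigr\|_2^2, \qquad U \in \mathbb{R}^{d \times k},\ U^\top U = I_k.$$

Since $UU^\top$ is an orthogonal projection, $R(U) = \operatorname{tr}(\Sigma) - \operatorname{tr}(U^\top \Sigma U)$, so minimizing the reconstruction loss is equivalent to minimizing the negative block Rayleigh quotient

$$f(U) = -\operatorname{tr}\bigl(U^\top \Sigma U\bigr),$$

which depends on $U$ only through its column span and is therefore well defined on the Grassmannian of $k$-dimensional subspaces of $\mathbb{R}^d$. The excess risk of an estimated subspace $\hat{U}$ is then $R(\hat{U}) - \min_U R(U)$.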

Cite

Text

El Hanchi et al. "A Geometric Analysis of PCA." Advances in Neural Information Processing Systems, 2025.

Markdown

[El Hanchi et al. "A Geometric Analysis of PCA." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/hanchi2025neurips-geometric/)

BibTeX

@inproceedings{hanchi2025neurips-geometric,
  title     = {{A Geometric Analysis of PCA}},
  author    = {El Hanchi, Ayoub and Erdogdu, Murat A. and Maddison, Chris J.},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/hanchi2025neurips-geometric/}
}