A Geometric Analysis of PCA
Abstract
What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than $\pi/4$.
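To make the object of study concrete, here is a small, self-contained sketch (not from the paper) of the quantities the abstract refers to: the principal subspace estimated by PCA from samples, the reconstruction loss $\mathbb{E}\,\|x - UU^\top x\|^2$, and the excess risk of the estimated subspace over the population optimum. The synthetic Gaussian data, dimensions, and spectrum below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: a population covariance with a clear spectral gap
# between the top-k eigenvalues and the rest.
d, k, n = 10, 2, 5000
eigvals = np.array([5.0, 4.0] + [0.5] * (d - k))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = Q @ np.diag(eigvals) @ Q.T

# Draw n samples from N(0, Sigma).
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

def top_k_subspace(S, k):
    """Orthonormal basis of the top-k eigenspace of a symmetric matrix."""
    w, V = np.linalg.eigh(S)                 # ascending eigenvalues
    return V[:, np.argsort(w)[::-1][:k]]     # columns for the k largest

def reconstruction_risk(U, Sigma):
    """E ||x - U U^T x||^2 = tr(Sigma) - tr(U^T Sigma U) for x ~ N(0, Sigma)."""
    return np.trace(Sigma) - np.trace(U.T @ Sigma @ U)

U_hat = top_k_subspace(X.T @ X / n, k)   # PCA: top-k subspace of sample covariance
U_star = top_k_subspace(Sigma, k)        # population-optimal subspace

# Excess risk: how much worse the estimated subspace is than the optimum.
excess = reconstruction_risk(U_hat, Sigma) - reconstruction_risk(U_star, Sigma)
print(f"excess risk: {excess:.6f}")
```

The excess risk is nonnegative by optimality of the population subspace, and shrinks as $n$ grows; the paper's results characterize its distribution and give non-asymptotic bounds on it.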
Cite
El Hanchi et al. "A Geometric Analysis of PCA." Advances in Neural Information Processing Systems, 2025.
BibTeX
@inproceedings{hanchi2025neurips-geometric,
title = {{A Geometric Analysis of PCA}},
author = {El Hanchi, Ayoub and Erdogdu, Murat A. and Maddison, Chris J.},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/hanchi2025neurips-geometric/}
}