Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Abstract
Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components for problems with more than $p=100$s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting $k=5$ covariates from $p=300$ variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1$-$2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s, and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical data sets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
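To give a flavor of the rounding idea, the following is a minimal Python sketch of a greedy support-selection heuristic for the $k$-sparse PCA problem $\max_{\|x\|_0 \le k,\, \|x\|_2 = 1} x^\top \Sigma x$. It is an illustration under our own assumptions, not the paper's exact relaxation-plus-rounding scheme; the function `greedy_sparse_pca` and its interface are hypothetical.

# A minimal sketch (not the paper's exact algorithm): greedily select a
# k-variable support to maximize the top eigenvalue of the corresponding
# principal submatrix of the covariance matrix Sigma, then recover the
# sparse principal component by an exact eigendecomposition on that support.
import numpy as np

def greedy_sparse_pca(Sigma, k):
    """Return a k-sparse unit vector x and its objective value x' Sigma x."""
    p = Sigma.shape[0]
    support = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(p):
            if j in support:
                continue
            idx = support + [j]
            # Largest eigenvalue of the candidate principal submatrix.
            val = np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_j, best_val = j, val
        support.append(best_j)
    # Leading eigenvector on the selected support, zero elsewhere.
    idx = sorted(support)
    w, V = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
    x = np.zeros(p)
    x[idx] = V[:, -1]
    return x, w[-1]

# Example usage on a random sample covariance matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
Sigma = A.T @ A / 200
x, lam = greedy_sparse_pca(Sigma, k=5)
print("support:", np.nonzero(x)[0], "objective:", lam)

In the paper's scheme, rounding is applied to the solution of a convex (semidefinite) relaxation rather than built greedily from scratch, and the relaxation's optimal value supplies the upper bound against which the rounded solution's gap is certified.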
Cite
Bertsimas et al. "Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality." Journal of Machine Learning Research, 2022. https://mlanthology.org/jmlr/2022/bertsimas2022jmlr-solving/
BibTeX
@article{bertsimas2022jmlr-solving,
title = {{Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality}},
author = {Bertsimas, Dimitris and Cory-Wright, Ryan and Pauphilet, Jean},
journal = {Journal of Machine Learning Research},
year = {2022},
pages = {1--35},
volume = {23},
url = {https://mlanthology.org/jmlr/2022/bertsimas2022jmlr-solving/}
}