On Model Identification and Out-of-Sample Prediction of PCR with Applications to Synthetic Controls
Abstract
We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In the course of our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions for out-of-sample predictions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. Accordingly, we construct a hypothesis test to check when this condition holds in practice. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach for policy evaluation. To the best of our knowledge, our prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.
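The following is a minimal NumPy sketch of the ideas summarized above. It is not the authors' implementation. It assumes that the "linear algebraic condition" can be read as the out-of-sample covariates lying in the row space of the in-sample covariates. The function names `pcr_fit` and `rowspace_residual`, the synthetic noiseless data, and the choice of rank `k` are all hypothetical and chosen only for illustration. The residual diagnostic is not the paper's hypothesis test.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: least squares on the rank-k SVD truncation of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the top-k singular components of the covariate matrix.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Minimum l2-norm solution for the truncated problem; by construction it
    # lies in the span of the top-k right singular vectors.
    beta = Vt_k.T @ ((U_k.T @ y) / s_k)
    return beta, Vt_k

def rowspace_residual(X_new, Vt_k):
    """Relative energy of X_new outside the span of the rows of Vt_k (0 means fully inside)."""
    proj = X_new @ Vt_k.T @ Vt_k
    return np.linalg.norm(X_new - proj) / np.linalg.norm(X_new)

# Hypothetical synthetic setup: n samples, p > n covariates, rank-r design, no noise.
rng = np.random.default_rng(0)
n, p, r = 50, 100, 3
B = rng.normal(size=(r, p))
X_in = rng.normal(size=(n, r)) @ B
beta_true = rng.normal(size=p)
y_in = X_in @ beta_true

beta_hat, Vt_k = pcr_fit(X_in, y_in, k=r)

# Out-of-sample covariates that do / do not share the in-sample row space.
X_out_good = rng.normal(size=(5, r)) @ B
X_out_bad = rng.normal(size=(5, p))

print("residual (row space shared):    ", rowspace_residual(X_out_good, Vt_k))
print("residual (row space not shared):", rowspace_residual(X_out_bad, Vt_k))
print("max prediction error (shared):  ",
      np.max(np.abs(X_out_good @ beta_hat - X_out_good @ beta_true)))
```

In this noiseless toy case, `beta_hat` is the projection of `beta_true` onto the in-sample row space, so predictions agree with the true model on out-of-sample rows inside that space. Nothing is guaranteed for rows outside it.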
Cite
Text
Agarwal et al. "On Model Identification and Out-of-Sample Prediction of PCR with Applications to Synthetic Controls." Journal of Machine Learning Research, 2025.
Markdown
[Agarwal et al. "On Model Identification and Out-of-Sample Prediction of PCR with Applications to Synthetic Controls." Journal of Machine Learning Research, 2025.](https://mlanthology.org/jmlr/2025/agarwal2025jmlr-model/)
BibTeX
@article{agarwal2025jmlr-model,
title = {{On Model Identification and Out-of-Sample Prediction of PCR with Applications to Synthetic Controls}},
author = {Agarwal, Anish and Shah, Devavrat and Shen, Dennis},
journal = {Journal of Machine Learning Research},
year = {2025},
pages = {1--58},
volume = {26},
url = {https://mlanthology.org/jmlr/2025/agarwal2025jmlr-model/}
}