PRIMO: Private Regression in Multiple Outcomes

Abstract

We introduce a new private regression setting we call \textit{Private Regression in Multiple Outcomes} (PRIMO), inspired by the common situation where a data analyst wants to perform a set of $l$ regressions while preserving privacy, where the features $X$ are shared across all $l$ regressions, and each regression $i \in [l]$ has a different vector of outcomes $y_i$. Naively applying existing private linear regression techniques $l$ times leads to a $\sqrt{l}$ multiplicative increase in error over the standard linear regression setting. We apply a variety of techniques, including sufficient statistics perturbation (SSP) and geometric projection-based methods, to develop scalable algorithms that outperform this baseline across a range of parameter regimes. In particular, we obtain \textit{no dependence on $l$} in the asymptotic error when $l$ is sufficiently large. We apply our algorithms to the task of private genomic risk prediction for multiple phenotypes. Empirically, we find that even for values of $l$ far smaller than the theory would predict, our projection-based method improves accuracy relative to the variant that does not use the projection.
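To make the shared-feature structure concrete, the following is a minimal sketch of generic sufficient statistics perturbation applied to $l$ outcomes at once. It is an illustration under stated assumptions, not the paper's algorithm: the function name `ssp_multi_outcome`, the noise scales `sigma_xx` and `sigma_xy`, and the `ridge` term are all hypothetical, and calibrating the noise scales to a specific privacy guarantee (which requires bounds on the norms of the rows of $X$ and the entries of $y_i$) is deliberately left out.

```python
import numpy as np


def ssp_multi_outcome(X, Y, sigma_xx, sigma_xy, ridge=1e-3, rng=None):
    """Sketch of sufficient statistics perturbation for l shared-feature regressions.

    X: (n, d) feature matrix shared by all regressions.
    Y: (n, l) matrix whose i-th column is the outcome vector y_i.
    sigma_xx, sigma_xy: Gaussian noise scales (assumed inputs; calibrating
        them to a privacy budget is NOT done here).
    ridge: hypothetical regularizer that keeps the noisy system well conditioned.
    Returns a (d, l) matrix with one coefficient vector per outcome.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    l = Y.shape[1]

    # X^T X depends only on the shared features, so it is perturbed once,
    # no matter how many outcomes there are. The noise is symmetrized so the
    # noisy matrix stays symmetric.
    E = rng.normal(scale=sigma_xx, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T
    noisy_xx = X.T @ X + E

    # X^T Y has one column per outcome; each column receives its own noise.
    noisy_xy = X.T @ Y + rng.normal(scale=sigma_xy, size=(d, l))

    # Solve all l regressions against the same noisy second-moment matrix.
    return np.linalg.solve(noisy_xx + ridge * np.eye(d), noisy_xy)
```

With both noise scales set to zero and `ridge=0`, the function reduces to ordinary least squares on every column of `Y`, which is a convenient sanity check. The point of the sketch is that only the $X^\top Y$ term grows with $l$; the $X^\top X$ term is computed and perturbed a single time.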

Cite

Text

Neel. "PRIMO: Private Regression in Multiple Outcomes." Transactions on Machine Learning Research, 2025.

Markdown

[Neel. "PRIMO: Private Regression in Multiple Outcomes." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/neel2025tmlr-primo/)

BibTeX

@article{neel2025tmlr-primo,
  title     = {{PRIMO: Private Regression in Multiple Outcomes}},
  author    = {Neel, Seth},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/neel2025tmlr-primo/}
}