Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

Abstract

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples $(x, y)$ from a distribution on $\mathbb R^d \times \mathbb R$ with $x \sim \mathcal N(0, I_d)$ and $y = x^\top \beta + z$, where $z$ is drawn from an unknown distribution that is independent of $x$ and satisfies $\mathbb P[z = 0] = \alpha > 0$. The goal is to recover the regressor $\beta$ to within small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence on $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
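
To make the data model concrete, below is a minimal simulation sketch of the contamination model described in the abstract. The choices of $d$, $\alpha$, $n$, and the Cauchy distribution for the nonzero part of $z$ are illustrative assumptions only; the model requires nothing of $z$ beyond independence from $x$ and $\mathbb P[z = 0] = \alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper).
d = 50          # ambient dimension
alpha = 0.1     # clean fraction: P[z = 0] = alpha
n = 10_000      # number of samples

# Unknown unit-norm regressor (for illustration).
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)

# Covariates: x ~ N(0, I_d).
X = rng.standard_normal((n, d))

# Oblivious contamination: z is drawn independently of x, with P[z = 0] = alpha.
# The nonzero part is taken to be heavy-tailed Cauchy noise purely as an example;
# the model places no restriction on it beyond independence from x.
clean = rng.random(n) < alpha
z = np.where(clean, 0.0, rng.standard_cauchy(n))

# Responses: y = <x, beta> + z.
y = X @ beta + z

# On the alpha-fraction of clean samples, y matches the noiseless model exactly.
assert np.allclose(y[clean], X[clean] @ beta)
print(f"empirical clean fraction: {clean.mean():.3f} (target alpha = {alpha})")
```

The exact agreement $y = x^\top \beta$ on the $\alpha$-fraction of clean samples is what makes recovery possible information-theoretically with $O(d/\alpha)$ samples, even though $z$ is otherwise arbitrary.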

Cite

Text

Diakonikolas et al. "Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination." Advances in Neural Information Processing Systems, 2025.

Markdown

[Diakonikolas et al. "Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/diakonikolas2025neurips-informationcomputation/)

BibTeX

@inproceedings{diakonikolas2025neurips-informationcomputation,
  title     = {{Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination}},
  author    = {Diakonikolas, Ilias and Gao, Chao and Kane, Daniel and Lafferty, John and Pensia, Ankit},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/diakonikolas2025neurips-informationcomputation/}
}