On the Well-Spread Property and Its Relation to Linear Regression
Abstract
We consider the robust linear regression model $\bm{y} = X\beta^* + \bm{\eta}$, where an adversary oblivious to the design $X \in \mathbb{R}^{n \times d}$ may choose $\bm{\eta}$ to corrupt all but a (possibly vanishing) fraction of the observations $\bm{y}$ in an arbitrary way. Recent work \cite{d2021consistent, d2021consistentICML} has introduced efficient algorithms for consistent recovery of the parameter vector. These algorithms crucially rely on the design matrix being well-spread (a matrix is well-spread if its column span is far from any sparse vector). In this paper, we show that there exists a family of design matrices lacking well-spreadness such that consistent recovery of the parameter vector in the above robust linear regression model is information-theoretically impossible. We further investigate the average-case time complexity of certifying well-spreadness of random matrices. We show that it is possible to efficiently certify whether a given $n \times d$ Gaussian matrix is well-spread when the number of observations is quadratic in the ambient dimension. We complement this result with rigorous evidence, in the form of a lower bound against low-degree polynomials, for the computational hardness of the same certification problem when the number of observations is $o(d^2)$.
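As a concrete illustration of the corruption model (not of the paper's algorithms), the following minimal NumPy sketch generates a Gaussian design, lets an oblivious adversary corrupt all but a small fraction of the observations, and shows why a naive estimator is inadequate here. All sizes and the corruption scale are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10    # observations and ambient dimension (illustrative sizes)
alpha = 0.05       # fraction of uncorrupted observations

X = rng.standard_normal((n, d))   # Gaussian design matrix
beta_star = rng.standard_normal(d)

# Oblivious adversary: eta is drawn without seeing X, so the corruption
# pattern is independent of the design. Here it hits all but an
# alpha-fraction of the coordinates with large arbitrary noise.
eta = np.zeros(n)
corrupted = rng.random(n) > alpha          # independent of X, hence oblivious
eta[corrupted] = 100.0 * rng.standard_normal(corrupted.sum())

y = X @ beta_star + eta

# Ordinary least squares treats every observation as an inlier and is
# not consistent under this corruption model; this is the setting where
# the robust, well-spreadness-based estimators are required.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS parameter error:", np.linalg.norm(beta_ols - beta_star))
```

Note that because the adversary is oblivious, `corrupted` is sampled independently of `X`; an adaptive adversary could instead tailor `eta` to the realized design, which is a strictly harder setting than the one considered here.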
Cite
Text
Chen and d’Orsi. "On the Well-Spread Property and Its Relation to Linear Regression." Conference on Learning Theory, 2022.

Markdown
[Chen and d’Orsi. "On the Well-Spread Property and Its Relation to Linear Regression." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/chen2022colt-wellspread/)

BibTeX
@inproceedings{chen2022colt-wellspread,
title = {{On the Well-Spread Property and Its Relation to Linear Regression}},
author = {Chen, Hongjie and d’Orsi, Tommaso},
booktitle = {Conference on Learning Theory},
year = {2022},
pages = {3905--3935},
volume = {178},
url = {https://mlanthology.org/colt/2022/chen2022colt-wellspread/}
}