Model Selection for High-Dimensional Regression Under the Generalized Irrepresentability Condition

Abstract

In the high-dimensional regression model, a response variable is linearly related to $p$ covariates, but the sample size $n$ is smaller than $p$. We assume that only a small subset of covariates is 'active' (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso ($\ell_1$-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so-called 'irrepresentability' condition. In this paper we study the 'Gauss-Lasso' selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate the 'generalized irrepresentability condition' (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set.
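The abstract fully specifies the two-stage Gauss-Lasso procedure, so a minimal Python sketch is given below. The use of scikit-learn's `Lasso`, the regularization level `lam`, and the toy data are illustrative assumptions, not choices prescribed by the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def gauss_lasso(X, y, lam):
    """Two-stage Gauss-Lasso selector (sketch).

    Stage 1: Lasso to obtain a candidate active set.
    Stage 2: ordinary least squares restricted to that set.
    """
    n, p = X.shape
    # Stage 1: Lasso. scikit-learn minimizes (1/2n)||y - Xb||^2 + alpha*||b||_1,
    # so `lam` here plays the role of the Lasso regularization parameter.
    lasso = Lasso(alpha=lam, fit_intercept=False)
    lasso.fit(X, y)
    active = np.flatnonzero(lasso.coef_)  # estimated active set
    # Stage 2: unregularized least squares on the selected columns only.
    theta = np.zeros(p)
    if active.size > 0:
        theta[active], *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    return theta, active

# Toy usage: n < p with a sparse true coefficient vector (hypothetical data).
rng = np.random.default_rng(0)
n, p, s = 50, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)
theta_hat, active_hat = gauss_lasso(X, y, lam=0.1)
print("estimated active set:", active_hat)
```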

Cite

Text

Javanmard and Montanari. "Model Selection for High-Dimensional Regression Under the Generalized Irrepresentability Condition." Neural Information Processing Systems, 2013.

Markdown

[Javanmard and Montanari. "Model Selection for High-Dimensional Regression Under the Generalized Irrepresentability Condition." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/javanmard2013neurips-model/)

BibTeX

@inproceedings{javanmard2013neurips-model,
  title     = {{Model Selection for High-Dimensional Regression Under the Generalized Irrepresentability Condition}},
  author    = {Javanmard, Adel and Montanari, Andrea},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {3012--3020},
  url       = {https://mlanthology.org/neurips/2013/javanmard2013neurips-model/}
}