Towards a Zero-One Law for Column Subset Selection

Abstract

There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given $n$-by-$n$ matrix $A$, $\min_{\textrm{rank-}k~B}\|A-B\|_1$, or more generally finding a rank-$k$ matrix $B$ which minimizes the sum of $p$-th powers of absolute values of differences, $\min_{\textrm{rank-}k~B}\|A-B\|_p^p$. Many of these algorithms are linear time columns subset selection algorithms, returning a subset of $\poly(k \log n)$ columns whose cost is no more than a $\poly(k)$ factor larger than the cost of the best rank-$k$ matrix. The above error measures are special cases of the following general entrywise low rank approximation problem: given an arbitrary function $g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$, find a rank-$k$ matrix $B$ which minimizes $\|A-B\|_g = \sum_{i,j}g(A_{i,j}-B_{i,j})$. A natural question is which functions $g$ admit efficient approximation algorithms? Indeed, this is a central question of recent work studying generalized low rank models. In this work we give approximation algorithms for {\it every} function $g$ which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary. Further, our algorithm is efficient if the function $g$ admits an efficient approximate regression algorithm. Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms, e.g., one can show the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor larger number of columns than $\ell_p$-norms; nevertheless we design the first efficient column subset selection algorithms for such error measures.

Cite

Text

Song et al. "Towards a Zero-One Law for Column Subset Selection." Neural Information Processing Systems, 2019.

Markdown

[Song et al. "Towards a Zero-One Law for Column Subset Selection." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/song2019neurips-zeroone/)

BibTeX

@inproceedings{song2019neurips-zeroone,
  title     = {{Towards a Zero-One Law for Column Subset Selection}},
  author    = {Song, Zhao and Woodruff, David and Zhong, Peilin},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {6123-6134},
  url       = {https://mlanthology.org/neurips/2019/song2019neurips-zeroone/}
}