Minimax Localization of Structural Information in Large Noisy Matrices

Abstract

We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables such as proteins and drugs, biological species and gene sequences, malware and signatures, etc is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad-hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: i) We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, size of the matrix and bicluster of interest. ii) We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. iii) We characterize the SNR required by several computationally tractable procedures for biclustering including element-wise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition.

Cite

Text

Kolar et al. "Minimax Localization of Structural Information in Large Noisy Matrices." Neural Information Processing Systems, 2011.

Markdown

[Kolar et al. "Minimax Localization of Structural Information in Large Noisy Matrices." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/kolar2011neurips-minimax/)

BibTeX

@inproceedings{kolar2011neurips-minimax,
  title     = {{Minimax Localization of Structural Information in Large Noisy Matrices}},
  author    = {Kolar, Mladen and Balakrishnan, Sivaraman and Rinaldo, Alessandro and Singh, Aarti},
  booktitle = {Neural Information Processing Systems},
  year      = {2011},
  pages     = {909-917},
  url       = {https://mlanthology.org/neurips/2011/kolar2011neurips-minimax/}
}