The Information-Theoretic Requirements of Subspace Clustering with Missing Data
Abstract
Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that Nk = O(r d) columns per subspace are necessary for SCMD, and Nk =O(min d^(log d), d^(r+1) ) are sufficient. We close this gap, showing that Nk =O(r d) is also sufficient. To do this we derive deterministic sampling conditions for SCMD, which give precise information theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically.
Cite
Text
Pimentel-Alarcon and Nowak. "The Information-Theoretic Requirements of Subspace Clustering with Missing Data." International Conference on Machine Learning, 2016.Markdown
[Pimentel-Alarcon and Nowak. "The Information-Theoretic Requirements of Subspace Clustering with Missing Data." International Conference on Machine Learning, 2016.](https://mlanthology.org/icml/2016/pimentelalarcon2016icml-informationtheoretic/)BibTeX
@inproceedings{pimentelalarcon2016icml-informationtheoretic,
title = {{The Information-Theoretic Requirements of Subspace Clustering with Missing Data}},
author = {Pimentel-Alarcon, Daniel and Nowak, Robert},
booktitle = {International Conference on Machine Learning},
year = {2016},
pages = {802-810},
volume = {48},
url = {https://mlanthology.org/icml/2016/pimentelalarcon2016icml-informationtheoretic/}
}