A Scalable Framework for Discovering Coherent Co-Clusters in Noisy Data

Abstract

Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters in such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one that is restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-clustering (ROCC), a scalable and very versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it extremely well suited to a number of real-life applications.

Cite

Text

Deodhar et al. "A Scalable Framework for Discovering Coherent Co-Clusters in Noisy Data." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553405

Markdown

[Deodhar et al. "A Scalable Framework for Discovering Coherent Co-Clusters in Noisy Data." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/deodhar2009icml-scalable/) doi:10.1145/1553374.1553405

BibTeX

@inproceedings{deodhar2009icml-scalable,
  title     = {{A Scalable Framework for Discovering Coherent Co-Clusters in Noisy Data}},
  author    = {Deodhar, Meghana and Gupta, Gunjan and Ghosh, Joydeep and Cho, Hyuk and Dhillon, Inderjit S.},
  booktitle = {International Conference on Machine Learning},
  year      = {2009},
  pages     = {241--248},
  doi       = {10.1145/1553374.1553405},
  url       = {https://mlanthology.org/icml/2009/deodhar2009icml-scalable/}
}