Discrete Model-Based Clustering with Overlapping Subsets of Attributes

Abstract

Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated with a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions, each solution being represented by a cluster variable. However, restricting each attribute to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of model that allows cluster variables to have overlapping subsets of attributes. To learn these models, we propose combining a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state of the art.
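To make "overlapping subsets of attributes" concrete, here is a toy sketch. It assumes a discrete Bayesian-network parameterisation in which each attribute is conditioned on every cluster variable whose subset contains it. The names Z1, Z2 and X1 to X3, the binary cardinalities, and all probabilities are hypothetical. The sketch shows only inference in a fixed model; it does not reproduce the paper's search-based learning or attribute clustering procedure. Z1 groups {X1, X2} and Z2 groups {X2, X3}, so X2 is shared between the two clustering solutions.

```python
import itertools

# Hypothetical model: two binary cluster variables and three binary attributes.
# Z1 -> {X1, X2}, Z2 -> {X2, X3}; X2 belongs to both subsets (the overlap).
parents = {"X1": ("Z1",), "X2": ("Z1", "Z2"), "X3": ("Z2",)}

# Marginal distributions of the cluster variables (arbitrary values).
prior = {"Z1": [0.6, 0.4], "Z2": [0.3, 0.7]}

# P(attribute = 1 | states of its cluster-variable parents); arbitrary values.
p_one = {
    "X1": {(0,): 0.9, (1,): 0.2},
    "X2": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95},
    "X3": {(0,): 0.3, (1,): 0.8},
}

STATES = list(itertools.product((0, 1), repeat=2))  # joint (z1, z2) states


def joint(x, z1, z2):
    """P(x, Z1=z1, Z2=z2) = P(z1) P(z2) * prod_i P(x_i | parents of X_i)."""
    z = {"Z1": z1, "Z2": z2}
    p = prior["Z1"][z1] * prior["Z2"][z2]
    for attr, pa in parents.items():
        q = p_one[attr][tuple(z[v] for v in pa)]
        p *= q if x[attr] == 1 else 1.0 - q
    return p


def marginal(x):
    """P(x): sum out both cluster variables."""
    return sum(joint(x, z1, z2) for z1, z2 in STATES)


def posterior(x):
    """P(Z1, Z2 | x) for every joint cluster assignment."""
    px = marginal(x)
    return {(z1, z2): joint(x, z1, z2) / px for z1, z2 in STATES}


def assign(x):
    """Most probable state of each cluster variable, from its own posterior marginal."""
    post = posterior(x)
    z1 = max((0, 1), key=lambda s: sum(p for (a, _), p in post.items() if a == s))
    z2 = max((0, 1), key=lambda s: sum(p for (_, b), p in post.items() if b == s))
    return {"Z1": z1, "Z2": z2}


# Sanity check: P(x) over all 8 attribute configurations sums to one.
configs = [dict(zip(("X1", "X2", "X3"), bits))
           for bits in itertools.product((0, 1), repeat=3)]
print(round(sum(marginal(x) for x in configs), 10))
print(assign({"X1": 1, "X2": 1, "X3": 0}))
```

Each instance thus receives one cluster label per cluster variable, that is, one label per clustering solution. Because X2 has two parents, its evidence informs both partitions, which a model with disjoint attribute subsets could not express.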

Cite

Text

Rodriguez-Sanchez et al. "Discrete Model-Based Clustering with Overlapping Subsets of Attributes." Proceedings of the Ninth International Conference on Probabilistic Graphical Models, 2018.

Markdown

[Rodriguez-Sanchez et al. "Discrete Model-Based Clustering with Overlapping Subsets of Attributes." Proceedings of the Ninth International Conference on Probabilistic Graphical Models, 2018.](https://mlanthology.org/pgm/2018/rodriguezsanchez2018pgm-discrete/)

BibTeX

@inproceedings{rodriguezsanchez2018pgm-discrete,
  title     = {{Discrete Model-Based Clustering with Overlapping Subsets of Attributes}},
  author    = {Rodriguez-Sanchez, Fernando and Larrañaga, Pedro and Bielza, Concha},
  booktitle = {Proceedings of the Ninth International Conference on Probabilistic Graphical Models},
  year      = {2018},
  pages     = {392--403},
  volume    = {72},
  url       = {https://mlanthology.org/pgm/2018/rodriguezsanchez2018pgm-discrete/}
}