A Cluster-Level Semi-Supervision Model for Interactive Clustering

Abstract

Semi-supervised clustering models that incorporate user-provided constraints to yield meaningful clusters have recently become a popular area of research. In this paper, we propose a cluster-level semi-supervision model for interactive clustering. Prototype-based clustering algorithms typically alternate between updating cluster descriptions and assigning data items to clusters. In our model, the user provides semi-supervision directly for these two steps. Assignment feedback re-assigns data items among existing clusters, while cluster description feedback helps to position existing cluster centers more meaningfully. We argue that, compared with the pair-wise instance-level supervision model, providing such supervision is more natural for exploratory data mining, where the user discovers and interprets clusters as the algorithm progresses, particularly for high-dimensional data such as document collections. We show how such feedback can be interpreted as constraints and incorporated within the k-means clustering framework. Using experimental results on multiple real-world datasets, we show that this framework improves clustering performance significantly beyond traditional k-means. Interestingly, when given the same number of feedback items from the user, the proposed framework significantly outperforms the pair-wise supervision model.
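The abstract does not spell out how the two kinds of feedback are encoded, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that assignment feedback can be modeled as hard "pinned" point-to-cluster assignments and that cluster description feedback can be modeled as a user-supplied anchor vector that acts like a weighted pseudo-point in the center update. The names `interactive_kmeans`, `pinned`, `anchors`, and `anchor_weight` are hypothetical and do not come from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interactive_kmeans(points, centers, pinned=None, anchors=None,
                       anchor_weight=1.0, iters=50):
    """k-means with two illustrative (assumed) kinds of cluster-level feedback.

    pinned:  {point_index: cluster_index}, assumed encoding of assignment
             feedback; pinned points are never re-assigned by the algorithm.
    anchors: {cluster_index: vector}, assumed encoding of cluster description
             feedback; the anchor is averaged into the center with weight
             `anchor_weight`, as if it were that many extra data points.
    """
    pinned = pinned or {}
    anchors = anchors or {}
    centers = [list(c) for c in centers]
    dim = len(centers[0])
    assign = None
    for _ in range(iters):
        # Assignment step: honor pinned points, otherwise pick nearest center.
        new_assign = []
        for i, p in enumerate(points):
            if i in pinned:
                new_assign.append(pinned[i])
            else:
                new_assign.append(min(range(len(centers)),
                                      key=lambda c: sq_dist(p, centers[c])))
        if new_assign == assign:
            break  # assignments stable, centers already match them
        assign = new_assign
        # Update step: recompute each center, pulling it toward its anchor.
        for c in range(len(centers)):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            sums = [sum(col) for col in zip(*members)] if members else [0.0] * dim
            weight = float(len(members))
            if c in anchors:
                sums = [s + anchor_weight * a for s, a in zip(sums, anchors[c])]
                weight += anchor_weight
            if weight > 0:
                centers[c] = [s / weight for s in sums]
    return assign, centers


# Toy data: two tight groups plus one in-between point (index 6).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [6, 6]]
init = [[0, 0], [10, 10]]

plain_assign, _ = interactive_kmeans(points, init)
pinned_assign, _ = interactive_kmeans(points, init, pinned={6: 0})
_, anchored_centers = interactive_kmeans(points, init,
                                         anchors={1: [20, 20]},
                                         anchor_weight=3.0)
```

In this sketch, pinning moves the in-between point to the cluster the user chose, and the anchor shifts the second center toward the user-supplied description. A hard pin and a pseudo-point weighting are just one simple way to turn feedback into constraints; softer penalties would be an equally plausible reading of the abstract.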

Cite

Text

Dubey et al. "A Cluster-Level Semi-Supervision Model for Interactive Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010. doi:10.1007/978-3-642-15880-3_32

Markdown

[Dubey et al. "A Cluster-Level Semi-Supervision Model for Interactive Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010.](https://mlanthology.org/ecmlpkdd/2010/dubey2010ecmlpkdd-clusterlevel/) doi:10.1007/978-3-642-15880-3_32

BibTeX

@inproceedings{dubey2010ecmlpkdd-clusterlevel,
  title     = {{A Cluster-Level Semi-Supervision Model for Interactive Clustering}},
  author    = {Dubey, Avinava and Bhattacharya, Indrajit and Godbole, Shantanu},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2010},
  pages     = {409-424},
  doi       = {10.1007/978-3-642-15880-3_32},
  url       = {https://mlanthology.org/ecmlpkdd/2010/dubey2010ecmlpkdd-clusterlevel/}
}