Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally

Abstract

Our business users have often been frustrated with clustering results that do not suit their purpose: when trying to discover clusters of product complaints, the algorithm may return clusters of product models instead. The fundamental issue is that complex text data can be clustered in many different ways, so it is optimistic to expect relevant clusters from an unsupervised process, even with parameter tinkering. We studied this problem in an interactive context and developed an effective solution that recasts the problem formulation in a way radically different from traditional or semi-supervised clustering. Given training labels for some known classes, our method incrementally proposes complementary clusters. In tests on various business datasets, we consistently get relevant results at interactive time scales. This paper describes the method and demonstrates its superior ability using publicly available datasets. For automated evaluation, we devised a unique cluster evaluation framework to match the business user’s utility.

Cite

Text

Forman et al. "Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2015. doi:10.1007/978-3-319-23461-8_2

Markdown

[Forman et al. "Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2015.](https://mlanthology.org/ecmlpkdd/2015/forman2015ecmlpkdd-clustering/) doi:10.1007/978-3-319-23461-8_2

BibTeX

@inproceedings{forman2015ecmlpkdd-clustering,
  title     = {{Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally}},
  author    = {Forman, George and Nachlieli, Hila and Keshet, Renato},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2015},
  pages     = {20--36},
  doi       = {10.1007/978-3-319-23461-8_2},
  url       = {https://mlanthology.org/ecmlpkdd/2015/forman2015ecmlpkdd-clustering/}
}