Tackling Noise in Active Semi-Supervised Clustering
Abstract
Constraint-based clustering leverages user-provided constraints to produce a clustering that matches the user’s expectation. In active constraint-based clustering, the algorithm selects the most informative constraints to query in order to produce good clusterings with as few constraints as possible. A major challenge in constraint-based clustering is handling noise: the majority of existing approaches assume that the provided constraints are correct, which may not be the case. In this paper, we propose a method to identify and correct noisy constraints in active constraint-based clustering. Our approach reasons probabilistically about the correctness of the user’s answers and queries additional constraints to corroborate or correct the suspicious answers. We demonstrate the method’s effectiveness by incorporating it into COBRAS, a state-of-the-art method for active constraint-based clustering. Compared to COBRAS and other active constraint-based clustering algorithms, the resulting system produces better clusterings in the presence of noise.
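To make the idea of reasoning probabilistically about an answer's correctness concrete, the following is a minimal sketch and not the procedure from the paper. It assumes a hypothetical oracle that flips each answer independently with a known probability `noise_rate`, and it simplifies the setting to several answers that bear on the same pair. The function name, the prior, and the noise model are all illustrative assumptions.

```python
def posterior_must_link(prior, answers, noise_rate):
    """Posterior probability that a pair truly belongs to the same cluster.

    Illustrative assumptions (not taken from the paper):
      prior      -- prior probability that the pair is a must-link
      answers    -- list of booleans, True meaning the user said "must-link"
      noise_rate -- probability that any single answer is flipped,
                    with 0 < noise_rate < 1 and answers independent
    """
    # Unnormalised weights of the two hypotheses.
    weight_must_link = prior
    weight_cannot_link = 1.0 - prior
    for said_must_link in answers:
        if said_must_link:
            # Answer agrees with "must-link", contradicts "cannot-link".
            weight_must_link *= 1.0 - noise_rate
            weight_cannot_link *= noise_rate
        else:
            weight_must_link *= noise_rate
            weight_cannot_link *= 1.0 - noise_rate
    # Bayes' rule: normalise over the two hypotheses.
    return weight_must_link / (weight_must_link + weight_cannot_link)


# One "must-link" answer leaves some doubt; a second, agreeing answer
# corroborates it, while a conflicting answer returns us to the prior.
single = posterior_must_link(0.5, [True], 0.1)
corroborated = posterior_must_link(0.5, [True, True], 0.1)
conflicting = posterior_must_link(0.5, [True, False], 0.1)
```

Under this simplified model, an answer whose posterior stays far from 0 or 1 would be a natural candidate for a follow-up query, which is the intuition behind asking for corroboration.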
Cite
Text
Soenen et al. "Tackling Noise in Active Semi-Supervised Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020. doi:10.1007/978-3-030-67661-2_8
Markdown
[Soenen et al. "Tackling Noise in Active Semi-Supervised Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020.](https://mlanthology.org/ecmlpkdd/2020/soenen2020ecmlpkdd-tackling/) doi:10.1007/978-3-030-67661-2_8
BibTeX
@inproceedings{soenen2020ecmlpkdd-tackling,
title = {{Tackling Noise in Active Semi-Supervised Clustering}},
author = {Soenen, Jonas and Dumancic, Sebastijan and van Craenendonck, Toon and Blockeel, Hendrik},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2020},
pages = {121-136},
doi = {10.1007/978-3-030-67661-2_8},
url = {https://mlanthology.org/ecmlpkdd/2020/soenen2020ecmlpkdd-tackling/}
}