Central Clustering of Categorical Data with Automated Feature Weighting

Abstract

The ability to cluster high-dimensional categorical data is essential for many machine learning applications such as bioinformatics. Currently, central clustering of categorical data is a difficult problem due to the lack of a geometrically interpretable definition of a cluster center. In this paper, we propose a novel kernel-density-based definition using a Bayes-type probability estimator. Then, a new algorithm called k-centers is proposed for central clustering of categorical data, incorporating a new feature weighting scheme by which each attribute is automatically assigned a weight measuring its individual contribution to the clusters. Experimental results on real-world data show outstanding performance of the proposed algorithm, especially in recognizing the biological patterns in DNA sequences.
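
To make the idea of a probabilistic cluster center for one categorical attribute concrete, here is a minimal sketch. The abstract does not give the estimator's formula, so the form used here, a relative frequency shrunk toward the uniform distribution, is an assumption chosen as one common kernel-smoothed estimate for categorical values. The function name `smoothed_center` and the `bandwidth` parameter are hypothetical, and the sketch does not include the feature weighting scheme.

```python
from collections import Counter


def smoothed_center(column, categories, bandwidth):
    """Estimate a smoothed probability for each category of one attribute.

    Assumed form (not taken from the paper): a mix of the uniform
    distribution and the observed relative frequencies, controlled by
    a bandwidth in [0, 1].
    """
    if not 0.0 <= bandwidth <= 1.0:
        raise ValueError("bandwidth must lie in [0, 1]")
    if not column:
        raise ValueError("column must contain at least one value")

    counts = Counter(column)
    n = len(column)
    uniform = 1.0 / len(categories)

    # bandwidth = 0 gives the raw frequencies; bandwidth = 1 gives uniform.
    return {
        c: bandwidth * uniform + (1.0 - bandwidth) * counts[c] / n
        for c in categories
    }


# One DNA position observed across four sequences assigned to a cluster.
column = ["A", "A", "C", "G"]
center = smoothed_center(column, ["A", "C", "G", "T"], bandwidth=0.2)
```

In this sketch the center of an attribute is a probability vector over its categories rather than a single mode, so an unobserved category such as `T` still receives a small nonzero probability.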

Cite

Text

Chen and Wang. "Central Clustering of Categorical Data with Automated Feature Weighting." International Joint Conference on Artificial Intelligence, 2013.

Markdown

[Chen and Wang. "Central Clustering of Categorical Data with Automated Feature Weighting." International Joint Conference on Artificial Intelligence, 2013.](https://mlanthology.org/ijcai/2013/chen2013ijcai-central/)

BibTeX

@inproceedings{chen2013ijcai-central,
  title     = {{Central Clustering of Categorical Data with Automated Feature Weighting}},
  author    = {Chen, Lifei and Wang, Shengrui},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2013},
  pages     = {1260-1266},
  url       = {https://mlanthology.org/ijcai/2013/chen2013ijcai-central/}
}