Clustering High Dimensional Categorical Data via Topographical Features

Abstract

Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract the modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also being embarrassingly parallel.
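To make the terms "mode" and "attractive basin" concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a simple Hamming-distance kernel density and a greedy single-coordinate hill climb; the function names (`hamming`, `density`, `climb`, `basins`) and the `bandwidth` parameter are hypothetical and chosen only for this example.

```python
import math
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates in which two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def density(x, data, bandwidth=1.0):
    """Assumed kernel density estimate: sum of exp(-Hamming / bandwidth)."""
    return sum(math.exp(-hamming(x, p) / bandwidth) for p in data)


def climb(x, data, levels, bandwidth=1.0):
    """Greedy ascent: repeatedly change the single coordinate that most
    increases the density, and stop at a local maximum (a mode)."""
    x = tuple(x)
    best = density(x, data, bandwidth)
    while True:
        candidate, candidate_value = None, best
        for j, values in enumerate(levels):
            for v in values:
                if v == x[j]:
                    continue
                y = x[:j] + (v,) + x[j + 1:]
                value = density(y, data, bandwidth)
                # Require a strict improvement so the loop terminates.
                if value > candidate_value + 1e-12:
                    candidate, candidate_value = y, value
        if candidate is None:
            return x
        x, best = candidate, candidate_value


def basins(data, bandwidth=1.0):
    """Group point indices by the mode their ascent path ends at.
    Each group is the part of that mode's basin covered by the data."""
    levels = [sorted(set(column)) for column in zip(*data)]
    groups = defaultdict(list)
    for i, point in enumerate(data):
        groups[climb(point, data, levels, bandwidth)].append(i)
    return dict(groups)
```

Calling `basins(data)` on a list of equal-length tuples returns a dictionary that maps each mode to the indices of the points that climb to it, which can be read as a clustering. In this sketch each point's climb is independent of the others, so the loop in `basins` could be distributed across workers without coordination.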

Cite

Text

Chen and Quadrianto. "Clustering High Dimensional Categorical Data via Topographical Features." International Conference on Machine Learning, 2016.

Markdown

[Chen and Quadrianto. "Clustering High Dimensional Categorical Data via Topographical Features." International Conference on Machine Learning, 2016.](https://mlanthology.org/icml/2016/chen2016icml-clustering/)

BibTeX

@inproceedings{chen2016icml-clustering,
  title     = {{Clustering High Dimensional Categorical Data via Topographical Features}},
  author    = {Chen, Chao and Quadrianto, Novi},
  booktitle = {International Conference on Machine Learning},
  year      = {2016},
  pages     = {2732--2740},
  volume    = {48},
  url       = {https://mlanthology.org/icml/2016/chen2016icml-clustering/}
}