Massively Distributed Clustering via Dirichlet Process Mixture

Abstract

The Dirichlet Process Mixture (DPM) is a model for multivariate clustering that discovers the number of clusters automatically and offers other favorable characteristics, but its prohibitive response times make centralized DPM approaches inefficient. We propose a demonstration of two parallel clustering solutions: i) DC-DPM, which gracefully scales to millions of data points while remaining DPM-compliant, the main challenge in distributing this process; and ii) HD4C, which addresses the curse of dimensionality by performing distributed DPM clustering of high-dimensional data such as time series or hyperspectral data.
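
The abstract does not describe the algorithms themselves. As a rough illustration of why a DPM can discover the number of clusters automatically, the sketch below simulates the Chinese restaurant process, a standard sequential view of the Dirichlet process prior. It is a centralized, prior-only toy and is not DC-DPM or HD4C; the function name `crp_assignments` and its parameters `n_points`, `alpha`, and `seed` are hypothetical names chosen for this example.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process (illustrative only).

    alpha is the concentration parameter (must be > 0); larger values
    favor opening more clusters.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently assigned to cluster k
    labels = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_points=1000, alpha=1.0)
    print("clusters opened:", len(counts))
```

No cluster count is fixed in advance: the number of clusters is an outcome of the process and, under this prior, grows roughly logarithmically with the number of points. A full DPM additionally weights each choice by the likelihood of the data point under the cluster, which this sketch omits.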

Cite

Text

Meguelati et al. "Massively Distributed Clustering via Dirichlet Process Mixture." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020. doi:10.1007/978-3-030-67670-4_34

Markdown

[Meguelati et al. "Massively Distributed Clustering via Dirichlet Process Mixture." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020.](https://mlanthology.org/ecmlpkdd/2020/meguelati2020ecmlpkdd-massively/) doi:10.1007/978-3-030-67670-4_34

BibTeX

@inproceedings{meguelati2020ecmlpkdd-massively,
  title     = {{Massively Distributed Clustering via Dirichlet Process Mixture}},
  author    = {Meguelati, Khadidja and Fontez, Benedicte and Hilgert, Nadine and Masseglia, Florent and Sanchez, Isabelle},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2020},
  pages     = {536-540},
  doi       = {10.1007/978-3-030-67670-4_34},
  url       = {https://mlanthology.org/ecmlpkdd/2020/meguelati2020ecmlpkdd-massively/}
}