Powered Dirichlet Process - Controlling the "Rich-Get-Richer" Assumption in Bayesian Clustering

Abstract

The Dirichlet process is one of the most widely used priors in Bayesian clustering. It allows for nonparametric estimation of the number of clusters when partitioning datasets. The “rich-get-richer” property is a key feature of this process: the a priori probability that a cluster is selected depends linearly on its population. In this paper, we show that this hypothesis is not necessarily optimal. To address this, we derive the Powered Dirichlet Process as a generalization of the Dirichlet-Multinomial distribution. We then derive some of its fundamental properties (expected number of clusters, convergence). Unlike state-of-the-art efforts in this direction, this new formulation allows for direct control of the importance of the “rich-get-richer” prior. We evaluate our proposal on several simulated and real-world datasets, and confirm that our formulation yields significantly better results in both cases.
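
The abstract describes the idea without giving a formula. As a rough illustration only, the sketch below assumes a Chinese-restaurant-style sequential prior in which an existing cluster is chosen with weight equal to its population raised to a power `r`, and a new cluster is opened with weight `alpha`. Under that assumption, `r = 1` recovers the linear "rich-get-richer" dependence, `r = 0` ignores cluster sizes, and `r > 1` strengthens the effect. The function names, the parameters `alpha` and `r`, and the exact weighting are assumptions made for this example and are not taken from the text above.

```python
import random


def powered_crp_probs(counts, alpha=1.0, r=1.0):
    """Prior probabilities for assigning the next item.

    Returns one probability per existing cluster, followed by a final
    entry for opening a new cluster. Assumed weighting (hypothetical):
    existing cluster k has weight counts[k] ** r, new cluster has weight alpha.
    """
    weights = [n ** r for n in counts] + [alpha]
    total = sum(weights)
    return [w / total for w in weights]


def sample_partition(n_items, alpha=1.0, r=1.0, seed=0):
    """Draw cluster sizes by assigning items one at a time under the prior."""
    rng = random.Random(seed)
    counts = []  # counts[k] is the current population of cluster k
    for _ in range(n_items):
        probs = powered_crp_probs(counts, alpha, r)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)  # the last index means "open a new cluster"
        else:
            counts[k] += 1
    return counts


if __name__ == "__main__":
    # Same cluster populations, three different strengths of the prior.
    for r in (0.0, 1.0, 2.0):
        print(r, powered_crp_probs([3, 1], alpha=1.0, r=r))
```

With populations `[3, 1]` and `alpha = 1`, the larger cluster receives probability 1/3 at `r = 0`, 3/5 at `r = 1`, and 9/11 at `r = 2`, which shows how a single exponent could tune the strength of the prior in this sketch.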

Cite

Text

Poux-Médard et al. "Powered Dirichlet Process - Controlling the 'Rich-Get-Richer' Assumption in Bayesian Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43412-9_36

Markdown

[Poux-Médard et al. "Powered Dirichlet Process - Controlling the 'Rich-Get-Richer' Assumption in Bayesian Clustering." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/pouxmedard2023ecmlpkdd-powered/) doi:10.1007/978-3-031-43412-9_36

BibTeX

@inproceedings{pouxmedard2023ecmlpkdd-powered,
  title     = {{Powered Dirichlet Process - Controlling the "Rich-Get-Richer" Assumption in Bayesian Clustering}},
  author    = {Poux-Médard, Gaël and Velcin, Julien and Loudcher, Sabine},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {611-626},
  doi       = {10.1007/978-3-031-43412-9_36},
  url       = {https://mlanthology.org/ecmlpkdd/2023/pouxmedard2023ecmlpkdd-powered/}
}