Global-Local Dirichlet Processes for Clustering Grouped Data in the Presence of Group-Specific Idiosyncratic Variables
Abstract
We consider the problem of clustering grouped data for which the observations may include group-specific variables in addition to the variables that are shared across groups. This type of data is quite common; for example, in cancer genomic studies, molecular information is available for all cancers whereas cancer-specific clinical information may only be available for certain cancers. Existing grouped clustering methods only consider the shared variables but ignore valuable information from the group-specific variables. To allow for these group-specific variables to aid in the clustering, we propose a novel Bayesian nonparametric approach, termed global-local (GLocal) Dirichlet process, that models the "global-local" structure of the observations across groups. We characterize the GLocal Dirichlet process using the stick-breaking representation and the representation as a limit of a finite mixture model. We theoretically quantify the approximation errors of the truncated prior, the corresponding finite mixture model, and the associated posterior distribution. We develop a fast variational Bayes algorithm for scalable posterior inference, which we illustrate with extensive simulations and a TCGA pan-gastrointestinal cancer dataset.
Cite
Text
Chakrabarti et al. "Global-Local Dirichlet Processes for Clustering Grouped Data in the Presence of Group-Specific Idiosyncratic Variables." Proceedings of the 42nd International Conference on Machine Learning, 2025.Markdown
[Chakrabarti et al. "Global-Local Dirichlet Processes for Clustering Grouped Data in the Presence of Group-Specific Idiosyncratic Variables." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/chakrabarti2025icml-globallocal/)BibTeX
@inproceedings{chakrabarti2025icml-globallocal,
title = {{Global-Local Dirichlet Processes for Clustering Grouped Data in the Presence of Group-Specific Idiosyncratic Variables}},
author = {Chakrabarti, Arhit and Ni, Yang and Pati, Debdeep and Mallick, Bani},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {7214-7249},
volume = {267},
url = {https://mlanthology.org/icml/2025/chakrabarti2025icml-globallocal/}
}