A Layered Dirichlet Process for Hierarchical Segmentation of Sequential Grouped Data
Abstract
We address the problem of hierarchical segmentation of sequential grouped data, such as a collection of textual documents, and propose a Bayesian nonparametric approach to this problem. Existing Bayesian nonparametric models, such as the sticky HDP-HMM, are suitable only for single-layer segmentation. We propose the Layered Dirichlet Process (LaDP), in which each layer has a countable set of Dirichlet Processes, and draws from these define a distribution over the countable set of Dirichlet Processes at the next layer. Each data item is assigned a distribution (index) from each layer of the hierarchy, which yields a hierarchical segmentation of the sequence. The complexity of inference depends on the exchangeability assumptions made for the measures at different layers. We propose a new notion of exchangeability, called Block Exchangeability, which lies between Markov Exchangeability (used in the HDP-HMM) and Complete Group Exchangeability (used in the HDP) and allows faster inference than Markov Exchangeability. Using experiments on a news transcript dataset and a product review dataset, we show that LaDP generalizes better than existing nonparametric models for sequential data and, by segmenting at multiple levels simultaneously, outperforms existing models in single-layer segmentation. We also show empirically that using Block Exchangeability greatly speeds up inference and allows accuracy to be traded off against execution time.
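The layered structure described in the abstract can be illustrated with a small generative sketch. This is a toy illustration under stated assumptions, not the LaDP model or its inference procedure: each Dirichlet Process is approximated by truncated stick-breaking weights, a top-level distribution chooses among layer-1 indices, and the distribution attached to each layer-1 index chooses among layer-2 indices. The helper names (`stick_breaking`, `sample_paths`, `segments`), the truncation levels, and the `stay` persistence probability are all hypothetical; `stay` is a crude stand-in for sequential dependence and does not implement Markov or Block Exchangeability. Segments at each layer are read off as maximal runs of identical index paths.

```python
import random


def stick_breaking(alpha, k, rng):
    """Truncated GEM(alpha) weights over k indices; the last stick takes the remainder."""
    weights, remaining = [], 1.0
    for _ in range(k - 1):
        b = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * b)
        remaining *= 1.0 - b
    weights.append(remaining)
    return weights


def sample_paths(n_items, top, children, stay, rng):
    """Assign each item a (layer-1, layer-2) index path.

    `stay` is a hypothetical persistence probability (not from the paper):
    with probability `stay` an item reuses the previous item's index at a
    layer instead of drawing a fresh one. The layer-2 index is always drawn
    from the distribution attached to the item's layer-1 index.
    """
    paths = []
    for t in range(n_items):
        prev = paths[-1] if t > 0 else None
        if prev is not None and rng.random() < stay:
            k1 = prev[0]
        else:
            k1 = rng.choices(range(len(top)), weights=top)[0]
        # Reuse the layer-2 index only if the layer-1 index is unchanged.
        if prev is not None and k1 == prev[0] and rng.random() < stay:
            k2 = prev[1]
        else:
            k2 = rng.choices(range(len(children[k1])), weights=children[k1])[0]
        paths.append((k1, k2))
    return paths


def segments(labels):
    """Maximal runs of identical labels as (start, end_exclusive, label)."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((start, t, labels[start]))
            start = t
    return segs


if __name__ == "__main__":
    rng = random.Random(0)
    k1, k2 = 4, 5  # hypothetical truncation levels for layers 1 and 2
    top = stick_breaking(1.0, k1, rng)
    children = [stick_breaking(1.0, k2, rng) for _ in range(k1)]
    paths = sample_paths(30, top, children, stay=0.8, rng=rng)
    coarse = segments([p[0] for p in paths])  # layer-1 segmentation
    fine = segments(paths)  # layer-2 segmentation on the full path
    print("layer 1:", [(s, e) for s, e, _ in coarse])
    print("layer 2:", [(s, e) for s, e, _ in fine])
```

Segmenting layer 2 on the full (layer-1, layer-2) path, rather than on the layer-2 index alone, guarantees that every layer-1 boundary is also a layer-2 boundary, so the finer segments nest inside the coarser ones.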
Cite
Text
Mitra et al. "A Layered Dirichlet Process for Hierarchical Segmentation of Sequential Grouped Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013. doi:10.1007/978-3-642-40991-2_30
Markdown
[Mitra et al. "A Layered Dirichlet Process for Hierarchical Segmentation of Sequential Grouped Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013.](https://mlanthology.org/ecmlpkdd/2013/mitra2013ecmlpkdd-layered/) doi:10.1007/978-3-642-40991-2_30
BibTeX
@inproceedings{mitra2013ecmlpkdd-layered,
title = {{A Layered Dirichlet Process for Hierarchical Segmentation of Sequential Grouped Data}},
author = {Mitra, Adway and Ranganath, B. N. and Bhattacharya, Indrajit},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2013},
pages = {465-482},
doi = {10.1007/978-3-642-40991-2_30},
url = {https://mlanthology.org/ecmlpkdd/2013/mitra2013ecmlpkdd-layered/}
}