Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora
Abstract
We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance-level labels) and modeling the multiple labels of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are evaluated on text classification and on named entity disambiguation for web search queries, respectively.
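The generative story in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the link function or the parameterization, so the logistic link and the names `sample_pattern`, `alpha`, `instance_weights`, `label_weights`, and `label_bias` are all assumptions introduced here for illustration. The sketch also omits how instance content (e.g., words) is generated.

```python
import numpy as np

def sample_pattern(rng, alpha, instance_weights, label_weights, label_bias):
    """Sample one pattern under a simplified, assumed DBA-style process."""
    n_classes = alpha.shape[0]
    # Pattern-level class proportions drawn from a Dirichlet prior.
    theta = rng.dirichlet(alpha)
    # One latent class assignment per instance (the instance-level labels).
    z = rng.choice(n_classes, size=instance_weights.shape[0], p=theta)
    # Weighted empirical average of the one-hot class assignments.
    one_hot = np.eye(n_classes)[z]
    w = instance_weights / instance_weights.sum()
    z_bar = w @ one_hot
    # ASSUMPTION: logistic link. Each pattern-level label is an
    # independent Bernoulli variable conditioned on z_bar.
    probs = 1.0 / (1.0 + np.exp(-(label_weights @ z_bar + label_bias)))
    y = rng.binomial(1, probs)
    return theta, z, z_bar, y

rng = np.random.default_rng(0)
alpha = np.ones(3)                                 # symmetric prior over 3 classes
instance_weights = np.array([1.0, 2.0, 1.0, 4.0])  # e.g., paragraph lengths
label_weights = 4.0 * np.eye(3)                    # label k responds to class k
label_bias = -2.0 * np.ones(3)
theta, z, z_bar, y = sample_pattern(
    rng, alpha, instance_weights, label_weights, label_bias
)
```

Because the label weights here are diagonal, label k is more likely to be switched on when a larger weighted share of instances is assigned to class k, which is one simple way to picture latent topics being tied to human-defined classes.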
Cite
Text
Yang et al. "Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora." Neural Information Processing Systems, 2009.
Markdown
[Yang et al. "Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/yang2009neurips-dirichletbernoulli/)
BibTeX
@inproceedings{yang2009neurips-dirichletbernoulli,
title = {{Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora}},
author = {Yang, Shuang-hong and Zha, Hongyuan and Hu, Bao-gang},
booktitle = {Neural Information Processing Systems},
year = {2009},
pages = {2143-2150},
url = {https://mlanthology.org/neurips/2009/yang2009neurips-dirichletbernoulli/}
}