Content Modeling Using Latent Permutations

Abstract

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
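The Generalized Mallows Model mentioned above factors a distribution over permutations into independent inversion counts, each penalized exponentially by a per-position dispersion parameter. The sketch below is not the paper's implementation; it is a minimal illustration, under the common parameterization in which each count v_j (the number of larger items appearing before item j) has probability proportional to exp(-rho[j] * v_j). The function name and interface are hypothetical.

```python
import math
import random


def sample_gmm_permutation(n, rho, rng=random):
    """Sample a permutation of range(n) from a Generalized Mallows Model
    centered at the identity ordering (illustrative sketch, not the
    paper's code).

    rho is a list of n-1 dispersion parameters. The model factors over
    inversion counts v_0..v_{n-2}, where v_j is the number of items
    greater than j placed before j; v_j ranges over {0, ..., n-1-j} and
    has probability proportional to exp(-rho[j] * v_j). Larger rho[j]
    concentrates mass near the canonical ordering.
    """
    # Draw each inversion count independently from its truncated
    # exponential distribution.
    v = []
    for j in range(n - 1):
        weights = [math.exp(-rho[j] * k) for k in range(n - j)]
        r = rng.random() * sum(weights)
        cum = 0.0
        for k, w in enumerate(weights):
            cum += w
            if r <= cum:
                v.append(k)
                break
    # Reconstruct the permutation: insert items from largest to
    # smallest, placing item j at offset v[j]. All previously inserted
    # items are larger than j, so exactly v[j] larger items precede it.
    perm = []
    for j in range(n - 1, -1, -1):
        perm.insert(v[j] if j < n - 1 else 0, j)
    return perm
```

With large dispersion values the samples collapse onto the identity ordering, mirroring the paper's intuition that related documents share a canonical topic order; smaller values allow more reordering.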

Cite

Text

Chen et al. "Content Modeling Using Latent Permutations." Journal of Artificial Intelligence Research, 2009. doi:10.1613/JAIR.2830

Markdown

[Chen et al. "Content Modeling Using Latent Permutations." Journal of Artificial Intelligence Research, 2009.](https://mlanthology.org/jair/2009/chen2009jair-content/) doi:10.1613/JAIR.2830

BibTeX

@article{chen2009jair-content,
  title     = {{Content Modeling Using Latent Permutations}},
  author    = {Chen, Harr and Branavan, S. R. K. and Barzilay, Regina and Karger, David R.},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2009},
  pages     = {129--163},
  doi       = {10.1613/JAIR.2830},
  volume    = {36},
  url       = {https://mlanthology.org/jair/2009/chen2009jair-content/}
}