Semi-Supervised Prediction-Constrained Topic Models

Abstract

Supervisory signals can help topic models discover low-dimensional data representations that are useful for a specific prediction task. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals because they do not properly handle a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our new prediction-constrained objective for training generative models coherently integrates supervisory signals even when only a small fraction of training examples are labeled. We demonstrate improved prediction quality compared to previous supervised topic models, achieving results competitive with high-dimensional logistic regression on text analysis and electronic health records tasks while simultaneously learning interpretable topics.
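To make the "prediction-constrained" idea concrete, the following is a minimal sketch of the kind of training objective the abstract describes, written in generic notation that is illustrative rather than taken verbatim from the paper: documents x_d are explained generatively, while a constraint (or penalty) keeps the loss of predicting labels y_d from x_d small, and only labeled documents contribute to that term.

\min_{\theta} \; -\sum_{d} \log p(x_d \mid \theta)
\quad \text{subject to} \quad
-\sum_{d \in \mathcal{D}_{\text{labeled}}} \log p(y_d \mid x_d, \theta) \le \epsilon

In practice such a constrained problem is typically optimized in an unconstrained Lagrangian form,

\min_{\theta} \; -\sum_{d} \log p(x_d \mid \theta) \; - \; \lambda \sum_{d \in \mathcal{D}_{\text{labeled}}} \log p(y_d \mid x_d, \theta),

where the multiplier \lambda > 0 controls the trade-off between generative fidelity and prediction quality; the asymmetry noted in the abstract appears in conditioning on x_d when scoring labels, never the reverse. The symbols \theta, \lambda, and \epsilon here are assumptions of this sketch, not notation quoted from the paper.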

Cite

Text

Hughes et al. "Semi-Supervised Prediction-Constrained Topic Models." International Conference on Artificial Intelligence and Statistics, 2018.

Markdown

[Hughes et al. "Semi-Supervised Prediction-Constrained Topic Models." International Conference on Artificial Intelligence and Statistics, 2018.](https://mlanthology.org/aistats/2018/hughes2018aistats-semi/)

BibTeX

@inproceedings{hughes2018aistats-semi,
  title     = {{Semi-Supervised Prediction-Constrained Topic Models}},
  author    = {Hughes, Michael C. and Hope, Gabriel and Weiner, Leah and McCoy, Jr., Thomas H. and Perlis, Roy H. and Sudderth, Erik B. and Doshi-Velez, Finale},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2018},
  pages     = {1067--1076},
  url       = {https://mlanthology.org/aistats/2018/hughes2018aistats-semi/}
}