The Security of Latent Dirichlet Allocation
Abstract
Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), there is an incentive for attackers to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces topics that the attacker wants the LDA user to see? Answering this question is important for characterizing such attacks and for developing defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using a descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.
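To make the setup concrete, a bilevel formulation of this kind of poisoning attack can be sketched as follows; the notation (clean corpus $D$, attacker documents $D'$, target topics $\phi^{*}$, Dirichlet hyperparameters $\alpha,\beta$) is illustrative and not necessarily the paper's own:

\begin{aligned}
\min_{D'} \quad & \bigl\lVert \hat{\phi}(D \cup D') - \phi^{*} \bigr\rVert_F^{2} \\
\text{s.t.} \quad & \hat{\phi}(D \cup D') \in \arg\max_{\phi,\,\theta} \; p(\phi, \theta \mid D \cup D', \alpha, \beta),
\end{aligned}

where the outer level chooses the poisoning documents $D'$ and the inner level is the (MAP) LDA inference the victim runs on the poisoned corpus. Because $\hat{\phi}$ is defined only implicitly by the inner optimization, a descent method can obtain the gradient of the outer objective with respect to $D'$ by applying the implicit function theorem to the inner problem's stationarity conditions, then take gradient steps on $D'$.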
Cite
Text
Mei and Zhu. "The Security of Latent Dirichlet Allocation." International Conference on Artificial Intelligence and Statistics, 2015.
Markdown
[Mei and Zhu. "The Security of Latent Dirichlet Allocation." International Conference on Artificial Intelligence and Statistics, 2015.](https://mlanthology.org/aistats/2015/mei2015aistats-security/)
BibTeX
@inproceedings{mei2015aistats-security,
title = {{The Security of Latent Dirichlet Allocation}},
author = {Mei, Shike and Zhu, Xiaojin},
booktitle = {International Conference on Artificial Intelligence and Statistics},
year = {2015},
url = {https://mlanthology.org/aistats/2015/mei2015aistats-security/}
}