Efficient Distributed Topic Modeling with Provable Guarantees
Abstract
Topic modeling for large-scale distributed web collections requires distributed techniques that account for both computational and communication costs. We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. We achieve tradeoffs between communication and computation without actually transmitting the documents. Our scheme is based on exploiting the geometry of the normalized word-word co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional space. We relate the solid angle subtended by extreme points of the convex hull of these vectors to topic identities and construct distributed schemes to identify topics.
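The geometric idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the solid angle at each row vector can be approximated by the fraction of isotropic random directions along which that row has the largest projection, and the function names (`row_normalize`, `estimate_solid_angles`) and the toy matrix are hypothetical. Rows lying strictly inside the convex hull never win, so only extreme points receive a positive estimate.

```python
import numpy as np


def row_normalize(cooc):
    """Scale each row of a nonnegative matrix so that it sums to 1."""
    cooc = np.asarray(cooc, dtype=float)
    return cooc / cooc.sum(axis=1, keepdims=True)


def estimate_solid_angles(points, num_directions=2000, seed=0):
    """Monte Carlo estimate of the normalized solid angle at each row.

    Assumption of this sketch: the estimate for a row is the fraction of
    random Gaussian directions along which that row maximizes the projection.
    """
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((num_directions, points.shape[1]))
    # For every direction, find the row with the largest projection.
    winners = np.argmax(directions @ points.T, axis=1)
    counts = np.bincount(winners, minlength=points.shape[0])
    return counts / num_directions


# Hypothetical toy data: rows 0-2 normalize to simplex vertices (extreme
# points); rows 3-4 normalize to convex combinations of them (interior).
raw = np.array([
    [4.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 5.0],
    [5.0, 3.0, 2.0],
    [1.0, 1.0, 3.0],
])
points = row_normalize(raw)
angles = estimate_solid_angles(points)
extreme_rows = np.flatnonzero(angles > 0)
print(angles, extreme_rows)
```

Because each estimate is a count of winning directions, partial counts computed separately could in principle be summed across collections without moving documents; whether this matches the paper's communication scheme is not stated in the abstract.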
Cite
Text
Ding et al. "Efficient Distributed Topic Modeling with Provable Guarantees." International Conference on Artificial Intelligence and Statistics, 2014.
Markdown
[Ding et al. "Efficient Distributed Topic Modeling with Provable Guarantees." International Conference on Artificial Intelligence and Statistics, 2014.](https://mlanthology.org/aistats/2014/ding2014aistats-efficient/)
BibTeX
@inproceedings{ding2014aistats-efficient,
title = {{Efficient Distributed Topic Modeling with Provable Guarantees}},
author = {Ding, Weicong and Rohban, Mohammad H. and Ishwar, Prakash and Saligrama, Venkatesh},
booktitle = {International Conference on Artificial Intelligence and Statistics},
year = {2014},
pages = {167-175},
url = {https://mlanthology.org/aistats/2014/ding2014aistats-efficient/}
}