Fully Distributed EM for Very Large Datasets

Abstract

In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large datasets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with the parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology on two tasks: word alignment and topic modeling.
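
To make the E-step/M-step split concrete, here is a minimal Python sketch on a toy 1-D two-component Gaussian mixture: data is sharded across simulated nodes, each node computes expected sufficient statistics for its shard (E-step), and a simple all-reduce combines them for the M-step. The all-reduce stands in for the paper's junction-tree message passing, which routes each parameter's statistics only to the nodes whose data touches it; the model and all names here are illustrative, not taken from the paper.

import math
import random

K = 2  # number of mixture components


def node_e_step(shard, means, variances, weights):
    """E-step on one node's shard: expected sufficient statistics only."""
    stats = [[0.0, 0.0, 0.0] for _ in range(K)]  # [resp_sum, x_sum, x2_sum]
    for x in shard:
        # Posterior responsibility of each component for x.
        dens = [
            weights[k]
            * math.exp(-0.5 * (x - means[k]) ** 2 / variances[k])
            / math.sqrt(2 * math.pi * variances[k])
            for k in range(K)
        ]
        z = sum(dens)
        for k in range(K):
            r = dens[k] / z
            stats[k][0] += r
            stats[k][1] += r * x
            stats[k][2] += r * x * x
    return stats


def m_step(all_stats, n_total):
    """M-step from aggregated statistics (the reduce of all node messages)."""
    means, variances, weights = [], [], []
    for k in range(K):
        r_sum = sum(s[k][0] for s in all_stats)
        x_sum = sum(s[k][1] for s in all_stats)
        x2_sum = sum(s[k][2] for s in all_stats)
        mu = x_sum / r_sum
        means.append(mu)
        variances.append(max(x2_sum / r_sum - mu * mu, 1e-6))
        weights.append(r_sum / n_total)
    return means, variances, weights


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data from two Gaussians, split across 4 simulated nodes.
    data = [random.gauss(-2, 1) for _ in range(500)] + [
        random.gauss(3, 1) for _ in range(500)
    ]
    random.shuffle(data)
    shards = [data[i::4] for i in range(4)]

    means, variances, weights = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]
    for _ in range(25):
        messages = [node_e_step(s, means, variances, weights) for s in shards]
        means, variances, weights = m_step(messages, len(data))
    print("means ~", [round(m, 2) for m in means])

In this sketch every node still sees the full (tiny) parameter set; the point of the paper's junction-tree topology is precisely to avoid that when the parameter set itself is too large for one node, by sending each node only the statistics and parameters its shard requires.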

Cite

Text

Wolfe et al. "Fully Distributed EM for Very Large Datasets." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390305

Markdown

[Wolfe et al. "Fully Distributed EM for Very Large Datasets." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/wolfe2008icml-fully/) doi:10.1145/1390156.1390305

BibTeX

@inproceedings{wolfe2008icml-fully,
  title     = {{Fully Distributed EM for Very Large Datasets}},
  author    = {Wolfe, Jason Andrew and Haghighi, Aria and Klein, Dan},
  booktitle = {International Conference on Machine Learning},
  year      = {2008},
  pages     = {1184--1191},
  doi       = {10.1145/1390156.1390305},
  url       = {https://mlanthology.org/icml/2008/wolfe2008icml-fully/}
}