Fully Distributed EM for Very Large Datasets
Abstract
In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology on two tasks: word alignment and topic modeling.
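The scheme the abstract describes can be illustrated with a small sketch. The single-process Python simulation below is a toy under stated assumptions, not the authors' implementation: it uses an IBM-Model-1-style word-translation model, two data shards standing in for nodes, and a plain merge standing in for junction-tree message passing. Each "node" computes expected counts only for the parameters its shard touches, and the M-step renormalizes the combined counts.

```python
"""Toy single-process simulation of the fully-distributed-EM idea:
each node holds only the parameters its data shard touches; expected
counts for shared parameters are combined between nodes (here a plain
merge stands in for junction-tree message passing). The model and all
names are illustrative assumptions, not the paper's implementation."""

from collections import defaultdict

# Toy parallel corpus, sharded across "nodes"; each pair is (source, target).
SHARDS = [
    [("la maison", "the house"), ("la fleur", "the flower")],
    [("maison bleue", "blue house"), ("fleur", "flower")],
]

def local_e_step(shard, t_prob):
    """E-step on one node: expected translation counts for this shard only."""
    counts = defaultdict(float)
    for src, tgt in shard:
        for f in src.split():
            # Posterior over which target word generated f, given current params.
            weights = {e: t_prob.get((f, e), 1e-6) for e in tgt.split()}
            z = sum(weights.values())
            for e, w in weights.items():
                counts[(f, e)] += w / z
    return counts

def combine_counts(all_counts):
    """Stand-in for message passing: merge per-node expected counts. In the
    paper's setting, only counts for parameters shared between neighboring
    nodes would travel along the junction tree."""
    total = defaultdict(float)
    for counts in all_counts:
        for k, v in counts.items():
            total[k] += v
    return total

def m_step(counts):
    """Renormalize expected counts into translation probabilities t(f|e)."""
    totals = defaultdict(float)
    for (f, e), c in counts.items():
        totals[e] += c
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}

# Initialize every co-occurring (f, e) pair uniformly.
t_prob = {(f, e): 1.0
          for shard in SHARDS
          for src, tgt in shard
          for f in src.split()
          for e in tgt.split()}

for _ in range(20):
    per_node = [local_e_step(shard, t_prob) for shard in SHARDS]
    t_prob = m_step(combine_counts(per_node))

print(sorted(t_prob.items(), key=lambda kv: -kv[1])[:5])
```

A few iterations concentrate t(f|e) mass on consistent pairs such as (maison, house). The key property the paper exploits is visible even in this toy: no node ever needs the full parameter set, only the entries its own shard references.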
Cite
Text
Wolfe et al. "Fully Distributed EM for Very Large Datasets." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390305
Markdown
[Wolfe et al. "Fully Distributed EM for Very Large Datasets." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/wolfe2008icml-fully/) doi:10.1145/1390156.1390305
BibTeX
@inproceedings{wolfe2008icml-fully,
title = {{Fully Distributed EM for Very Large Datasets}},
author = {Wolfe, Jason Andrew and Haghighi, Aria and Klein, Dan},
booktitle = {International Conference on Machine Learning},
year = {2008},
pages = {1184--1191},
doi = {10.1145/1390156.1390305},
url = {https://mlanthology.org/icml/2008/wolfe2008icml-fully/}
}