Streaming Pointwise Mutual Information

Abstract

Recent work has made it possible to perform space-efficient, approximate counting over large vocabularies in a streaming context. Motivated by the existence of data structures of this type, we explore the computation of associativity scores, otherwise known as pointwise mutual information (PMI), in a streaming context. We give theoretical bounds showing the impracticality of perfect online PMI computation, and detail an algorithm with high expected accuracy. Experiments on news articles show that our approach achieves high accuracy on real-world data.
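For reference, the standard definition of PMI for a pair of events is PMI(x, y) = log [ p(x, y) / (p(x) p(y)) ]. The sketch below is a baseline under stated assumptions. It is not the authors' algorithm. It keeps exact counts of observed (x, y) pairs and their marginals as a stream arrives, and it computes PMI from maximum-likelihood estimates using the natural log. The class name `ExactStreamingPMI` and its methods are hypothetical. This baseline's memory grows with the number of distinct pairs seen. That cost is what motivates replacing exact counters with space-efficient approximate ones.

```python
import math
from collections import Counter


class ExactStreamingPMI:
    """Hypothetical baseline: exact pair and marginal counts, updated per stream event.

    Memory grows with the number of distinct (x, y) pairs observed; this is NOT
    the space-efficient approximate method described in the paper.
    """

    def __init__(self) -> None:
        self.pair_counts: Counter = Counter()   # c(x, y)
        self.left_counts: Counter = Counter()   # c(x), marginal over the left element
        self.right_counts: Counter = Counter()  # c(y), marginal over the right element
        self.total = 0                          # N, number of pair events seen so far

    def update(self, x: str, y: str) -> None:
        """Consume one (x, y) co-occurrence event from the stream."""
        self.pair_counts[(x, y)] += 1
        self.left_counts[x] += 1
        self.right_counts[y] += 1
        self.total += 1

    def pmi(self, x: str, y: str) -> float:
        """Natural-log PMI from MLE probabilities; -inf for pairs never observed."""
        joint = self.pair_counts[(x, y)]  # Counter returns 0 for missing keys
        if joint == 0:
            return float("-inf")
        # log[(c(x,y)/N) / ((c(x)/N) * (c(y)/N))] simplifies to log[c(x,y) * N / (c(x) * c(y))]
        return math.log(joint * self.total / (self.left_counts[x] * self.right_counts[y]))


if __name__ == "__main__":
    model = ExactStreamingPMI()
    for x, y in [("new", "york"), ("new", "york"), ("new", "car"), ("old", "car")]:
        model.update(x, y)
    print(model.pmi("new", "york"))  # log(2 * 4 / (3 * 2)) = log(4/3)
    print(model.pmi("old", "car"))   # log(1 * 4 / (1 * 2)) = log(2)
```

In this toy stream, "new york" co-occurs more often than its marginals alone would predict, so its PMI is positive. An unseen pair such as ("old", "york") scores negative infinity under these unsmoothed estimates.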

Cite

Text

Durme and Lall. "Streaming Pointwise Mutual Information." Neural Information Processing Systems, 2009.

Markdown

[Durme and Lall. "Streaming Pointwise Mutual Information." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/durme2009neurips-streaming/)

BibTeX

@inproceedings{durme2009neurips-streaming,
  title     = {{Streaming Pointwise Mutual Information}},
  author    = {Durme, Benjamin V. and Lall, Ashwin},
  booktitle = {Neural Information Processing Systems},
  year      = {2009},
  pages     = {1892-1900},
  url       = {https://mlanthology.org/neurips/2009/durme2009neurips-streaming/}
}