Detecting Events in a Million New York Times Articles

Abstract

We demonstrate a newly developed method for detecting events in text streams on over a million articles from the New York Times corpus. The method is designed to operate in a predominantly on-line fashion, reporting new events within a specified timeframe. It detects events by identifying significant changes in the statistical properties of the text, which are stored and updated efficiently in a suffix tree. This demonstration shows that the method discovers both short- and long-term events (the latter often called topics) and copes automatically with topic drift on a corpus of 1,035,263 articles.
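The idea of flagging "significant changes in the statistical properties of the text" can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain n-gram counter in place of the suffix tree, whitespace tokenisation, and a simple z-score against a reference window as the significance test. The names `count_ngrams` and `detect_bursts` and the parameters `n_max`, `z_threshold`, and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n_max=3):
    """Yield every n-gram (as a tuple) of length 1..n_max."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_ngrams(docs, n_max=3):
    """Return (n-gram counts, total token count) for a list of documents.

    Assumption: a Counter stands in for the suffix tree named in the abstract.
    """
    counts = Counter()
    total = 0
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenisation
        total += len(tokens)
        counts.update(ngrams(tokens, n_max))
    return counts, total


def detect_bursts(reference_docs, current_docs, n_max=3,
                  z_threshold=3.0, min_count=3):
    """Flag n-grams that are far more frequent now than in the reference window.

    Assumption: a z-score under a binomial model is used as the test;
    the paper's actual statistic may differ.
    """
    ref_counts, ref_total = count_ngrams(reference_docs, n_max)
    cur_counts, cur_total = count_ngrams(current_docs, n_max)
    if ref_total == 0 or cur_total == 0:
        return []

    bursts = []
    for gram, observed in cur_counts.items():
        if observed < min_count:
            continue  # too rare in the current window to judge
        # Smoothed reference rate: never 0 (unseen n-grams) and never 1.
        p = (ref_counts[gram] + 1) / (ref_total + 2)
        expected = cur_total * p
        z = (observed - expected) / sqrt(expected * (1 - p))
        if z >= z_threshold:
            bursts.append((gram, observed, z))

    # Strongest deviations first.
    bursts.sort(key=lambda item: item[2], reverse=True)
    return bursts
```

In an on-line setting, a caller might slide the two windows forward over the stream, for example by passing last month's articles as `reference_docs` and today's articles as `current_docs`, and report whatever `detect_bursts` returns as candidate events.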

Cite

Text

Snowsill et al. "Detecting Events in a Million New York Times Articles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010. doi:10.1007/978-3-642-15939-8_46

Markdown

[Snowsill et al. "Detecting Events in a Million New York Times Articles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010.](https://mlanthology.org/ecmlpkdd/2010/snowsill2010ecmlpkdd-detecting/) doi:10.1007/978-3-642-15939-8_46

BibTeX

@inproceedings{snowsill2010ecmlpkdd-detecting,
  title     = {{Detecting Events in a Million New York Times Articles}},
  author    = {Snowsill, Tristan and Flaounas, Ilias N. and De Bie, Tijl and Cristianini, Nello},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2010},
  pages     = {615--618},
  doi       = {10.1007/978-3-642-15939-8_46},
  url       = {https://mlanthology.org/ecmlpkdd/2010/snowsill2010ecmlpkdd-detecting/}
}