Detecting Events in a Million New York Times Articles
Abstract
We present a demonstration of a newly developed text stream event detection method on over a million articles from the New York Times corpus. The event detection is designed to operate in a predominantly on-line fashion, reporting new events within a specified timeframe. The event detection is achieved by detecting significant changes in the statistical properties of the text where those properties are efficiently stored and updated in a suffix tree. This particular demonstration shows how our method is effective at discovering both short- and long-term events (which are often denoted topics), and how it automatically copes with topic drift on a corpus of 1 035 263 articles.
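The general idea of flagging phrases whose frequency changes significantly between a reference period and a recent window can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the suffix tree for a plain dictionary of n-gram counts, and the function names (`detect_events`, `count_ngrams`), the z-score test, the smoothing, and the thresholds are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(docs, n):
    """Count n-gram occurrences across whitespace-tokenised documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts


def detect_events(past_docs, recent_docs, n=2, z_threshold=3.0, min_count=3):
    """Flag n-grams that are significantly more frequent in recent_docs.

    Hypothetical helper: a one-sided z-test on n-gram counts, standing in
    for the suffix-tree statistics described in the abstract.
    """
    past = count_ngrams(past_docs, n)
    recent = count_ngrams(recent_docs, n)
    past_total = sum(past.values())
    recent_total = sum(recent.values())
    if past_total == 0 or recent_total == 0:
        return []

    events = []
    for gram, k in recent.items():
        if k < min_count:
            continue
        # Smoothed reference rate (assumption): keeps 0 < p < 1 so that
        # unseen n-grams still get a non-zero expected count.
        p = (past[gram] + 1) / (past_total + 2)
        expected = recent_total * p
        z = (k - expected) / sqrt(expected * (1 - p))
        if z >= z_threshold:
            events.append((" ".join(gram), k, round(z, 2)))
    # Most surprising phrases first.
    return sorted(events, key=lambda e: -e[2])


# Toy data, invented for this example only.
past_docs = ["the market rose today", "the market fell today"] * 5
recent_docs = [
    "volcano eruption closes airports",
    "volcano eruption disrupts flights",
    "volcano eruption ash cloud",
    "the market rose today",
]
events = detect_events(past_docs, recent_docs)
print(events)
```

In an on-line setting, the reference counts would be updated as each window of articles arrives, which is also one simple way a detector of this kind could adapt to topic drift. A suffix tree, as in the paper, avoids fixing the phrase length `n` in advance.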
Cite
Text
Snowsill et al. "Detecting Events in a Million New York Times Articles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010. doi:10.1007/978-3-642-15939-8_46
Markdown
[Snowsill et al. "Detecting Events in a Million New York Times Articles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010.](https://mlanthology.org/ecmlpkdd/2010/snowsill2010ecmlpkdd-detecting/) doi:10.1007/978-3-642-15939-8_46
BibTeX
@inproceedings{snowsill2010ecmlpkdd-detecting,
title = {{Detecting Events in a Million New York Times Articles}},
author = {Snowsill, Tristan and Flaounas, Ilias N. and De Bie, Tijl and Cristianini, Nello},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2010},
pages = {615--618},
doi = {10.1007/978-3-642-15939-8_46},
url = {https://mlanthology.org/ecmlpkdd/2010/snowsill2010ecmlpkdd-detecting/}
}