Kernel Density Estimation for Text-Based Geolocation

Abstract

Text-based geolocation classifiers often operate with a grid-based view of the world. Predicting a document's location of origin from its text content on a geodesic grid is computationally attractive, since many standard methods for supervised document classification carry over unchanged to geolocation in the form of predicting the most probable grid cell for a document. However, the grid-based approach suffers from sparse data problems if one wants to improve classification accuracy by moving to smaller cell sizes. In this paper we investigate an enhancement of common methods for determining the geographic point of origin of a text document by kernel density estimation. For geolocation of tweets we obtain improvements upon non-kernel methods on datasets of U.S. and global Twitter content.
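
The general idea of kernel smoothing over a grid can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a flat latitude/longitude grid measured in degrees (ignoring geodesic distance), a Gaussian kernel, and a naive-Bayes-style scorer. All names (`cell_of`, `smoothed_counts`, `predict_cell`), parameter values, and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, size):
    """Map a coordinate to the (row, col) index of a square cell of `size` degrees."""
    return (math.floor(lat / size), math.floor(lon / size))


def smoothed_counts(docs, size, bandwidth, radius=1):
    """Spread each token occurrence over nearby cells with a Gaussian kernel.

    docs is a list of (lat, lon, tokens). Returns {token: {cell: weight}}.
    Distances are measured in degrees from the document to each cell center,
    which is a simplifying assumption of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for lat, lon, tokens in docs:
        row, col = cell_of(lat, lon, size)
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                r, c = row + dr, col + dc
                center_lat = (r + 0.5) * size
                center_lon = (c + 0.5) * size
                d2 = (lat - center_lat) ** 2 + (lon - center_lon) ** 2
                weight = math.exp(-d2 / (2.0 * bandwidth ** 2))
                for tok in tokens:
                    counts[tok][(r, c)] += weight
    return counts


def predict_cell(counts, tokens, alpha=0.01):
    """Return the cell with the highest prior-weighted, smoothed log-likelihood."""
    totals = defaultdict(float)
    for by_cell in counts.values():
        for cell, weight in by_cell.items():
            totals[cell] += weight
    grand_total = sum(totals.values())
    vocab_size = len(counts)

    def score(cell):
        # Cell prior from the total kernel mass that landed in the cell.
        s = math.log(totals[cell] / grand_total)
        for tok in tokens:
            if tok in counts:  # tokens never seen in training are skipped
                num = counts[tok].get(cell, 0.0) + alpha
                den = totals[cell] + alpha * vocab_size
                s += math.log(num / den)
        return s

    return max(list(totals), key=score)


# Invented toy data: two documents near New York, two near Los Angeles.
toy_docs = [
    (40.7, -74.2, ["subway", "bagel"]),
    (40.6, -74.1, ["subway"]),
    (34.2, -118.2, ["freeway", "surf"]),
    (34.1, -118.3, ["freeway"]),
]
toy_counts = smoothed_counts(toy_docs, size=1.0, bandwidth=1.0)
ny_cell = predict_cell(toy_counts, ["subway"])
la_cell = predict_cell(toy_counts, ["freeway"])
```

Because each observation contributes mass to neighboring cells as well as its own, a cell with no training documents can still receive a nonzero estimate. That is the sense in which smoothing of this kind can ease the sparsity that comes with smaller cell sizes.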

Cite

Text

Hulden et al. "Kernel Density Estimation for Text-Based Geolocation." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9149

Markdown

[Hulden et al. "Kernel Density Estimation for Text-Based Geolocation." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/hulden2015aaai-kernel/) doi:10.1609/AAAI.V29I1.9149

BibTeX

@inproceedings{hulden2015aaai-kernel,
  title     = {{Kernel Density Estimation for Text-Based Geolocation}},
  author    = {Hulden, Mans and Silfverberg, Miikka and Francom, Jerid},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {145-150},
  doi       = {10.1609/AAAI.V29I1.9149},
  url       = {https://mlanthology.org/aaai/2015/hulden2015aaai-kernel/}
}