Predicting the Demographics of Twitter Users from Website Traffic Data
Abstract
Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. In this paper, we predict the demographics of Twitter users based on whom they follow. Whereas most prior approaches rely on supervised learning, in which individual users are labeled with demographics, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics using information about the followers of each website on Twitter. The resulting average held-out correlation is .77 across six different variables (gender, age, ethnicity, education, income, and child status). We additionally validate the model on a smaller set of Twitter users labeled individually for ethnicity and gender, finding performance that is surprisingly competitive with a fully supervised approach.
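As a rough illustration of the pipeline the abstract describes (website-level demographic targets regressed on follower-derived features, then scored by held-out correlation), the sketch below uses purely synthetic data. The feature layout, the names `fit_ridge`, `predict`, and `pearson`, the weights in `TRUE_W`, and the choice of L2-regularized linear regression trained by gradient descent are all assumptions made for this example; the abstract does not specify the regression model or the feature construction.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def predict(weights, bias, x):
    """Linear prediction for one feature vector."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def fit_ridge(X, y, l2=1e-3, lr=0.1, epochs=2000):
    """L2-regularized linear regression fit by full-batch gradient descent.

    This is a stand-in model chosen for the sketch, not the paper's model.
    """
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        grad_b = sum(errors) / n
        grad_w = [
            sum(e * x[j] for e, x in zip(errors, X)) / n + l2 * weights[j]
            for j in range(d)
        ]
        bias -= lr * grad_b
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return weights, bias


# Synthetic stand-in data (assumption): each row is one website, each column
# the fraction of its Twitter followers who also follow some other account.
# The target is the fraction of the site's visitors in a demographic group.
rng = random.Random(0)
TRUE_W, TRUE_B = [0.5, -0.3, 0.2], 0.4  # hypothetical generating weights
X = [[rng.random() for _ in TRUE_W] for _ in range(100)]
y = [predict(TRUE_W, TRUE_B, x) + rng.gauss(0, 0.01) for x in X]

# Hold out the last 20 websites for evaluation.
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

weights, bias = fit_ridge(X_train, y_train)
held_out_r = pearson(y_test, [predict(weights, bias, x) for x in X_test])
print(f"held-out correlation: {held_out_r:.3f}")
```

The correlation printed here reflects only the low-noise synthetic data and says nothing about the paper's reported .77. The point is the shape of the evaluation: fit on one set of websites, then correlate predictions with audience measurements on websites the model has not seen.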
Cite
Text
Culotta et al. "Predicting the Demographics of Twitter Users from Website Traffic Data." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9204
Markdown
[Culotta et al. "Predicting the Demographics of Twitter Users from Website Traffic Data." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/culotta2015aaai-predicting/) doi:10.1609/AAAI.V29I1.9204
BibTeX
@inproceedings{culotta2015aaai-predicting,
title = {{Predicting the Demographics of Twitter Users from Website Traffic Data}},
author = {Culotta, Aron and Kumar, Nirmal Ravi and Cutler, Jennifer},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2015},
pages = {72-78},
doi = {10.1609/AAAI.V29I1.9204},
url = {https://mlanthology.org/aaai/2015/culotta2015aaai-predicting/}
}