The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism

Abstract

ImageNet is the most cited and best-known dataset for training image classification models. The people categories of its original 2009 version have been found to be highly problematic (e.g., Crawford and Paglen, 2019; Prabhu and Birhane, 2020) and have since been updated to improve their representativeness (Yang et al., 2020). In this paper, we examine past and present versions of the dataset from a variety of quantitative and qualitative angles and identify several technical, epistemological, and institutional issues, including duplicates, erroneous images, dehumanizing content, and a lack of consent. We also discuss the concepts of ‘safety’ and ‘imageability’, which were established as criteria for filtering the people categories in the most recent version of ImageNet 21K. We conclude with a discussion of automated essentialism, the fundamental ethical problem that arises when datasets sort human identity into a fixed number of discrete categories based on visual characteristics alone, and we call on the ML community to reassess how training datasets that include human subjects are created and used.

Cite

Text

Luccioni and Crawford. "The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism." Data-centric Machine Learning Research, 2024.

Markdown

[Luccioni and Crawford. "The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism." Data-centric Machine Learning Research, 2024.](https://mlanthology.org/dmlr/2024/luccioni2024dmlr-nine/)

BibTeX

@article{luccioni2024dmlr-nine,
  title     = {{The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism}},
  author    = {Luccioni, Sasha and Crawford, Kate},
  journal   = {Data-centric Machine Learning Research},
  year      = {2024},
  pages     = {1--18},
  volume    = {1},
  url       = {https://mlanthology.org/dmlr/2024/luccioni2024dmlr-nine/}
}