Century: A Dataset of Sensitive Historical Images

Abstract

How do we measure the way multi-modal generative models, like GPT-4 and Gemini, describe images of historical events and figures whose legacies may be nuanced, multifaceted, or contested? As a first step toward addressing this challenge, we introduce Century – a novel dataset of sensitive historical images. The dataset consists of 1,500 images from recent history. It was created through an automated method that combines knowledge graphs and language models and is grounded in the practices of museums and digital archives. We demonstrate through automated and human evaluation that this method produces a set of images depicting events and figures that are diverse across topics and represent all regions of the world, with implications for the development of evaluations for historical contextualisation and socio-cultural understanding.

Cite


Akbulut et al. "Century: A Dataset of Sensitive Historical Images." NeurIPS 2024 Workshops: SoLaR, 2024.


BibTeX

@inproceedings{akbulut2024neuripsw-century,
  title     = {{Century: A Dataset of Sensitive Historical Images}},
  author    = {Akbulut, Canfer and Robinson, Kevin and Rauh, Maribeth and Albuquerque, Isabela and Wiles, Olivia and Weidinger, Laura and Rieser, Verena and Hasson, Yana and Marchal, Nahema and Gabriel, Iason and Isaac, William and Hendricks, Lisa Anne},
  booktitle = {NeurIPS 2024 Workshops: SoLaR},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/akbulut2024neuripsw-century/}
}