Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Abstract

Pretraining datasets are foundational to the development of multimodal models, yet they often carry inherent biases and toxic content from the web-scale corpora they are sourced from. In this paper, we investigate the prevalence of toxicity in the LLaVA image-text pretraining dataset, examining how harmful content manifests in different modalities. We present a comprehensive analysis of common toxicity categories and propose targeted mitigation strategies, resulting in a refined, toxicity-mitigated dataset that removes 7,531 toxic image-text pairs from the LLaVA pretraining dataset. We also offer guidelines for implementing robust toxicity detection pipelines. Our findings underscore the need to actively identify and filter toxic content - such as hate speech, explicit imagery, and targeted harassment - to build more responsible and equitable multimodal systems. The toxicity-mitigated dataset is open source and available for further research.
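The filtering step the abstract describes - scoring each image-text pair for toxicity and removing pairs that exceed a threshold - can be sketched as below. This is a minimal illustration only: the `TOXIC_TERMS` vocabulary and the keyword-ratio heuristic are placeholder assumptions, not the paper's actual detection pipeline, which would use trained classifiers over both modalities.

```python
# Minimal sketch of text-side toxicity filtering for image-text pairs.
# TOXIC_TERMS and the scoring rule are illustrative placeholders, not
# the authors' method; a real pipeline would use trained classifiers
# for both the caption and the image.

TOXIC_TERMS = {"explicit", "slur"}  # placeholder vocabulary


def caption_toxicity_score(caption: str) -> float:
    """Fraction of caption tokens matching the placeholder toxic vocabulary."""
    tokens = caption.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in TOXIC_TERMS)
    return hits / len(tokens)


def filter_pairs(pairs, threshold=0.0):
    """Split (image_path, caption) pairs into kept and removed sets."""
    kept, removed = [], []
    for image_path, caption in pairs:
        if caption_toxicity_score(caption) > threshold:
            removed.append((image_path, caption))
        else:
            kept.append((image_path, caption))
    return kept, removed


pairs = [
    ("img_001.jpg", "a dog playing in the park"),
    ("img_002.jpg", "explicit content example"),
]
kept, removed = filter_pairs(pairs)
```

In practice the threshold trades recall of harmful pairs against loss of benign data, which is why the paper reports the exact count of pairs removed (7,531).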

Cite

Text

Alam et al. "Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Alam et al. "Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/alam2025cvprw-understanding/)

BibTeX

@inproceedings{alam2025cvprw-understanding,
  title     = {{Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA}},
  author    = {Alam, Nahid and Kanjula, Karthik Reddy and Guthikonda, Surya and Islam, Shayakh},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {5278--5282},
  url       = {https://mlanthology.org/cvprw/2025/alam2025cvprw-understanding/}
}