RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks

Abstract

Text-to-Image (T2I) systems have demonstrated impressive abilities in generating images from text descriptions. However, these systems remain susceptible to adversarial prompts: carefully crafted input manipulations that can result in misaligned or even toxic outputs. This vulnerability highlights the need for systematic evaluation of the attack strategies that exploit these weaknesses, as well as for testing the robustness of T2I systems against them. To this end, this work introduces the RT2I-Bench benchmark. RT2I-Bench serves two primary purposes. First, it provides a structured evaluation of various adversarial attacks, examining their effectiveness, transferability, stealthiness, and potential for generating misaligned or toxic outputs, and it assesses the resilience of state-of-the-art T2I models to such attacks. We observe that state-of-the-art T2I systems are vulnerable to adversarial prompts, with the most effective attacks achieving success rates of over 60% on the majority of the T2I models we tested. Second, RT2I-Bench enables the creation of a set of strong adversarial prompts (1,439 that induce misaligned or targeted outputs and 173 that induce toxic outputs), which are effective across a wide range of systems. In addition, our benchmark is designed to be extensible, enabling the seamless addition of new attacks, T2I models, and evaluation metrics. This framework provides an automated solution for robustness assessment and adversarial prompt generation in T2I systems.

Cite

Text

Glentis et al. "RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks." Transactions on Machine Learning Research, 2026.

Markdown

[Glentis et al. "RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/glentis2026tmlr-rt2ibench/)

BibTeX

@article{glentis2026tmlr-rt2ibench,
  title     = {{RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks}},
  author    = {Glentis, Athanasios and Tsaknakis, Ioannis and Peng, Jiangweizhi and Xian, Xun and Zhang, Yihua and Liu, Gaowen and Fleming, Charles and Hong, Mingyi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/glentis2026tmlr-rt2ibench/}
}