Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models

Abstract

The rapid progress of visual autoregressive (VAR) models has brought new opportunities for text-to-image generation, but also heightened safety concerns. Existing concept erasure techniques, primarily designed for diffusion models, fail to generalize to VAR models because of their next-scale token prediction paradigm. In this paper, we first propose **VARE**, a novel VAR Erasure framework that enables stable concept erasure in VAR models by leveraging auxiliary visual tokens to reduce fine-tuning intensity. Building upon this, we introduce **S-VARE**, a novel and effective concept erasure method designed for VAR models. It incorporates a filtered cross-entropy loss to precisely identify and minimally adjust unsafe visual tokens, along with a preservation loss to maintain semantic fidelity, addressing issues such as language drift and reduced diversity introduced by naïve fine-tuning. Extensive experiments demonstrate that our approach achieves surgical concept erasure while preserving generation quality, thereby closing the safety gap left by earlier methods in autoregressive text-to-image generation.

Cite

Text

Zhong et al. "Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models." International Conference on Learning Representations, 2026.

Markdown

[Zhong et al. "Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhong2026iclr-closing/)

BibTeX

@inproceedings{zhong2026iclr-closing,
  title     = {{Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models}},
  author    = {Zhong, Xinhao and Zhou, Yimin and Zhang, Zhiqi and Li, Junhao and Yi, Sun and Chen, Bin and Xia, Shu-Tao and Wang, Xuan and Xu, Ke},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhong2026iclr-closing/}
}