ABCDE: Agentic-Based Controlled Dynamic Erasure for Intent-Aware Safety Reasoning

Abstract

Concept erasure has emerged as a central mechanism for safety alignment in text-conditioned generative models, yet most existing approaches implicitly adopt an unconditional suppression paradigm in which target concepts are removed whenever they appear, regardless of contextual intent. This formulation conflates benign and harmful concept usage, leading to systematic over-suppression that unnecessarily censors policy-compliant content and degrades model utility. We argue that safety intervention should instead be framed as a decision problem grounded in contextual language understanding, rather than as a purely mechanistic removal operation. Based on this perspective, we introduce Intent-Aware Concept Erasure (ICE), a decision-centric formulation that explicitly separates the question of whether a concept should be suppressed from how suppression is realized, enabling context-sensitive intervention policies that preserve benign usage while maintaining safety guarantees. To operationalize this formulation, we present Agentic-Based Controlled Dynamic Erasure (ABCDE), an agentic framework that infers a stable intervention decision from semantic context and realizes it through minimal prompt-level intervention with closed-loop multimodal output feedback, without modifying model parameters. To enable principled evaluation of intent-aware intervention, we further construct the Context-Aware Erasure Benchmark (CAEB), a paired benchmark comprising 500 prompts over 10 object concepts and 100 prompts over 5 artist styles, in which the same concept appears in both removal-required and preservation-required contexts. Experiments on CAEB show that ABCDE achieves substantially higher precision than unconditional baselines while maintaining strong recall, demonstrating effective avoidance of unnecessary suppression in benign contexts.
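The paired evaluation described above can be illustrated with a minimal sketch. It assumes that precision and recall are computed over binary intervention decisions, with "removal-required" as the positive class; the abstract does not spell out the exact metric definitions, so this is one plausible reading rather than the authors' evaluation code. The names `PromptCase` and `precision_recall`, and the example prompts and concepts, are hypothetical and are not taken from CAEB.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PromptCase:
    """One benchmark item (hypothetical schema, not the CAEB format)."""
    concept: str
    prompt: str
    removal_required: bool  # ground-truth label for this context


def precision_recall(
    cases: Sequence[PromptCase], decisions: Sequence[bool]
) -> Tuple[float, float]:
    """Score erase/keep decisions, treating 'erase' as the positive class.

    Assumption: a decision of True means the system suppressed the concept.
    """
    if len(cases) != len(decisions):
        raise ValueError("cases and decisions must have the same length")
    tp = sum(1 for c, d in zip(cases, decisions) if d and c.removal_required)
    fp = sum(1 for c, d in zip(cases, decisions) if d and not c.removal_required)
    fn = sum(1 for c, d in zip(cases, decisions) if not d and c.removal_required)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Paired cases: the same concept in a removal-required and a
# preservation-required context (invented examples).
cases = [
    PromptCase("concept_a", "prompt using concept_a in a harmful context", True),
    PromptCase("concept_a", "prompt using concept_a in a benign context", False),
    PromptCase("concept_b", "prompt using concept_b in a harmful context", True),
    PromptCase("concept_b", "prompt using concept_b in a benign context", False),
]

# Unconditional suppression erases whenever the concept appears.
unconditional = [True, True, True, True]
# An idealized intent-aware policy erases only in removal-required contexts.
intent_aware = [c.removal_required for c in cases]

print(precision_recall(cases, unconditional))
print(precision_recall(cases, intent_aware))
```

Under this reading, an unconditional policy keeps recall high but loses precision on every benign member of a pair, which is the over-suppression the paired design is meant to expose.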

Cite

Text

Liu and Zhang. "ABCDE: Agentic-Based Controlled Dynamic Erasure for Intent-Aware Safety Reasoning." Transactions on Machine Learning Research, 2026.

Markdown

[Liu and Zhang. "ABCDE: Agentic-Based Controlled Dynamic Erasure for Intent-Aware Safety Reasoning." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/liu2026tmlr-abcde/)

BibTeX

@article{liu2026tmlr-abcde,
  title     = {{ABCDE: Agentic-Based Controlled Dynamic Erasure for Intent-Aware Safety Reasoning}},
  author    = {Liu, Ping and Zhang, Chi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/liu2026tmlr-abcde/}
}