Enabling ControlNet to Follow Localized Descriptions Using Cross-Attention Control
Abstract
ControlNet enables fine-grained control over image layout in prominent generators such as Stable Diffusion. However, it cannot take localized textual descriptions into account, i.e., annotations indicating which image region is described by which phrase in the prompt. In this work, we enable ControlNet to use localized descriptions with a training-free approach that modifies the cross-attention scores during generation. To this end, we adapt and investigate several existing cross-attention control methods and identify shortcomings that cause failures or image degradation under certain conditions. To address these shortcomings, we develop a novel cross-attention manipulation method. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed augmented ControlNet.
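The abstract's core idea of "modifying the cross-attention scores during generation" can be illustrated in general terms. The sketch below is not the paper's exact method; it shows the common baseline of region-masked attention biasing, where a large negative bias is added to the attention logits between a phrase's tokens and image positions outside that phrase's region, before the softmax. All names and the bias value are illustrative assumptions.

```python
import numpy as np

def localized_cross_attention(scores, token_region_mask, bias=-1e4):
    """Illustrative region-masked cross-attention (not the paper's method).

    scores:            (num_pixels, num_tokens) raw attention logits between
                       image positions (queries) and prompt tokens (keys).
    token_region_mask: (num_pixels, num_tokens) boolean; True where pixel i
                       lies inside the region described by token j (tokens
                       without a localized description are True everywhere).
    bias:              large negative value added to disallowed pairs.
    """
    # Suppress attention from a pixel to tokens whose region excludes it.
    biased = np.where(token_region_mask, scores, scores + bias)
    # Softmax over the token axis: each pixel's attention distribution.
    e = np.exp(biased - biased.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: two pixels, two tokens, uniform logits; each token's phrase
# describes exactly one pixel's region.
scores = np.zeros((2, 2))
mask = np.array([[True, False],
                 [False, True]])
attn = localized_cross_attention(scores, mask)
```

In this toy case each pixel attends almost entirely to the token whose region contains it; in a real pipeline the bias would be applied inside the cross-attention layers of the denoising network at each sampling step.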
Cite
Text
Lukovnikov and Fischer. "Enabling ControlNet to Follow Localized Descriptions Using Cross-Attention Control." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-05981-9_19
Markdown
[Lukovnikov and Fischer. "Enabling ControlNet to Follow Localized Descriptions Using Cross-Attention Control." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/lukovnikov2025ecmlpkdd-enabling/) doi:10.1007/978-3-032-05981-9_19
BibTeX
@inproceedings{lukovnikov2025ecmlpkdd-enabling,
title = {{Enabling ControlNet to Follow Localized Descriptions Using Cross-Attention Control}},
author = {Lukovnikov, Denis and Fischer, Asja},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {310--327},
doi = {10.1007/978-3-032-05981-9_19},
url = {https://mlanthology.org/ecmlpkdd/2025/lukovnikov2025ecmlpkdd-enabling/}
}