Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation

Abstract

Current training-free text-driven image translation primarily uses diffusion features (convolution and attention) of pre-trained model as guidance to preserve the style/structure of source image in translated image. However, the coarse guidance at feature level struggles with style (e.g., visual patterns) and structure (e.g., edges) alignment with the source. Based on the observation that the low-/high-frequency components retain style/structure information of image, in this work, we propose training-free Frequency-Guided Diffusion (FGD), which tailors low-/high-frequency guidance for style- and structure-guided translation, respectively. For low-frequency guidance (style-guided), we substitute the low-frequency components of diffusion latents from sampling process with those from inversion of source and normalize the obtained latent with composited spectrum to enforce color alignment. For high-frequency guidance (structure-guided), we propose high-frequency alignment and high-frequency injection that compensate each other. High-frequency alignment preserves edges and contour by adjusting the predicted noise with guidance function that aligns high-frequency image regions between sampling and source image. High-frequency injection facilitates layout preservation by injecting high-frequency components of diffusion convolution features (from inversion) to sampling process. Qualitative and quantitative results verify the superiority of our method on style- and structure-guided translation tasks.

Cite

Text

Gao et al. "Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation." International Conference on Computer Vision, 2025.

Markdown

[Gao et al. "Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/gao2025iccv-frequencyguided/)

BibTeX

@inproceedings{gao2025iccv-frequencyguided,
  title     = {{Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation}},
  author    = {Gao, Zheng and Song, Jifei and Zhang, Zhensong and Deng, Jiankang and Patras, Ioannis},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {19195-19205},
  url       = {https://mlanthology.org/iccv/2025/gao2025iccv-frequencyguided/}
}