TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance

Abstract

Subject-driven image personalization has seen notable advancements, especially with the advent of the ReferenceNet paradigm. ReferenceNet excels in integrating image reference features, making it highly applicable in creative and commercial settings. However, current implementations of ReferenceNet primarily operate as latent-level feature extractors, which limits their potential. This constraint hinders the provision of appropriate features to the denoising backbone across different timesteps, leading to suboptimal image consistency. In this paper, we revisit the extraction of reference features and propose TFCustom, a model framework designed to focus on reference image features at different temporal steps and frequency levels. Specifically, we first propose a synchronized ReferenceNet to extract reference image features while simultaneously optimizing noise injection and denoising for the reference image. We also propose a time-aware frequency feature refinement module that leverages high- and low-frequency filters, combined with time embeddings, to adaptively select the degree of reference feature injection. Additionally, to enhance the similarity between reference objects and the generated image, we introduce a novel reward-based loss that encourages greater alignment between the reference and generated images. Experimental results demonstrate state-of-the-art performance in both multi-object and single-object reference generation, with significant improvements in texture and textual detail generation over existing methods.
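The idea of combining high- and low-frequency filters with a timestep signal can be illustrated with a small NumPy sketch. This is not the paper's refinement module: the abstract does not specify the filters or how time embeddings enter, so the radial FFT mask, the `cutoff` value, the linear `time_gate` schedule, and all function names below are assumptions made purely for illustration. In particular, a learned gate driven by time embeddings is replaced here by a fixed schedule that admits more high-frequency detail as the timestep approaches zero.

```python
import numpy as np

def split_frequencies(feat, cutoff=0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Uses a hard radial mask in the 2D FFT domain; `cutoff` is in cycles
    per sample (hypothetical choice, not taken from the paper).
    """
    _, h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    radius = np.sqrt(fy**2 + fx**2)
    low_mask = (radius <= cutoff).astype(feat.dtype)

    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = feat - low                        # residual holds the fine detail
    return low, high

def time_gate(t, num_steps=1000):
    """Hypothetical gate: weight on high-frequency features, 0 at t=num_steps, 1 at t=0."""
    return 1.0 - t / num_steps

def refine(feat, t, num_steps=1000, cutoff=0.25):
    """Blend low- and high-frequency reference features according to the timestep."""
    low, high = split_frequencies(feat, cutoff)
    return low + time_gate(t, num_steps) * high

# Example: at the noisiest step only coarse structure is passed on;
# at the final step the full reference feature is recovered.
rng = np.random.default_rng(0)
reference_feat = rng.standard_normal((4, 16, 16))
coarse = refine(reference_feat, t=1000)
full = refine(reference_feat, t=0)
```

In a trained model the gate would presumably be a learned function of the time embedding rather than a fixed linear ramp; the sketch only shows how a frequency split lets the amount of injected detail vary with the timestep.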

Cite

Text

Liu et al. "TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00259

Markdown

[Liu et al. "TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/liu2025cvpr-tfcustom/) doi:10.1109/CVPR52734.2025.00259

BibTeX

@inproceedings{liu2025cvpr-tfcustom,
  title     = {{TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance}},
  author    = {Liu, Mushui and She, Dong and Pang, Jingxuan and Huang, Qihan and Ying, Jiacheng and He, Wanggui and Hou, Yuanlei and Fu, Siming},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {2714--2723},
  doi       = {10.1109/CVPR52734.2025.00259},
  url       = {https://mlanthology.org/cvpr/2025/liu2025cvpr-tfcustom/}
}