FineStyle: Fine-Grained Controllable Style Personalization for Text-to-Image Models

Abstract

Few-shot fine-tuning of text-to-image (T2I) generation models enables people to create unique images in their own style using natural language, without requiring extensive prompt engineering. However, fine-tuning with only a handful of image-text pairs, sometimes as few as one, prevents fine-grained control of style attributes at generation time. In this paper, we present FineStyle, a few-shot fine-tuning method that allows enhanced controllability for style-personalized text-to-image generation. To overcome the lack of training data for fine-tuning, we propose a novel concept-oriented data scaling that amplifies the number of image-text pairs, each of which focuses on a different concept (e.g., an object) in the style reference image. We also identify the benefit of parameter-efficient adapter tuning of the key and value kernels of cross-attention layers. Extensive experiments show the effectiveness of FineStyle at following fine-grained text prompts and delivering visual quality faithful to the specified style, as measured by CLIP scores and human raters.
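The adapter-tuning idea in the abstract lends itself to a short illustration. The exact parameterization FineStyle uses is not specified here; the following is a minimal PyTorch sketch, assuming a low-rank (LoRA-style) residual adapter and diffusers-style `to_k`/`to_v` attribute names, showing how only the key and value projections of a cross-attention layer would be made trainable while the pretrained weights stay frozen.

import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Residual low-rank update: y = W x + B A x, with W frozen.

    A minimal sketch; FineStyle's actual adapter design may differ.
    """
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained kernel
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(x))

def wrap_cross_attention_kv(attn: nn.Module, rank: int = 4) -> None:
    """Adapt only the key/value projections (the text-conditioned path).

    The `to_k`/`to_v` attribute names are an assumption borrowed from
    diffusers-style attention modules; query and output projections are
    left untouched (and frozen elsewhere).
    """
    attn.to_k = LowRankAdapter(attn.to_k, rank)
    attn.to_v = LowRankAdapter(attn.to_v, rank)

With this setup, only the adapter parameters (the `down` and `up` layers) would be handed to the optimizer, keeping the fine-tuned footprint small relative to the full model while leaving the pretrained key/value kernels intact.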

Cite

Text

Zhang et al. "FineStyle: Fine-Grained Controllable Style Personalization for Text-to-Image Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-1677

Markdown

[Zhang et al. "FineStyle: Fine-Grained Controllable Style Personalization for Text-to-Image Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhang2024neurips-finestyle/) doi:10.52202/079017-1677

BibTeX

@inproceedings{zhang2024neurips-finestyle,
  title     = {{FineStyle: Fine-Grained Controllable Style Personalization for Text-to-Image Models}},
  author    = {Zhang, Gong and Sohn, Kihyuk and Hahn, Meera and Shi, Humphrey and Essa, Irfan},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1677},
  url       = {https://mlanthology.org/neurips/2024/zhang2024neurips-finestyle/}
}