FiVA: Fine-Grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Wu, Tong; Xu, Yinghao; Po, Ryan; Zhang, Mengchen; Yang, Guandao; Wang, Jiaqi; Liu, Ziwei; Lin, Dahua; Wetzstein, Gordon

doi:10.52202/079017-1006

NeurIPS 2024

Abstract

Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution involves adopting favorable attributes from source images. Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes like lighting and dynamics. Additionally, a simplified "style" adaptation prevents combining multiple attributes from different sources into one generated image. In this work, we formulate a more effective approach that decomposes the aesthetics of a picture into specific visual attributes, letting users apply characteristics like lighting, texture, and dynamics from different images. To achieve this goal, we constructed what is, to the best of our knowledge, the first fine-grained visual attributes dataset (FiVA). The FiVA dataset features a well-organized taxonomy for visual attributes and includes 1M high-quality generated images with visual attribute annotations. Leveraging this dataset, we propose a fine-grained visual attributes adaptation framework (FiVA-Adapter), which decouples and adapts visual attributes from one or more source images into a generated one. This approach enhances user-friendly customization, allowing users to selectively apply desired attributes to create images that meet their unique preferences and specific content requirements.

Cite

Text

Wu et al. "FiVA: Fine-Grained Visual Attribute Dataset for Text-to-Image Diffusion Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-1006

Markdown

[Wu et al. "FiVA: Fine-Grained Visual Attribute Dataset for Text-to-Image Diffusion Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wu2024neurips-fiva/) doi:10.52202/079017-1006

BibTeX

@inproceedings{wu2024neurips-fiva,
  title     = {{FiVA: Fine-Grained Visual Attribute Dataset for Text-to-Image Diffusion Models}},
  author    = {Wu, Tong and Xu, Yinghao and Po, Ryan and Zhang, Mengchen and Yang, Guandao and Wang, Jiaqi and Liu, Ziwei and Lin, Dahua and Wetzstein, Gordon},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1006},
  url       = {https://mlanthology.org/neurips/2024/wu2024neurips-fiva/}
}