PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control
Abstract
Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. When used for face personalization, existing approaches struggle to achieve convincing inversion with identity preservation, and they rely on semantic text-based editing of the generated face. However, facial attribute editing calls for finer-grained control, which is challenging to achieve with text prompts alone. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing through latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the coarse text-based control inherent in T2I models. To enable conditioning of the T2I model on the W+ space, we train a latent mapper to translate latent codes from W+ to the token embedding space of the T2I model. The proposed approach excels in the precise inversion of face images with attribute preservation and facilitates continuous control for fine-grained attribute editing. Furthermore, our approach can be readily extended to generate compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
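The conditioning idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the W+ shape (18 x 512), the 768-dimensional token embedding, the two-layer MLP, the hidden width, and the names `init_mapper`, `map_latent`, `edit_latent`, and `smile_direction` are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

# Assumed shapes (not stated in the abstract):
NUM_LAYERS, W_DIM = 18, 512   # a common W+ layout for a 1024x1024 StyleGAN2
TOKEN_DIM = 768               # a common text-token embedding width
HIDDEN = 256                  # hypothetical hidden width of the mapper


def init_mapper(rng):
    """Random (untrained) weights for a hypothetical two-layer MLP mapper."""
    return {
        "w1": rng.normal(0.0, 0.02, (NUM_LAYERS * W_DIM, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "w2": rng.normal(0.0, 0.02, (HIDDEN, TOKEN_DIM)),
        "b2": np.zeros(TOKEN_DIM),
    }


def map_latent(params, w_plus):
    """Translate one W+ code into a single token embedding."""
    x = w_plus.reshape(-1)                                   # flatten (18, 512)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)     # ReLU layer
    return h @ params["w2"] + params["b2"]                   # (TOKEN_DIM,)


def edit_latent(w_plus, direction, strength):
    """Linear latent manipulation: move along an attribute direction in W+."""
    return w_plus + strength * direction


rng = np.random.default_rng(0)
params = init_mapper(rng)
w_plus = rng.normal(size=(NUM_LAYERS, W_DIM))            # stand-in for an inverted face
smile_direction = rng.normal(size=(NUM_LAYERS, W_DIM))   # placeholder, not a real edit direction
smile_direction /= np.linalg.norm(smile_direction)

# Sweep the edit strength to get a sequence of token embeddings,
# each of which would condition the T2I model in place of a word token.
strengths = np.linspace(0.0, 3.0, 4)
tokens = [map_latent(params, edit_latent(w_plus, smile_direction, s)) for s in strengths]
```

In this sketch the edit happens in W+ before mapping, so the text prompt is untouched and only the identity token changes as the strength varies.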
Cite
Text
Parihar et al. "PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73007-8_27
Markdown
[Parihar et al. "PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/parihar2024eccv-precisecontrol/) doi:10.1007/978-3-031-73007-8_27
BibTeX
@inproceedings{parihar2024eccv-precisecontrol,
title = {{PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control}},
author = {Parihar, Rishubh and Vs, Sachidanand and Mani, Sabariswaran and Karmali, Tejan and Radhakrishnan, Venkatesh Babu},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73007-8_27},
url = {https://mlanthology.org/eccv/2024/parihar2024eccv-precisecontrol/}
}