Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
Abstract
As a dominant force in text-to-image generation tasks, Diffusion Probabilistic Models (DPMs) face a critical challenge in controllability, struggling to adhere strictly to complex, multi-faceted instructions. In this work, we aim to address this alignment challenge for conditional generation tasks. First, we provide an alternative view of state-of-the-art DPMs as a way of inverting advanced Vision-Language Models (VLMs). With this formulation, we naturally propose a training-free approach that bypasses the conventional sampling process associated with DPMs. By directly optimizing images with the supervision of discriminative VLMs, the proposed method can potentially achieve better text-image alignment. As a proof of concept, we demonstrate the pipeline with the pre-trained BLIP-2 model and identify several key designs for improved image generation. To further enhance image fidelity, a Score Distillation Sampling module of Stable Diffusion is incorporated. By carefully balancing the two components during optimization, our method can produce high-quality images with near state-of-the-art performance on T2I-CompBench. The code is available at https://github.com/Pepper-lll/VLMinv.
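The core idea in the abstract, optimizing the input of a frozen discriminative model while balancing a second fidelity term, can be illustrated with a toy sketch. This is not the authors' implementation: the sigmoid-of-linear `referee_score`, the quadratic pull toward `prior_mean`, and the names `invert`, `lam`, and `lr` are all hypothetical stand-ins chosen so the example runs with the standard library alone. In the actual method the referee would be a VLM such as BLIP-2 and the second term would come from Score Distillation Sampling.

```python
import math

def referee_score(x, w):
    """Toy stand-in for a frozen model's image-text matching score: sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def invert(w, prior_mean, lam=0.1, lr=0.5, steps=200):
    """Gradient descent on the input x; the referee weights w stay frozen.

    Loss(x) = -log(referee_score(x, w)) + (lam / 2) * ||x - prior_mean||^2
    The quadratic term is only a placeholder for a fidelity prior (assumption).
    """
    x = list(prior_mean)  # start from the prior
    for _ in range(steps):
        s = referee_score(x, w)
        # Gradient of -log(sigmoid(w . x)) w.r.t. x is -(1 - s) * w;
        # gradient of the quadratic prior term is lam * (x - prior_mean).
        grad = [-(1.0 - s) * wi + lam * (xi - mi)
                for wi, xi, mi in zip(w, x, prior_mean)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

w = [1.0, -2.0, 0.5]          # frozen "referee" parameters (toy values)
prior_mean = [0.0, 0.0, 0.0]  # toy prior
x_opt = invert(w, prior_mean)
```

Raising `lam` keeps the optimized input closer to the prior at the cost of a lower referee score, which mirrors, in a very simplified form, the balance between the two components that the abstract describes.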
Cite
Text
Liu et al. "Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion." International Conference on Machine Learning, 2024.
Markdown
[Liu et al. "Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/liu2024icml-referee/)
BibTeX
@inproceedings{liu2024icml-referee,
title = {{Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion}},
author = {Liu, Xuantong and Hu, Tianyang and Wang, Wenjia and Kawaguchi, Kenji and Yao, Yuan},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {31165--31185},
volume = {235},
url = {https://mlanthology.org/icml/2024/liu2024icml-referee/}
}