Muse: Text-to-Image Generation via Masked Generative Transformers
Abstract
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse learns to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requires fewer sampling iterations; compared to autoregressive models such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, which translates to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results and videos demonstrating editing are available at https://muse-icml.github.io/
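To make the masked modeling objective described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: a random subset of discrete image tokens is replaced with a [MASK] token, and a Transformer conditioned on a pre-trained text embedding is trained with cross-entropy to predict the original tokens at the masked positions. The codebook size, sequence length, text-embedding dimensionality, masking ratio, and all layer sizes below are illustrative assumptions.

```python
# Minimal sketch of Muse-style masked image-token modeling (assumed hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 8192      # size of the discrete image-token codebook (assumed)
SEQ_LEN = 256     # image tokens per image, e.g. a 16x16 grid (assumed)
TEXT_DIM = 512    # dimensionality of the frozen LLM text embedding (assumed)
DIM = 512         # Transformer width (assumed)

class MaskedImageTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB + 1, DIM)   # +1 for the [MASK] token
        self.pos_emb = nn.Parameter(torch.zeros(SEQ_LEN, DIM))
        self.text_proj = nn.Linear(TEXT_DIM, DIM)     # project text embedding for conditioning
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, image_tokens, text_emb):
        x = self.tok_emb(image_tokens) + self.pos_emb
        # Prepend the projected text embedding as a conditioning position.
        cond = self.text_proj(text_emb).unsqueeze(1)
        h = self.encoder(torch.cat([cond, x], dim=1))[:, 1:]
        return self.head(h)                           # logits over the image-token codebook

def masked_training_step(model, image_tokens, text_emb, mask_ratio=0.6):
    """Replace a random subset of image tokens with [MASK] and predict the originals."""
    MASK_ID = VOCAB
    mask = torch.rand_like(image_tokens, dtype=torch.float) < mask_ratio
    masked_input = image_tokens.masked_fill(mask, MASK_ID)
    logits = model(masked_input, text_emb)
    # Cross-entropy is computed only on the masked positions.
    return F.cross_entropy(logits[mask], image_tokens[mask])

# Toy usage with random data:
model = MaskedImageTransformer()
tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))
text = torch.randn(2, TEXT_DIM)
loss = masked_training_step(model, tokens, text)
loss.backward()
```

At inference, the same model can fill in all masked positions in parallel over a small number of refinement iterations, which is the source of the efficiency advantage over token-by-token autoregressive decoding noted in the abstract.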
Cite
Text
Chang et al. "Muse: Text-to-Image Generation via Masked Generative Transformers." International Conference on Machine Learning, 2023.
Markdown
[Chang et al. "Muse: Text-to-Image Generation via Masked Generative Transformers." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/chang2023icml-muse/)
BibTeX
@inproceedings{chang2023icml-muse,
title = {{Muse: Text-to-Image Generation via Masked Generative Transformers}},
author = {Chang, Huiwen and Zhang, Han and Barber, Jarred and Maschinot, Aaron and Lezama, Jose and Jiang, Lu and Yang, Ming-Hsuan and Murphy, Kevin Patrick and Freeman, William T. and Rubinstein, Michael and Li, Yuanzhen and Krishnan, Dilip},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {4055--4075},
volume = {202},
url = {https://mlanthology.org/icml/2023/chang2023icml-muse/}
}