MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

Abstract

The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows users to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.
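To make the idea of "arbitrarily interleaved inputs of multiple modalities and languages" concrete, the following is a minimal sketch of how such a prompt might be represented and flattened into a single conditioning sequence. It is not the authors' implementation: the names `TextSegment`, `ImageSegment`, `embed_text`, `embed_image`, `encode_prompt`, and `EMBED_DIM` are hypothetical, and the two embedding functions are toy stand-ins for pre-trained encoders rather than real models.

```python
from dataclasses import dataclass
from typing import List, Union

EMBED_DIM = 4  # toy embedding width (assumption, chosen only for illustration)


@dataclass
class TextSegment:
    text: str
    language: str  # e.g. "en" or "de"; carried along but unused by the toy encoder


@dataclass
class ImageSegment:
    pixels: List[float]  # stand-in for real image data


Segment = Union[TextSegment, ImageSegment]


def embed_text(seg: TextSegment) -> List[List[float]]:
    # Hypothetical stand-in for a pre-trained multilingual text encoder:
    # one vector per whitespace-separated token.
    return [[float(len(tok))] * EMBED_DIM for tok in seg.text.split()]


def embed_image(seg: ImageSegment) -> List[List[float]]:
    # Hypothetical stand-in for a pre-trained image encoder:
    # a single vector holding the mean pixel value.
    if not seg.pixels:
        raise ValueError("ImageSegment must contain at least one pixel")
    mean = sum(seg.pixels) / len(seg.pixels)
    return [[mean] * EMBED_DIM]


def encode_prompt(segments: List[Segment]) -> List[List[float]]:
    # Embed each segment with its modality-specific encoder and concatenate
    # the results in prompt order, so text and images share one sequence.
    sequence: List[List[float]] = []
    for seg in segments:
        if isinstance(seg, TextSegment):
            sequence.extend(embed_text(seg))
        elif isinstance(seg, ImageSegment):
            sequence.extend(embed_image(seg))
        else:
            raise TypeError(f"unsupported segment type: {type(seg).__name__}")
    return sequence


# An English phrase, an image, and a German phrase, interleaved in one prompt.
prompt = [
    TextSegment("a photo of", "en"),
    ImageSegment([0.0, 1.0]),
    TextSegment("im Schnee", "de"),
]
conditioning = encode_prompt(prompt)
```

In this sketch the image generator would consume `conditioning` as a single ordered sequence, which is one plausible way to read the abstract's claim that aligned encoders let a model trained on monomodal, single-language data accept mixed inputs.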

Cite

Text

Bellagente et al. "MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation." Neural Information Processing Systems, 2023.

Markdown

[Bellagente et al. "MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/bellagente2023neurips-multifusion/)

BibTeX

@inproceedings{bellagente2023neurips-multifusion,
  title     = {{MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation}},
  author    = {Bellagente, Marco and Brack, Manuel and Teufel, Hannah and Friedrich, Felix and Deiseroth, Björn and Eichenberg, Constantin and Dai, Andrew M and Baldock, Robert and Nanda, Souradeep and Oostermeijer, Koen and Cruz-Salinas, Andres Felipe and Schramowski, Patrick and Kersting, Kristian and Weinbach, Samuel},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/bellagente2023neurips-multifusion/}
}