MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

Abstract

In this work, we explore a cost-effective framework for multilingual image generation. We find that, in contrast to models tuned on high-quality images with multilingual annotations, text-to-image (T2I) models that leverage text encoders pre-trained on widely available, noisy Internet image-text pairs are significantly more data-efficient across multiple languages. Based on this insight, we introduce MuLan (Multi-Language adapter), a lightweight language adapter with fewer than 20M parameters, trained alongside a frozen text encoder and image diffusion model. Compared to previous multilingual T2I models, this framework offers: (1) Cost efficiency. Using readily accessible English data and off-the-shelf multilingual text encoders minimizes the training cost; (2) High performance. It achieves comparable generation capabilities in over 110 languages, with CLIP similarity scores nearly matching those in English (39.57 for English vs. 39.61 for other languages); and (3) Broad applicability. It integrates seamlessly with compatible community tools like LoRA, LCM, ControlNet, and IP-Adapter, expanding its potential use cases.
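To make the architecture described above concrete, here is a minimal sketch of a lightweight language adapter that maps features from a frozen multilingual text encoder into the embedding space a frozen diffusion model expects. The dimensions, two-layer MLP structure, and parameter budget check are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper):
# the multilingual encoder's hidden size and the T2I model's
# expected text-embedding size.
D_MULTILINGUAL = 1024
D_DIFFUSION = 768
D_HIDDEN = 2048


class LanguageAdapter:
    """Tiny two-layer MLP mapping frozen multilingual text features
    into the embedding space of a frozen diffusion model. Only these
    weights would be trained; encoder and diffusion model stay frozen."""

    def __init__(self):
        self.w1 = rng.standard_normal((D_MULTILINGUAL, D_HIDDEN)) * 0.02
        self.b1 = np.zeros(D_HIDDEN)
        self.w2 = rng.standard_normal((D_HIDDEN, D_DIFFUSION)) * 0.02
        self.b2 = np.zeros(D_DIFFUSION)

    def num_params(self) -> int:
        return self.w1.size + self.b1.size + self.w2.size + self.b2.size

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU
        return h @ self.w2 + self.b2


adapter = LanguageAdapter()
# A sequence of 77 token features from the frozen multilingual encoder.
tokens = rng.standard_normal((77, D_MULTILINGUAL))
out = adapter(tokens)
print(out.shape)             # (77, 768)
print(adapter.num_params())  # ~3.7M, well under the paper's 20M budget
```

Because only the adapter is trained, the trainable parameter count stays tiny relative to the text encoder and diffusion backbone, which is what keeps the adaptation cost negligible.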

Cite

Text

Xing et al. "MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Xing et al. "MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/xing2025icml-mulan/)

BibTeX

@inproceedings{xing2025icml-mulan,
  title     = {{MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost}},
  author    = {Xing, Sen and Zhong, Muyan and Lai, Zeqiang and Li, Liangchen and Liu, Jiawen and Wang, Yaohui and Dai, Jifeng and Wang, Wenhai},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {68953--68969},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/xing2025icml-mulan/}
}