Many-to-Many Image Generation with Auto-Regressive Diffusion Models

Abstract

Recent advances in image generation have been substantial, yet existing models are limited in their ability to perceive and generate an arbitrary number of interrelated images within a broad context. This limitation becomes increasingly critical as demand for multi-image scenarios, such as multi-view images and visual narratives, grows with the expansion of multimedia platforms. This paper introduces a domain-general framework for many-to-many image generation that produces a series of interrelated images from a given set of images. It offers a scalable alternative to building a separate task-specific solution for each multi-image scenario. To support this, we present MIS, a novel large-scale multi-image dataset containing 12M synthetic multi-image samples, each with 25 interconnected images. Each sample is generated from a single caption by running Stable Diffusion with varied latent noises, which yields a set of interconnected images. Leveraging MIS, we learn M2M, an autoregressive model for many-to-many generation in which each image is modeled within a diffusion framework. Through training on the synthetic MIS, the model excels at capturing style and content from preceding images, whether synthetic or real, and generates novel images that follow the captured patterns. Furthermore, with task-specific fine-tuning, the model adapts to specific multi-image generation tasks such as Visual Procedure Generation.
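The abstract describes M2M as autoregressive over images, with each new image produced by a diffusion model conditioned on the images that precede it. Below is a minimal, hypothetical sketch of that sampling loop. It is not the authors' implementation. It assumes a standard DDPM ancestral sampler with a linear noise schedule, represents each image as a short list of floats standing in for a latent, and replaces the learned network with a toy noise predictor (`make_toy_model`, hypothetical) that treats the mean of the context as the only clean image.

```python
import math
import random
from typing import Callable, List

Latent = List[float]
# Hypothetical conditional noise predictor (assumption, not the M2M network):
# (noisy latent x_t, timestep t, context latents) -> predicted noise epsilon.
NoisePredictor = Callable[[Latent, int, List[Latent]], Latent]


def linear_betas(T: int, start: float = 1e-4, end: float = 0.02) -> List[float]:
    """Linear DDPM noise schedule (an assumed choice; requires T >= 2)."""
    return [start + (end - start) * t / (T - 1) for t in range(T)]


def alpha_bars_from(betas: List[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def sample_one(eps_model: NoisePredictor, context: List[Latent], dim: int,
               betas: List[float], rng: random.Random) -> Latent:
    """DDPM ancestral sampling of one latent, conditioned on `context`."""
    alpha_bars = alpha_bars_from(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, context)
        alpha_t = 1.0 - betas[t]
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Reverse-step mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def generate_many_to_many(eps_model: NoisePredictor, given: List[Latent],
                          num_new: int, dim: int, T: int = 50,
                          seed: int = 0) -> List[Latent]:
    """Autoregressively generate `num_new` latents after the `given` ones.

    Each new latent comes from a full reverse-diffusion pass conditioned on
    every preceding latent, whether given or previously generated.
    """
    rng = random.Random(seed)
    betas = linear_betas(T)
    sequence = list(given)
    for _ in range(num_new):
        sequence.append(sample_one(eps_model, sequence, dim, betas, rng))
    return sequence[len(given):]


def make_toy_model(T: int = 50) -> NoisePredictor:
    """Hypothetical stand-in: the exact noise for a data point at the context mean.

    It must share the sampler's schedule; `context` must be non-empty.
    """
    alpha_bars = alpha_bars_from(linear_betas(T))

    def eps_model(x: Latent, t: int, context: List[Latent]) -> Latent:
        target = [sum(col) / len(context) for col in zip(*context)]
        a = alpha_bars[t]
        return [(xi - math.sqrt(a) * ci) / math.sqrt(1.0 - a)
                for xi, ci in zip(x, target)]

    return eps_model


# Usage: two "given" images, two newly generated ones.
new = generate_many_to_many(make_toy_model(), [[1.0, 2.0], [3.0, 4.0]],
                            num_new=2, dim=2)
```

Conditioning on the whole growing sequence is what makes the loop many-to-many: it accepts any number of given images, real or synthetic, and produces any number of new ones. The abstract does not say how M2M encodes the context, so a real model would replace `eps_model` with a trained network operating on image latents.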

Cite

Text

Shen et al. "Many-to-Many Image Generation with Auto-Regressive Diffusion Models." ICML 2024 Workshops: SPIGM, 2024.

BibTeX

@inproceedings{shen2024icmlw-manytomany,
  title     = {{Many-to-Many Image Generation with Auto-Regressive Diffusion Models}},
  author    = {Shen, Ying and Zhang, Yizhe and Zhai, Shuangfei and Huang, Lifu and Susskind, Joshua M. and Gu, Jiatao},
  booktitle = {ICML 2024 Workshops: SPIGM},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/shen2024icmlw-manytomany/}
}