FlashWorld: High-Quality 3D Scene Generation Within Seconds

Li, Xinyang; Wang, Tengfei; Gu, Zixiao; Zhang, Shengchuan; Guo, Chunchao; Cao, Liujuan

ICLR 2026

Abstract

We propose FlashWorld, a generative model that produces 3D scenes from a single image or text prompt in seconds, 10 to 100 times faster than previous works while achieving superior rendering quality. Our approach shifts from the conventional multi-view-oriented (MV-oriented) paradigm, which generates multi-view images for subsequent 3D reconstruction, to a 3D-oriented approach in which the model directly produces 3D Gaussian representations during multi-view generation. Although the 3D-oriented approach ensures 3D consistency, it typically suffers from poor visual quality. FlashWorld therefore combines a dual-mode pre-training phase with a cross-mode post-training phase, integrating the strengths of both paradigms. Specifically, leveraging the prior of a video diffusion model, we first pre-train a dual-mode multi-view diffusion model that jointly supports the MV-oriented and 3D-oriented generation modes. To bridge the quality gap in 3D-oriented generation, we then propose a cross-mode post-training distillation that matches the distribution of the consistent 3D-oriented mode to that of the high-quality MV-oriented mode. This not only improves visual quality while maintaining 3D consistency, but also reduces the number of denoising steps required at inference. We also propose a strategy that leverages massive collections of single-view images and text prompts during this process to improve the model's generalization to out-of-distribution inputs. Extensive experiments demonstrate the superiority and efficiency of our method. Our code is released at https://github.com/imlixinyang/FlashWorld.
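The cross-mode distillation above pulls a "student" distribution (the 3D-oriented mode) toward a "teacher" distribution (the MV-oriented mode). As a rough illustration of score-based distribution matching, in the spirit of Distribution Matching Distillation (DMD), here is a toy 1-D sketch. It is not FlashWorld's implementation: the Gaussian teacher and student, the closed-form "fake" score (learned by a separate network in DMD-style methods), and every name and hyperparameter (`distill`, `sigma`, `lr`, and so on) are hypothetical choices made for illustration only.

```python
import numpy as np


def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of N(mean, var) evaluated at x."""
    return -(x - mean) / var


def distill(teacher_mean=2.0, teacher_std=0.5, sigma=0.5,
            lr=0.05, steps=1000, n=4096, seed=0):
    """Toy 1-D distribution matching (hypothetical setup, not the paper's code).

    The student generator is x = theta + exp(log_s) * z with z ~ N(0, 1).
    Its parameters follow the gradient of KL(student_t || teacher_t) between
    the noised distributions, estimated with the score difference
    (s_fake - s_real) evaluated at noised student samples.
    """
    rng = np.random.default_rng(seed)
    theta, log_s = 0.0, 0.0
    # Teacher distribution after adding Gaussian noise of std sigma.
    real_var = teacher_std ** 2 + sigma ** 2
    for _ in range(steps):
        s = np.exp(log_s)
        z = rng.standard_normal(n)
        x = theta + s * z                          # student (generator) samples
        x_t = x + sigma * rng.standard_normal(n)   # forward-diffused samples
        fake_var = s ** 2 + sigma ** 2
        # "Real" score: stands in for the teacher (MV-oriented mode in the analogy).
        s_real = gaussian_score(x_t, teacher_mean, real_var)
        # "Fake" score of the student's own noised distribution; in practice a
        # separate network learns it, here it is known in closed form.
        s_fake = gaussian_score(x_t, theta, fake_var)
        diff = s_fake - s_real
        # Chain rule: dx_t/dtheta = 1 and dx_t/dlog_s = s * z.
        grad_theta = np.mean(diff)
        grad_log_s = np.mean(diff * s * z)
        theta -= lr * grad_theta
        log_s -= lr * grad_log_s
    return float(theta), float(np.exp(log_s))


theta, std = distill()
print(f"student mean={theta:.3f}, std={std:.3f}")
```

With the default settings, the student's mean and standard deviation should approach the teacher's 2.0 and 0.5. The update needs only teacher scores at noised student samples, never paired teacher outputs, which is what makes this family of objectives usable for distilling one generation mode into another.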

Cite

Text

Li et al. "FlashWorld: High-Quality 3D Scene Generation Within Seconds." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "FlashWorld: High-Quality 3D Scene Generation Within Seconds." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-flashworld/)

BibTeX

@inproceedings{li2026iclr-flashworld,
  title     = {{FlashWorld: High-Quality 3D Scene Generation Within Seconds}},
  author    = {Li, Xinyang and Wang, Tengfei and Gu, Zixiao and Zhang, Shengchuan and Guo, Chunchao and Cao, Liujuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-flashworld/}
}