WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
Abstract
Generative video modeling has made significant strides, yet ensuring structural and temporal consistency over long sequences remains a challenge. Current methods predominantly rely on RGB signals, leading to accumulated errors in object structure and motion over extended durations. To address these issues, we introduce WorldWeaver, a robust framework for long video generation that jointly models RGB frames and perceptual conditions within a unified long-horizon modeling scheme. Our training framework offers three key advantages. First, by jointly predicting perceptual conditions and color information from a unified representation, it significantly enhances temporal consistency and motion dynamics. Second, by leveraging depth cues, which we observe to be more resistant to drift than RGB, we construct a memory bank that preserves clearer contextual information, improving quality in long-horizon video generation. Third, we employ segmented noise scheduling for training prediction groups, which further mitigates drift and reduces computational cost. Extensive experiments on both diffusion and rectified flow-based models demonstrate the effectiveness of WorldWeaver in reducing temporal drift and improving the fidelity of generated videos.
Cite
Text
Liu et al. "WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception." Advances in Neural Information Processing Systems, 2025.
Markdown
[Liu et al. "WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/liu2025neurips-worldweaver/)
BibTeX
@inproceedings{liu2025neurips-worldweaver,
title = {{WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception}},
author = {Liu, Zhiheng and Deng, Xueqing and Chen, Shoufa and Wang, Angtian and Guo, Qiushan and Han, Mingfei and Xue, Zeyue and Chen, Mengzhao and Luo, Ping and Yang, Linjie},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/liu2025neurips-worldweaver/}
}