Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation

Abstract

Recent large-scale pre-trained diffusion models have demonstrated a powerful ability to generate high-quality videos from detailed text descriptions. However, controlling the motion of objects in videos generated by video diffusion models remains a challenging problem. In this paper, we propose Motion-Zero, a novel zero-shot moving-object trajectory control framework that enables trajectory control of an arbitrary single object in text-to-video diffusion models. To this end, we design an initial noise prior module that provides a position-based prior, improving both the stability of the moving object's appearance and the accuracy of its position. In addition, spatial constraints derived from the attention maps of the U-Net are applied directly to the denoising process, further ensuring the positional consistency of the moving object during inference. Furthermore, temporal consistency is enforced by a proposed shift temporal attention mechanism. Our method can be flexibly applied to various state-of-the-art video diffusion models without any training. Extensive experiments demonstrate that our method can control the motion trajectories of arbitrary objects while preserving the models' original ability to generate high-quality videos.
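The abstract does not spell out the exact form of the attention-based spatial constraint. Purely as an illustration, the sketch below assumes a formulation in the style of training-free layout guidance: the target trajectory is turned into one bounding box per frame, and a loss penalizes the share of the object token's cross-attention mass that falls outside that frame's box. The helper names `interpolate_boxes`, `box_attention_ratio`, and `trajectory_loss` are hypothetical, the linear box interpolation is an assumption, and this is not presented as Motion-Zero's actual implementation. In a real pipeline, the gradient of such a loss with respect to the noisy latent would be used to steer each denoising step; that part is omitted here.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1), half-open, in attention-map cells.
Box = Tuple[int, int, int, int]


def interpolate_boxes(start: Box, end: Box, num_frames: int) -> List[Box]:
    """Assumed trajectory model: linearly interpolate a box from start to end."""
    if num_frames == 1:
        return [start]
    boxes: List[Box] = []
    for f in range(num_frames):
        t = f / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        x0, y0, x1, y1 = (round(s + t * (e - s)) for s, e in zip(start, end))
        boxes.append((x0, y0, x1, y1))
    return boxes


def box_attention_ratio(attn: Sequence[Sequence[float]], box: Box) -> float:
    """Fraction of one frame's attention mass (rows = y, cols = x) inside `box`."""
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in attn)
    inside = sum(attn[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return inside / total if total > 0 else 0.0


def trajectory_loss(
    attn_per_frame: Sequence[Sequence[Sequence[float]]], boxes: Sequence[Box]
) -> float:
    """Mean over frames of (1 - inside ratio)^2; 0 when all mass lies in the boxes."""
    if len(attn_per_frame) != len(boxes) or not boxes:
        raise ValueError("need one non-empty attention map per box")
    losses = [(1.0 - box_attention_ratio(a, b)) ** 2 for a, b in zip(attn_per_frame, boxes)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 4x4 maps: frame 0 is fully concentrated in its box, frame 1 is uniform.
    peaked = [[1.0 if (x, y) == (0, 0) else 0.0 for x in range(4)] for y in range(4)]
    uniform = [[1.0] * 4 for _ in range(4)]
    boxes = interpolate_boxes((0, 0, 2, 2), (2, 2, 4, 4), 2)
    print(trajectory_loss([peaked, uniform], boxes))  # 0.28125
```

In the toy run, frame 0 contributes zero loss, while frame 1 keeps only 4 of 16 cells of mass inside its box, contributing (1 - 0.25)^2 = 0.5625; the mean is 0.28125. Squaring the outside fraction keeps the loss smooth and near zero once the attention is mostly inside the box, which is one common design choice rather than a requirement.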

Cite

Text

Chen et al. "Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I2.32198

Markdown

[Chen et al. "Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/chen2025aaai-motion-a/) doi:10.1609/AAAI.V39I2.32198

BibTeX

@inproceedings{chen2025aaai-motion-a,
  title     = {{Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation}},
  author    = {Chen, Changgu and Shu, Junwei and He, Gaoqi and Wang, Changbo and Li, Yang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2016--2024},
  doi       = {10.1609/AAAI.V39I2.32198},
  url       = {https://mlanthology.org/aaai/2025/chen2025aaai-motion-a/}
}