TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Zhou, Haitao; Wang, Chuang; Nie, Rui; Liu, Jinlin; Yu, Dongdong; Yu, Qian; Wang, Changhu

doi:10.1609/AAAI.V39I10.33167

AAAI 2025 pp. 10743-10751

Abstract

Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce *TrackGo*, a novel approach that leverages free-form masks and arrows for conditional video generation. This method offers users a flexible and precise mechanism for manipulating video content. We also propose the *TrackAdapter* for control implementation, an efficient and lightweight adapter designed to be seamlessly integrated into the temporal self-attention layers of a pretrained video generation model. This design leverages our observation that the attention map of these layers can accurately activate regions corresponding to motion in videos. Our experimental results demonstrate that our new approach, enhanced by the TrackAdapter, achieves state-of-the-art performance on key metrics such as FVD, FID, and ObjMC scores.

Cite

Text

Zhou et al. "TrackGo: A Flexible and Efficient Method for Controllable Video Generation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33167

Markdown

[Zhou et al. "TrackGo: A Flexible and Efficient Method for Controllable Video Generation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhou2025aaai-trackgo/) doi:10.1609/AAAI.V39I10.33167

BibTeX

@inproceedings{zhou2025aaai-trackgo,
  title     = {{TrackGo: A Flexible and Efficient Method for Controllable Video Generation}},
  author    = {Zhou, Haitao and Wang, Chuang and Nie, Rui and Liu, Jinlin and Yu, Dongdong and Yu, Qian and Wang, Changhu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10743--10751},
  doi       = {10.1609/AAAI.V39I10.33167},
  url       = {https://mlanthology.org/aaai/2025/zhou2025aaai-trackgo/}
}