Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Ma, Yue; He, Yingqing; Cun, Xiaodong; Wang, Xintao; Chen, Siran; Li, Xiu; Chen, Qifeng

doi:10.1609/AAAI.V38I5.28206

AAAI 2024 pp. 4117-4125

Abstract

Generating text-editable and pose-controllable character videos is in pressing demand for creating various digital humans. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and of generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image-pose pairs and pose-free videos) and a pre-trained text-to-image (T2I) model to obtain pose-controllable character videos. Specifically, in the first stage, only the keypoint-image pairs are used for controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we finetune the motion of the above network on a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by our new designs, our method successfully generates continuously pose-controllable character videos while keeping the editing and concept composition ability of the pre-trained T2I model. The code and models are available at https://follow-your-pose.github.io/.
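
The zero-initialized convolutional encoder mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the encoded pose is added to the pre-trained features as a residual, uses a single-channel pure-Python convolution, and all names (`conv2d_same`, `zero_kernel`, `inject_pose`) are hypothetical. It only shows why zero initialization plausibly helps: at the start of training the pose branch contributes nothing, so the pre-trained T2I features pass through unchanged.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D convolution (cross-correlation) with zero padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Positions outside the image count as zero padding.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def zero_kernel(size=3):
    """Kernel with all weights set to zero (the zero initialization)."""
    return [[0.0] * size for _ in range(size)]


def inject_pose(features, pose_map, kernel, bias=0.0):
    """Assumed injection scheme: features + conv(pose_map)."""
    residual = conv2d_same(pose_map, kernel, bias)
    return [[f + r for f, r in zip(f_row, r_row)]
            for f_row, r_row in zip(features, residual)]


# Toy stand-ins for a pre-trained feature map and a keypoint (pose) map.
features = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [3.0, 1.0, 0.25]]
pose_map = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

# At initialization the pose branch outputs zeros, so features are untouched.
out_at_init = inject_pose(features, pose_map, zero_kernel())

# After training the weights are no longer zero; an identity-like kernel
# stands in for learned weights to show the pose signal now reaching the output.
learned_like = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
out_after = inject_pose(features, pose_map, learned_like)
```

In this sketch `out_at_init` equals `features` exactly, while `out_after` differs wherever the pose map is non-zero, which is the behavior the zero initialization is meant to provide in the first training stage.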

Cite

Text

Ma et al. "Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I5.28206

Markdown

[Ma et al. "Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/ma2024aaai-follow/) doi:10.1609/AAAI.V38I5.28206

BibTeX

@inproceedings{ma2024aaai-follow,
  title     = {{Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos}},
  author    = {Ma, Yue and He, Yingqing and Cun, Xiaodong and Wang, Xintao and Chen, Siran and Li, Xiu and Chen, Qifeng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {4117-4125},
  doi       = {10.1609/AAAI.V38I5.28206},
  url       = {https://mlanthology.org/aaai/2024/ma2024aaai-follow/}
}