Animate Your Motion: Turning Still Images into Dynamic Videos
Abstract
In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing either semantic cues, like images or depth maps, or motion-based conditions, like moving sketches or object bounding boxes. Semantic inputs offer a rich scene context but lack detailed motion specificity; conversely, motion inputs provide precise trajectory information but miss the broader semantic narrative. For the first time, we integrate both semantic and motion cues within a diffusion model for video generation. To this end, we introduce the Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. It incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions, promoting synergy between different modalities. For model training, we separate the conditions for the two modalities, introducing a two-stage training pipeline. Experimental results demonstrate that our design significantly enhances video quality, motion precision, and semantic coherence.
Cite
Text
Li et al. "Animate Your Motion: Turning Still Images into Dynamic Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72848-8_24
Markdown
[Li et al. "Animate Your Motion: Turning Still Images into Dynamic Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/li2024eccv-animate/) doi:10.1007/978-3-031-72848-8_24
BibTeX
@inproceedings{li2024eccv-animate,
title = {{Animate Your Motion: Turning Still Images into Dynamic Videos}},
author = {Li, Mingxiao and Wan, Bo and Moens, Sien and Tuytelaars, Tinne},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72848-8_24},
url = {https://mlanthology.org/eccv/2024/li2024eccv-animate/}
}