Audio-Synchronized Visual Animation

Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado

ECCV 2024

doi:10.1007/978-3-031-72940-9_1

Abstract

Current visual generation methods can produce high-quality videos guided by text prompts. However, effectively controlling object dynamics remains a challenge. This work explores audio as a cue to generate temporally synchronized image animations. We introduce Audio-Synchronized Visual Animation, a task that aims to animate a static image of an object with motions temporally guided by audio clips. To this end, we present a dataset curated from VGGSound, with videos featuring synchronized audio-visual events across 15 categories. We also present a diffusion model capable of generating audio-guided animations. Extensive evaluations validate the dataset as a reliable benchmark for synchronized generation and demonstrate our model’s superior performance. We further explore the model’s potential in a variety of audio-synchronized generation tasks, from generating full videos without a base image to controlling object motions with various sounds. We hope our established benchmark can open new avenues for controllable visual generation.


Cite

Text

Zhang et al. "Audio-Synchronized Visual Animation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72940-9_1

Markdown

[Zhang et al. "Audio-Synchronized Visual Animation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhang2024eccv-audiosynchronized/) doi:10.1007/978-3-031-72940-9_1

BibTeX

@inproceedings{zhang2024eccv-audiosynchronized,
  title     = {{Audio-Synchronized Visual Animation}},
  author    = {Zhang, Lin and Mo, Shentong and Zhang, Yijing and Morgado, Pedro},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72940-9_1},
  url       = {https://mlanthology.org/eccv/2024/zhang2024eccv-audiosynchronized/}
}