Magic-Me: Identity-Specific Video Customized Diffusion
Abstract
Creating content with specified identities (ID) has attracted significant interest in the field of image generative models. However, its extension to video generation remains underexplored. In this work, we propose a simple yet effective subject-identity-controllable video generation framework, termed Video Custom Diffusion (VCD). With a specified identity defined by a few images, VCD reinforces identity information extraction and injects frame-wise correlation for stable video outputs. We propose three novel components: 1) an ID module to extract ID features; 2) a 3D Gaussian Noise Prior for better inter-frame consistency; and 3) Face VCD and Tiled VCD modules to upscale the video with detailed characteristics. We conducted extensive experiments showing that VCD generates stable, high-quality videos of specific human subjects animated in diverse scenes and motions. We further show that VCD can integrate conditional inputs and prompt travel to enable more delicate control.
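The 3D Gaussian Noise Prior mentioned above improves inter-frame consistency by correlating the initial diffusion noise across frames. As a minimal illustrative sketch (not the paper's exact formulation), one common way to realize such a prior is to mix a shared base noise map with independent per-frame noise, so that any two frames' noise maps have a fixed covariance; the `alpha` mixing parameter here is a hypothetical knob for illustration:

```python
import numpy as np

def correlated_video_noise(num_frames, shape, alpha=0.5, seed=0):
    """Sample per-frame Gaussian noise with inter-frame correlation.

    A shared base noise map is mixed with independent per-frame noise so
    that the noise of any two frames has covariance `alpha`, while each
    frame's noise remains approximately N(0, 1). This is a hypothetical
    sketch of a correlated noise prior, not the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal(shape)                # component common to all frames
    indep = rng.standard_normal((num_frames, *shape))  # independent per-frame component
    # Convex mix of variances: alpha + (1 - alpha) = 1 keeps unit variance
    return np.sqrt(alpha) * shared + np.sqrt(1 - alpha) * indep
```

Because the shared component is identical across frames, denoising starts from correlated latents, which empirically reduces flicker relative to fully independent per-frame noise.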
Cite
Text
Ma et al. "Magic-Me: Identity-Specific Video Customized Diffusion." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-92808-6_2

Markdown

[Ma et al. "Magic-Me: Identity-Specific Video Customized Diffusion." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/ma2024eccvw-magicme/) doi:10.1007/978-3-031-92808-6_2

BibTeX
@inproceedings{ma2024eccvw-magicme,
title = {{Magic-Me: Identity-Specific Video Customized Diffusion}},
author = {Ma, Ze and Zhou, Daquan and Wang, Xue-She and Yeh, Chun-Hsiao and Li, Xiuyu and Yang, Huanrui and Dong, Zhen and Keutzer, Kurt and Feng, Jiashi},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
  pages = {19--37},
doi = {10.1007/978-3-031-92808-6_2},
url = {https://mlanthology.org/eccvw/2024/ma2024eccvw-magicme/}
}