SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning

Urwa Muaz, Wondong Jang, Rohun Tripathi, Santhosh Mani, Wenbin Ouyang, Ravi Teja Gadde, Baris Gecer, Sergio Elizondo, Reza Madad, Naveen Nair

ICCV 2023 pp. 7833-7842

doi:10.1109/ICCV51070.2023.00720

Abstract

Dubbed video generation aims to accurately synchronize the mouth movements of a given facial video with driving audio while preserving identity and scene-specific visual dynamics, such as head pose and lighting. Previous approaches achieve accurate lip generation by adopting a pretrained audio-video synchronization metric, called Sync-Loss, as an objective function. However, extending this approach to high-resolution videos has been challenging due to shift biases in the loss landscape that inhibit tandem optimization of Sync-Loss and visual quality, leading to a loss of detail. To address this issue, we introduce shift-invariant learning, which generates photo-realistic high-resolution videos with accurate Lip-Sync. Further, we employ a pyramid network with coarse-to-fine image generation to improve stability and lip synchronization. Our model outperforms state-of-the-art methods on multiple benchmark datasets, including AVSpeech, HDTF, and LRW, in terms of photo-realism, identity preservation, and Lip-Sync accuracy.


Cite

Text

Muaz et al. "SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00720

Markdown

[Muaz et al. "SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/muaz2023iccv-sidgan/) doi:10.1109/ICCV51070.2023.00720

BibTeX

@inproceedings{muaz2023iccv-sidgan,
  title     = {{SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning}},
  author    = {Muaz, Urwa and Jang, Wondong and Tripathi, Rohun and Mani, Santhosh and Ouyang, Wenbin and Gadde, Ravi Teja and Gecer, Baris and Elizondo, Sergio and Madad, Reza and Nair, Naveen},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {7833-7842},
  doi       = {10.1109/ICCV51070.2023.00720},
  url       = {https://mlanthology.org/iccv/2023/muaz2023iccv-sidgan/}
}