Multi-Modal Deepfake Detection via Multi-Task Audio-Visual Prompt Learning

Abstract

With the malicious use and dissemination of multi-modal deepfake videos, researchers have begun to investigate multi-modal deepfake detection. Unfortunately, most existing methods tune all parameters of the deep network on limited speech-video datasets and are trained under coarse-grained consistency supervision, both of which hinder their generalization in practical scenarios. To address these problems, we propose the first multi-task audio-visual prompt learning method for multi-modal deepfake video detection, which exploits multiple foundation models. Specifically, we construct a two-stream multi-task learning architecture and propose sequential visual prompts and short-time audio prompts to extract multi-modal features; these features are aligned at the frame level and used in the subsequent fine-grained feature matching and fusion. Because visual content and audio signals are naturally aligned in real data, we propose a frame-level cross-modal feature matching loss to learn fine-grained audio-visual consistency. Comprehensive experiments demonstrate the effectiveness of our method and its superior generalization ability compared with state-of-the-art methods.
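The abstract does not give the exact form of the frame-level cross-modal feature matching loss. As a purely hypothetical illustration, assume per-frame visual and audio features have already been extracted and projected to a shared dimension. One common way to exploit the natural alignment of real videos is then a contrastive (InfoNCE-style) objective. It treats the audio feature at the same time index as the positive for each visual frame and all other audio frames as negatives. The function name `frame_matching_loss` and the temperature `tau` are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)  # small epsilon avoids division by zero


def frame_matching_loss(visual: List[Vector], audio: List[Vector], tau: float = 0.1) -> float:
    """Hypothetical frame-level InfoNCE loss (visual -> audio direction only).

    visual[t] and audio[t] are assumed to be features of frame t of a REAL
    video, so the same-time pair is the positive and every other audio frame
    in the clip serves as a negative. Returns the mean loss over frames.
    """
    if len(visual) != len(audio) or not visual:
        raise ValueError("visual and audio must be non-empty and frame-aligned")
    num_frames = len(visual)
    total = 0.0
    for t in range(num_frames):
        # Similarity of visual frame t to every audio frame, scaled by temperature.
        logits = [cosine(visual[t], audio[s]) / tau for s in range(num_frames)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[t]  # equals -log softmax of the matched frame
    return total / num_frames
```

With frame-aligned features, the loss is close to zero. If the audio frames are shuffled, as a crude stand-in for an audio-visual mismatch, the loss rises sharply. This behavior is what makes a frame-level objective a finer-grained consistency signal than a single clip-level score.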

Cite

Text

Miao et al. "Multi-Modal Deepfake Detection via Multi-Task Audio-Visual Prompt Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I1.32042

BibTeX

@inproceedings{miao2025aaai-multi,
  title     = {{Multi-Modal Deepfake Detection via Multi-Task Audio-Visual Prompt Learning}},
  author    = {Miao, Hui and Guo, Yuanfang and Liu, Zeming and Wang, Yunhong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {612--621},
  doi       = {10.1609/AAAI.V39I1.32042},
  url       = {https://mlanthology.org/aaai/2025/miao2025aaai-multi/}
}