Audio-Visual Person-of-Interest DeepFake Detection

Cozzolino, Davide; Pianese, Alessandro; Nießner, Matthias; Verdoliva, Luisa

doi:10.1109/CVPRW59228.2023.00101

CVPRW 2023 pp. 943-952

Abstract

Face manipulation technology is advancing very rapidly, and new methods are being proposed day by day. The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world. Our key insight is that each person has specific characteristics that a synthetic generator likely cannot reproduce. Accordingly, we extract audio-visual features that characterize the identity of a person and use them to create a person-of-interest (POI) deepfake detector. We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity. As a result, when the video and/or audio of a person is manipulated, its representation in the embedding space becomes inconsistent with the real identity, allowing reliable detection. Training is carried out exclusively on real talking-face videos; thus, the detector does not depend on any specific manipulation method and yields the highest generalization ability. In addition, our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos. Experiments on a wide variety of datasets confirm that our method achieves state-of-the-art (SOTA) performance, especially on low-quality videos. Code is publicly available online at https://github.com/grip-unina/poi-forensics.
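
The detection idea in the abstract (a manipulated clip lands far from the person's real identity in embedding space) can be illustrated with a small sketch. This is not the authors' implementation, which is in the repository linked above. It assumes that segment embeddings have already been produced by some trained encoder, and the names `poi_inconsistency`, `l2_normalize`, and `squared_distance`, as well as the choice to average nearest-reference distances over test segments, are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so distances depend only on direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def poi_inconsistency(
    test_segments: Sequence[Vector],
    reference_segments: Sequence[Vector],
) -> float:
    """Score how far a test video is from a person's real reference videos.

    Assumption (not taken from the paper): each test segment is matched to
    its nearest reference segment, and the nearest distances are averaged.
    A higher score means the test video is less consistent with the identity.
    """
    if not test_segments or not reference_segments:
        raise ValueError("both embedding sets must be non-empty")
    refs = [l2_normalize(r) for r in reference_segments]
    nearest = []
    for segment in test_segments:
        query = l2_normalize(segment)
        nearest.append(min(squared_distance(query, r) for r in refs))
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Toy 2-D embeddings, invented purely for illustration.
    reference = [[1.0, 0.0], [0.9, 0.1]]
    consistent_clip = [[0.95, 0.05]]
    inconsistent_clip = [[0.0, 1.0]]
    print(poi_inconsistency(consistent_clip, reference))
    print(poi_inconsistency(inconsistent_clip, reference))
```

In a setup like this, a clip would be flagged when its score exceeds a threshold calibrated on real videos only, which mirrors the abstract's point that no manipulated data is needed for training. Separate scores could be computed for the audio and video embeddings to cover single-modality attacks.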

Cite

Text

Cozzolino et al. "Audio-Visual Person-of-Interest DeepFake Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00101

Markdown

[Cozzolino et al. "Audio-Visual Person-of-Interest DeepFake Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/cozzolino2023cvprw-audiovisual/) doi:10.1109/CVPRW59228.2023.00101

BibTeX

@inproceedings{cozzolino2023cvprw-audiovisual,
  title     = {{Audio-Visual Person-of-Interest DeepFake Detection}},
  author    = {Cozzolino, Davide and Pianese, Alessandro and Nießner, Matthias and Verdoliva, Luisa},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {943--952},
  doi       = {10.1109/CVPRW59228.2023.00101},
  url       = {https://mlanthology.org/cvprw/2023/cozzolino2023cvprw-audiovisual/}
}