Delving into Sequential Patches for Deepfake Detection

Abstract

Recent advances in face forgery techniques produce nearly visually untraceable deepfake videos, which can be leveraged with malicious intent. As a result, researchers have devoted considerable effort to deepfake detection. Previous studies have identified the importance of local low-level cues and temporal information for generalizing across deepfake methods; however, these approaches still suffer from robustness problems under post-processing. In this work, we propose the Local- & Temporal-aware Transformer-based Deepfake Detection (LTTD) framework, which adopts a local-to-global learning protocol with a particular focus on the valuable temporal information within local sequences. Specifically, we propose a Local Sequence Transformer (LST), which models temporal consistency over sequences of restricted spatial regions, where low-level information is hierarchically enhanced with shallow layers of learned 3D filters. Based on the local temporal embeddings, we then achieve the final classification in a global contrastive way. Extensive experiments on popular datasets validate that our approach effectively spots local forgery cues and achieves state-of-the-art performance.
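The abstract describes a local-to-global pipeline: patch-level sequences are embedded with shallow 3D filters, a transformer attends over the temporal axis of each patch location, and the resulting local temporal embeddings are aggregated for classification. The following is a minimal PyTorch sketch of that idea only. All module names, the grid size, and every hyperparameter are assumptions made for illustration, and the paper's global contrastive classification is replaced here by simple mean pooling; this is not the authors' implementation.

```python
# Illustrative sketch of the local-to-global idea from the abstract.
# All sizes and names are assumptions, not the authors' code.
import torch
import torch.nn as nn


class LocalSequenceTransformer(nn.Module):
    """Models temporal consistency within one spatial patch location.

    Shallow 3D convolutions (a stand-in for the paper's learned 3D filters)
    embed the patch sequence; a small transformer encoder then attends
    across the temporal axis only.
    """

    def __init__(self, in_ch=3, dim=128, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv3d(in_ch, 64, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv3d(64, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep time, collapse space
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        # x: (B, C, T, H, W) -- one patch location's frame sequence
        z = self.embed(x).flatten(2).transpose(1, 2)  # (B, T, dim)
        z = self.temporal(z)                          # temporal self-attention
        return z.mean(dim=1)                          # (B, dim) patch embedding


class LTTDSketch(nn.Module):
    """Splits each frame into a grid of patches, runs the LST per location,
    then classifies from the aggregated local temporal embeddings."""

    def __init__(self, grid=4, dim=128):
        super().__init__()
        self.grid = grid
        self.lst = LocalSequenceTransformer(dim=dim)
        self.head = nn.Linear(dim, 2)  # real / fake

    def forward(self, video):
        # video: (B, C, T, H, W); H and W assumed divisible by the grid size
        B, C, T, H, W = video.shape
        ph, pw = H // self.grid, W // self.grid
        # -> (B, C, T, grid, grid, ph, pw): one sequence per grid cell
        patches = video.unfold(3, ph, ph).unfold(4, pw, pw)
        embs = [
            self.lst(patches[:, :, :, i, j].contiguous())
            for i in range(self.grid)
            for j in range(self.grid)
        ]
        # Global aggregation; the paper's contrastive classification is
        # replaced by mean pooling here for brevity.
        return self.head(torch.stack(embs, dim=1).mean(dim=1))


if __name__ == "__main__":
    clip = torch.randn(2, 3, 8, 64, 64)  # toy batch: 8 frames of 64x64
    logits = LTTDSketch()(clip)
    print(logits.shape)  # torch.Size([2, 2])
```

Restricting attention to the temporal axis of each patch, rather than attending across all spatio-temporal tokens, is what makes the scheme "local": each embedding can only reflect inconsistencies within its own region over time.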

Cite

Text

Guan et al. "Delving into Sequential Patches for Deepfake Detection." Neural Information Processing Systems, 2022.

Markdown

[Guan et al. "Delving into Sequential Patches for Deepfake Detection." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/guan2022neurips-delving/)

BibTeX

@inproceedings{guan2022neurips-delving,
  title     = {{Delving into Sequential Patches for Deepfake Detection}},
  author    = {Guan, Jiazhi and Zhou, Hang and Hong, Zhibin and Ding, Errui and Wang, Jingdong and Quan, Chengbin and Zhao, Youjian},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/guan2022neurips-delving/}
}