Fair Deepfake Detectors Can Generalize

Abstract

Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often treat these two objectives as inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.
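
The sketch below illustrates the two ingredients named in the abstract: inverse-propensity sample weighting with subgroup-wise feature normalization, and an alignment loss that discourages the detector from encoding demographic attributes. It is a minimal, hypothetical PyTorch illustration under assumed names and shapes, not the authors' DAID implementation.

```python
# Illustrative sketch only; functions, weights, and hyperparameters are assumptions,
# not the DAID code released by the paper's authors.
import torch
import torch.nn.functional as F

def inverse_propensity_weights(group_ids: torch.Tensor) -> torch.Tensor:
    """Weight each sample by the inverse frequency of its demographic subgroup,
    so over-represented subgroups contribute less to the detection loss."""
    counts = torch.bincount(group_ids).float()            # samples per subgroup
    weights = 1.0 / counts[group_ids]                     # inverse propensity per sample
    return weights * len(group_ids) / weights.sum()       # normalize to mean 1

def subgroup_normalize(features: torch.Tensor, group_ids: torch.Tensor,
                       eps: float = 1e-5) -> torch.Tensor:
    """Standardize features within each demographic subgroup to neutralize
    subgroup-specific distribution shifts."""
    out = features.clone()
    for g in group_ids.unique():
        mask = group_ids == g
        mu = features[mask].mean(dim=0, keepdim=True)
        sigma = features[mask].std(dim=0, keepdim=True)
        out[mask] = (features[mask] - mu) / (sigma + eps)
    return out

def alignment_loss(features: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
    """Penalize differences between subgroup-conditional feature means,
    suppressing sensitive-attribute signals in the representation."""
    means = [features[group_ids == g].mean(dim=0) for g in group_ids.unique()]
    overall = torch.stack(means).mean(dim=0)
    return sum(F.mse_loss(m, overall) for m in means) / len(means)

# Example: combine the pieces with a weighted real/fake detection loss.
features = torch.randn(32, 128)                  # backbone features for one batch
logits = torch.randn(32)                         # real/fake logits
labels = torch.randint(0, 2, (32,)).float()
groups = torch.randint(0, 4, (32,))              # demographic subgroup ids

w = inverse_propensity_weights(groups)
feats = subgroup_normalize(features, groups)
det_loss = (w * F.binary_cross_entropy_with_logits(logits, labels, reduction="none")).mean()
total_loss = det_loss + 0.1 * alignment_loss(feats, groups)   # 0.1 is an assumed trade-off weight
```

In this reading, the rebalancing terms act as the fairness intervention that the back-door argument says should transfer to better cross-domain generalization, while the alignment term keeps the aggregated features demographic-agnostic.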

Cite

Text

Cheng et al. "Fair Deepfake Detectors Can Generalize." Advances in Neural Information Processing Systems, 2025.

Markdown

[Cheng et al. "Fair Deepfake Detectors Can Generalize." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/cheng2025neurips-fair/)

BibTeX

@inproceedings{cheng2025neurips-fair,
  title     = {{Fair Deepfake Detectors Can Generalize}},
  author    = {Cheng, Harry and Liu, Ming-Hui and Guo, Yangyang and Wang, Tianyi and Nie, Liqiang and Kankanhalli, Mohan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/cheng2025neurips-fair/}
}