IndicFake Meets SAFARI-LLM: Unifying Semantic and Acoustic Intelligence for Multilingual Deepfake Detection

Abstract

Audio deepfakes pose a growing threat, particularly in linguistically diverse and low-resource settings where existing detection methods often struggle. This work introduces two transformative contributions to address these challenges. First, we present IndicFake, a pioneering audio deepfake dataset with over 4.2 million samples (7,350 hours) spanning English and 17 Indian languages across Indo-European, Dravidian, and Sino-Tibetan families. With minimal overlap (Jaccard similarity: 0.00–0.06) with existing datasets, IndicFake offers an unparalleled benchmark for multilingual deepfake detection. Second, we propose SAFARI-LLM (Semantic Acoustic Feature Adaptive Router with Integrated LLM), a novel framework that integrates Whisper’s semantic embeddings and m-HuBERT’s acoustic features through an adaptive Audio Feature Unification Module (AFUM). Enhanced by LoRA-fine-tuned LLaMA-7B, SAFARI-LLM achieves unmatched cross-lingual and cross-family generalization. Evaluations across IndicFake, DECRO, and WaveFake datasets demonstrate its superiority, outperforming 14 state-of-the-art models with standout accuracies of 94.21% (English-to-Japanese transfer on WaveFake) and 84.48% (English-to-Chinese transfer on DECRO), alongside robust performance across diverse linguistic contexts. These advancements establish a new standard for reliable, scalable audio deepfake detection. Code and resources are publicly available at: https://anonymousillusion.github.io/indicfake/.

Cite

Text

Ranjan et al. "IndicFake Meets SAFARI-LLM: Unifying Semantic and Acoustic Intelligence for Multilingual Deepfake Detection." Transactions on Machine Learning Research, 2025.

Markdown

[Ranjan et al. "IndicFake Meets SAFARI-LLM: Unifying Semantic and Acoustic Intelligence for Multilingual Deepfake Detection." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/ranjan2025tmlr-indicfake/)

BibTeX

@article{ranjan2025tmlr-indicfake,
  title     = {{IndicFake Meets SAFARI-LLM: Unifying Semantic and Acoustic Intelligence for Multilingual Deepfake Detection}},
  author    = {Ranjan, Rishabh and Vatsa, Mayank and Singh, Richa},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/ranjan2025tmlr-indicfake/}
}