VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

Park, Kyoungjun; Yang, Yifan; Yi, Juheon; Zheng, Shicheng; Muaz, Muhammad; Shen, Yifei; Han, Dongqi; Shan, Caihua; Qiu, Lili

ICLR 2026

Abstract

The rapid proliferation of AI-generated video necessitates robust detection tools that offer both high accuracy and human-interpretable explanations. While existing MLLM-based detectors rely on supervised fine-tuning (SFT) or direct preference optimization (DPO), these methods are often bottlenecked by static, pre-labeled datasets that fail to capture the evolving, multi-step physical inconsistencies of modern generative models. To bridge this gap, we introduce VidGuard-R1, the first video authenticity detector to utilize group relative policy optimization (GRPO). Moving beyond passive preference matching, VidGuard-R1 employs a reinforcement learning framework that encourages the model to explore and rank multiple reasoning paths. By introducing specialized reward models for temporal stability and diffusion-aware complexity, we incentivize the model to discover 'physics-grounded' artifacts. Our contributions include: (1) a curated dataset of 140,000 challenging real/fake video pairs; (2) a GRPO-based training paradigm that achieves state-of-the-art zero-shot performance; and (3) a reasoning-first architecture that provides precise, verifiable rationales for its forensic judgments. Project website: https://vidguard-r1.github.io
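
As a rough illustration of the "explore and rank multiple reasoning paths" idea, the sketch below shows the group-relative advantage computation commonly associated with GRPO: several responses are sampled for the same input, each is scored, and each score is normalized against the mean and standard deviation of its own group. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name, the `eps` term, the use of population standard deviation, and the example reward values are all hypothetical; the abstract does not specify how the temporal-stability or diffusion-aware rewards are computed or combined.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other rewards in its sampled group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reasoning paths sampled for one video
# (1.0 = correct real/fake verdict, 0.0 = incorrect verdict).
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)

# Responses that beat the group average get a positive advantage,
# the rest get a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Because the baseline is the group's own mean, no separate learned value function is needed in this formulation; a group whose responses all receive the same reward yields zero advantage for every response.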

Cite

Text

Park et al. "VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL." International Conference on Learning Representations, 2026.

Markdown

[Park et al. "VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/park2026iclr-vidguardr1/)

BibTeX

@inproceedings{park2026iclr-vidguardr1,
  title     = {{VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL}},
  author    = {Park, Kyoungjun and Yang, Yifan and Yi, Juheon and Zheng, Shicheng and Muaz, Muhammad and Shen, Yifei and Han, Dongqi and Shan, Caihua and Qiu, Lili},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/park2026iclr-vidguardr1/}
}