Ex-VAD: Explainable Fine-Grained Video Anomaly Detection Based on Visual-Language Models

Chao Huang, Yushu Shi, Jie Wen, Wei Wang, Yong Xu, Xiaochun Cao

ICML 2025 pp. 25750-25761


Abstract

With advances in visual-language models (VLMs) and large language models (LLMs), video anomaly detection (VAD) has progressed beyond binary classification toward fine-grained categorization and multidimensional analysis. However, existing methods focus mainly on coarse-grained detection and do not explain the anomalies they detect. To address this limitation, we propose Ex-VAD, an Explainable Fine-grained Video Anomaly Detection approach that combines fine-grained classification with detailed explanations of anomalies. First, we use a VLM to extract frame-level captions, and an LLM converts them into video-level explanations, enhancing the model's explainability. Second, integrating these textual explanations of anomalies with visual information substantially improves the model's anomaly detection capability. Finally, we apply label-enhanced alignment to optimize feature fusion, enabling precise fine-grained detection. Extensive experiments on the UCF-Crime and XD-Violence datasets demonstrate that Ex-VAD significantly outperforms existing state-of-the-art methods.
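The abstract does not give Ex-VAD's fusion or alignment equations, so the following is only a minimal, hypothetical sketch of the general idea it describes, not the authors' implementation. It assumes that frame features and an embedding of the LLM-generated explanation already live in a shared embedding space (as with CLIP-style encoders), that fusion is a simple weighted sum, and that "label alignment" is scored as temperature-scaled cosine similarity against class-label text embeddings, trained with cross-entropy. The function names and the `alpha` and `temperature` parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level visual features into one video-level feature."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def fuse(visual: Sequence[float], text: Sequence[float], alpha: float = 0.5) -> Vector:
    """Hypothetical fusion: weighted sum of the normalized visual feature and
    the normalized explanation-text embedding, re-normalized afterwards."""
    v, t = l2_normalize(visual), l2_normalize(text)
    return l2_normalize([alpha * a + (1.0 - alpha) * b for a, b in zip(v, t)])


def label_scores(video: Sequence[float], labels: Sequence[Sequence[float]],
                 temperature: float = 0.07) -> Vector:
    """Softmax over cosine similarities between the fused video feature and
    each class-label embedding (assumed CLIP-style scoring)."""
    logits = [sum(a * b for a, b in zip(video, l2_normalize(lab))) / temperature
              for lab in labels]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(probs: Sequence[float], target_idx: int) -> float:
    """Cross-entropy of the ground-truth class under the label distribution."""
    return -math.log(probs[target_idx])


if __name__ == "__main__":
    # Toy 3-d embeddings; real features would come from pretrained encoders.
    frames = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
    explanation = [0.9, 0.1, 0.0]
    class_names = ["normal", "fighting", "explosion"]
    labels = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    probs = label_scores(fuse(mean_pool(frames), explanation), labels)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(class_names[best], alignment_loss(probs, best))
```

The weighted sum is the simplest possible fusion; a learned model would more plausibly use cross-attention or a trained projection, and the label embeddings would come from encoding class names or descriptions with the same text encoder.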


Cite

Text

Huang et al. "Ex-VAD: Explainable Fine-Grained Video Anomaly Detection Based on Visual-Language Models." Proceedings of the 42nd International Conference on Machine Learning, 2025.


BibTeX

@inproceedings{huang2025icml-exvad,
  title     = {{Ex-VAD: Explainable Fine-Grained Video Anomaly Detection Based on Visual-Language Models}},
  author    = {Huang, Chao and Shi, Yushu and Wen, Jie and Wang, Wei and Xu, Yong and Cao, Xiaochun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {25750-25761},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/huang2025icml-exvad/}
}