Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt

Abstract

Video anomaly detection (VAD) aims to locate abnormal events in videos. Weakly supervised VAD (WSVAD), which requires only video-level annotations during training, has recently made great progress. In practical applications, different institutions may hold different types of abnormal videos, but these videos cannot be shared over the internet because of privacy protection. To train a more general anomaly detector that can identify varied anomalies, it is reasonable to introduce federated learning into WSVAD. In this paper, we propose Global and Local Context-driven Federated Learning, a new paradigm for privacy-protected weakly supervised video anomaly detection. Specifically, we use the vision-language association of CLIP to detect whether a video frame is abnormal. Instead of relying on handcrafted text prompts for CLIP, we propose a text prompt generator whose output is influenced jointly by textual and visual information. On the one hand, the text provides global context related to anomalies, which improves the model's generalization ability. On the other hand, the visual input provides personalized local context, because different clients may have videos with different types of anomalies or scenes. The generated prompt thus preserves global generalization while handling personalized data from different clients. Extensive experiments show that the proposed method achieves remarkable performance.
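
As a rough illustration of the two ideas named in the abstract, the sketch below scores a frame by comparing its feature vector with "normal" and "abnormal" prompt embeddings, and aggregates client parameters with a size-weighted average. It is not the authors' implementation: the function names, the temperature value, the toy vectors, and the use of plain FedAvg-style averaging are all assumptions made for illustration, since the abstract does not specify the aggregation rule or the prompt generator's architecture.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_score(frame_feat, normal_prompt, abnormal_prompt, temperature=0.07):
    """Probability that a frame is abnormal (hypothetical helper).

    Takes a two-way softmax over the frame's similarity to a 'normal'
    and an 'abnormal' prompt embedding. The temperature is an assumed value.
    """
    s_normal = cosine(frame_feat, normal_prompt) / temperature
    s_abnormal = cosine(frame_feat, abnormal_prompt) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


def federated_average(client_params, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style, assumed)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings; real CLIP features are much higher-dimensional.
    frame = [1.0, 0.0]
    print(anomaly_score(frame, normal_prompt=[0.0, 1.0], abnormal_prompt=[1.0, 0.0]))
    # Two clients holding 1 and 3 videos respectively.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this sketch only parameter vectors leave each client, never the videos themselves, which is the privacy motivation the abstract gives for using federated learning.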

Cite

Text

Wang et al. "Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I20.35398

Markdown

[Wang et al. "Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-federated/) doi:10.1609/AAAI.V39I20.35398

BibTeX

@inproceedings{wang2025aaai-federated,
  title     = {{Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt}},
  author    = {Wang, Benfeng and Huang, Chao and Wen, Jie and Wang, Wei and Liu, Yabo and Xu, Yong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {21017--21025},
  doi       = {10.1609/AAAI.V39I20.35398},
  url       = {https://mlanthology.org/aaai/2025/wang2025aaai-federated/}
}