Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Abstract

Weakly supervised video anomaly detection (WSVAD) is a challenging task. Generating fine-grained pseudo-labels from weak labels and then self-training a classifier is currently a promising solution. However, existing methods use only the RGB visual modality and neglect category text information, which limits the accuracy of the generated pseudo-labels and degrades self-training performance. Inspired by the manual labeling process based on event descriptions, in this paper we propose a novel pseudo-label generation and self-training framework based on Text Prompt with Normality Guidance (TPWNG) for WSVAD. Our idea is to transfer the rich language-visual knowledge of the contrastive language-image pre-training (CLIP) model to align video event description text with the corresponding video frames and thereby generate pseudo-labels. Specifically, we first fine-tune CLIP for domain adaptation by designing two ranking losses and a distributional inconsistency loss. Further, we propose a learnable text prompt mechanism, assisted by a normality visual prompt, to improve the matching accuracy between video event description text and video frames. Then we design a pseudo-label generation module based on normality guidance to infer reliable frame-level pseudo-labels. Finally, we introduce a temporal context self-adaptive learning module to learn the temporal dependencies of different video events more flexibly and accurately. Extensive experiments show that our method achieves state-of-the-art performance on two benchmark datasets, UCF-Crime and XD-Violence, demonstrating its effectiveness.
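
The core idea of matching frames against event description text can be illustrated with a minimal sketch. This is not the TPWNG implementation: it assumes frame and text embeddings have already been extracted by some CLIP-like encoder, and it reduces normality guidance to a simple contrast between one "normal" and one "anomaly" text embedding. The function name `frame_pseudo_labels`, the `temperature` value, and the `threshold` are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def frame_pseudo_labels(frame_feats, anomaly_text_feat, normal_text_feat,
                        temperature=0.07, threshold=0.5):
    """Illustrative frame-level pseudo-labels from frame/text similarity.

    frame_feats:       (T, D) array of per-frame embeddings (assumed precomputed).
    anomaly_text_feat: (D,) embedding of the anomaly event description.
    normal_text_feat:  (D,) embedding of a "normal" description (assumed stand-in
                       for normality guidance; the paper's module is more elaborate).
    Returns (scores, labels): anomaly probability and binary label per frame.
    """
    frames = l2_normalize(np.asarray(frame_feats, dtype=np.float64))
    texts = l2_normalize(np.stack([normal_text_feat, anomaly_text_feat]).astype(np.float64))

    # Cosine similarity of every frame to the [normal, anomaly] text pair.
    logits = frames @ texts.T / temperature          # shape (T, 2)

    # Numerically stable softmax over the two text classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    scores = probs[:, 1]                             # probability of "anomaly"
    labels = (scores >= threshold).astype(np.int64)  # hypothetical hard threshold
    return scores, labels
```

In a self-training setup of the kind the abstract describes, such frame-level labels would then supervise a separate classifier; how the scores are refined before that step is specific to the paper and is not reproduced here.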

Cite

Text

Yang et al. "Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01788

Markdown

[Yang et al. "Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/yang2024cvpr-text/) doi:10.1109/CVPR52733.2024.01788

BibTeX

@inproceedings{yang2024cvpr-text,
  title     = {{Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection}},
  author    = {Yang, Zhiwei and Liu, Jing and Wu, Peng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {18899-18908},
  doi       = {10.1109/CVPR52733.2024.01788},
  url       = {https://mlanthology.org/cvpr/2024/yang2024cvpr-text/}
}