OE-CTST: Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly Detection

Abstract

Video anomaly detection in real-world scenarios is challenging because long and short anomalies are temporally interleaved with normal events. Detection is further complicated by two factors: (i) short and long anomalies have distinctive features, characterized by sharp and progressive temporal cues respectively; and (ii) the lack of precise temporal annotations (i.e., weak supervision) limits how well the temporal dynamics of anomalies can be modeled against normal events. In this paper, we propose a novel temporal transformer framework for weakly-supervised anomaly detection, OE-CTST. The framework has two major components: (i) an Outlier Embedder (OE) and (ii) a Cross Temporal Scale Transformer (CTST). First, OE generates anomaly-aware temporal position encodings that allow the transformer to effectively model the temporal dynamics among anomalies and normal events. Second, CTST encodes the cross-correlation between multi-temporal-scale features, benefiting both short and long anomalies by modeling global temporal relations. OE-CTST is validated on three publicly available datasets, namely UCF-Crime, XD-Violence, and IITB-Corridor, where it outperforms recently reported state-of-the-art approaches.
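
To make the idea of cross-correlating multi-temporal-scale features concrete, the following is a minimal NumPy sketch and not the authors' architecture. It assumes snippet-level features of shape (T, D), builds a coarser scale by average pooling, and lets the fine scale attend over the coarse one with plain scaled dot-product attention. The function names, the pooling window, and the absence of learned projections and of the Outlier Embedder are all illustrative assumptions.

```python
import numpy as np


def avg_pool_time(x, window):
    """Average-pool a (T, D) sequence over non-overlapping temporal windows."""
    t, d = x.shape
    usable = (t // window) * window  # drop a ragged tail, if any
    return x[:usable].reshape(-1, window, d).mean(axis=1)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_scale_attention(fine, coarse):
    """Queries from the fine scale attend over coarse-scale keys and values.

    Hypothetical helper: a real transformer would apply learned
    query/key/value projections, which are omitted here.
    """
    d = fine.shape[1]
    scores = fine @ coarse.T / np.sqrt(d)  # (T_fine, T_coarse)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ coarse, weights


# Random stand-in for snippet features: 32 snippets, 16 dimensions (assumed sizes).
rng = np.random.default_rng(0)
snippets = rng.normal(size=(32, 16))

coarse = avg_pool_time(snippets, window=4)  # 8 coarse time steps
fused, weights = cross_scale_attention(snippets, coarse)
```

Here `fused` has one row per fine-scale snippet, each a weighted mix of coarse-scale context. Intuitively, this kind of mixing lets a short, sharp event be compared against longer-range temporal structure.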

Cite

Text

Majhi et al. "OE-CTST: Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly Detection." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Majhi et al. "OE-CTST: Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly Detection." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/majhi2024wacv-oectst/)

BibTeX

@inproceedings{majhi2024wacv-oectst,
  title     = {{OE-CTST: Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly Detection}},
  author    = {Majhi, Snehashis and Dai, Rui and Kong, Quan and Garattoni, Lorenzo and Francesca, Gianpiero and Brémond, François},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {8574-8583},
  url       = {https://mlanthology.org/wacv/2024/majhi2024wacv-oectst/}
}