ST2ST: Self-Supervised Test-Time Adaptation for Video Action Recognition

Fahim, Masud An Nur Islam; Innat, Mohammed; Boutellier, Jani

doi:10.1109/CVPRW63382.2024.00112

CVPRW 2024 pp. 1057-1066

Abstract

The performance of trained deep neural network (DNN) models relies on the assumption that the test data has largely the same feature distribution as the training data. In deployed video recognition systems, however, the feature distribution of acquired samples can shift due to environmental conditions (rain, lighting variations) or technological factors such as lossy data compression. To improve action recognition performance under feature distribution shifts, we propose a simple test-time self-distillation strategy where the DNN model goes through an intra-video logit minimization phase. As a result, the model can update its predictions for the given input. The proposed approach is agnostic to the neural network type (CNN, transformer) and applies to various action recognition models. In contrast to many test-time adaptation studies, the proposed approach does not require access to the training data. The performance of the proposed method is evaluated with multiple state-of-the-art action recognition models and the widely used benchmark datasets Kinetics-400 and Something-Something V2.
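
The abstract does not spell out the loss, so the following is only a rough sketch of one possible reading of test-time self-distillation, not the authors' implementation. It assumes a toy linear classifier over precomputed clip features and assumes that the adaptation objective reduces the disagreement between the logits of clips drawn from the same video, with their mean acting as a fixed teacher target. The names `logits`, `mean_logits`, `disagreement`, and `adapt`, as well as the step count and learning rate, are hypothetical.

```python
def logits(W, x):
    """Toy linear classifier: one logit per class for a clip feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean_logits(W, clips):
    """Average the per-clip logits of a single video (the assumed teacher target)."""
    per_clip = [logits(W, x) for x in clips]
    return [sum(col) / len(per_clip) for col in zip(*per_clip)]


def disagreement(W, clips):
    """Mean squared distance between each clip's logits and the video-level mean."""
    target = mean_logits(W, clips)
    total = 0.0
    for x in clips:
        total += sum((z - t) ** 2 for z, t in zip(logits(W, x), target))
    return total / len(clips)


def adapt(W, clips, steps=20, lr=0.01):
    """Take a few gradient steps on the disagreement, using only the test video.

    No labels and no training data are used. The input weights are copied,
    so the original model is left unchanged.
    """
    W = [row[:] for row in W]
    for _ in range(steps):
        target = mean_logits(W, clips)  # held constant within the step
        grad = [[0.0] * len(W[0]) for _ in W]
        for x in clips:
            z = logits(W, x)
            for k, (zk, tk) in enumerate(zip(z, target)):
                for j, xj in enumerate(x):
                    grad[k][j] += 2.0 * (zk - tk) * xj / len(clips)
        for k in range(len(W)):
            for j in range(len(W[0])):
                W[k][j] -= lr * grad[k][j]
    return W


# Hypothetical example: 2 classes, 3 clips of one test video, 3 features each.
W0 = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
clips = [[1.0, 0.2, -0.3], [0.8, 0.4, -0.1], [1.1, 0.0, -0.5]]
W1 = adapt(W0, clips)
print(disagreement(W0, clips), disagreement(W1, clips))
```

In this sketch the adapted weights would be used to predict the current video only. A real action recognition model would replace the linear map with a CNN or transformer backbone, and an objective this simple can collapse toward trivial solutions if it is run for many steps.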

Cite

Text

Fahim et al. "ST2ST: Self-Supervised Test-Time Adaptation for Video Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00112

Markdown

[Fahim et al. "ST2ST: Self-Supervised Test-Time Adaptation for Video Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/fahim2024cvprw-st2st/) doi:10.1109/CVPRW63382.2024.00112

BibTeX

@inproceedings{fahim2024cvprw-st2st,
  title     = {{ST2ST: Self-Supervised Test-Time Adaptation for Video Action Recognition}},
  author    = {Fahim, Masud An Nur Islam and Innat, Mohammed and Boutellier, Jani},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {1057--1066},
  doi       = {10.1109/CVPRW63382.2024.00112},
  url       = {https://mlanthology.org/cvprw/2024/fahim2024cvprw-st2st/}
}