Learning to Predict Activity Progress by Self-Supervised Video Alignment

CVPR 2024 pp. 18667-18677

doi:10.1109/CVPR52733.2024.01766

Abstract

In this paper, we tackle the problem of self-supervised video alignment and activity progress prediction using in-the-wild videos. Our proposed self-supervised representation learning method carefully addresses different action orderings, redundant actions, and background frames to generate improved video representations compared to previous methods. Our model generalizes temporal cycle-consistency learning to allow for more flexibility in determining cycle-consistent neighbors. More specifically, to handle repeated actions, we propose a multi-neighbor cycle consistency and a multi-cycle-back regression loss by finding multiple soft nearest neighbors using a Gaussian Mixture Model. To handle background and redundant frames, we introduce a context-dependent drop function in our framework, discouraging the alignment of droppable frames. On the other hand, to learn from videos of multiple activities jointly, we propose a multi-head crosstask network, allowing us to embed a video and estimate progress without knowing its activity label. Experiments on multiple datasets show that our method outperforms the state-of-the-art for video alignment and progress prediction.
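
For orientation, the following is a minimal sketch of the baseline idea that the abstract says the model generalizes: temporal cycle-consistency with a single soft nearest neighbor and a cycle-back regression error. It is not the authors' method. The multi-neighbor Gaussian Mixture Model step, the drop function, and the multi-head network are not shown, since the abstract gives no details on them. All function names, the temperature parameter, and the toy one-dimensional embeddings are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_nearest_neighbor(query, frames, temperature=0.1):
    """Similarity-weighted average of `frames`, weighted by closeness to `query`."""
    weights = softmax([-sq_dist(query, f) / temperature for f in frames])
    return [sum(w * f[d] for w, f in zip(weights, frames))
            for d in range(len(query))]


def cycle_back_index(i, video_u, video_v, temperature=0.1):
    """Map frame i of video_u into video_v, then cycle back to video_u.

    Returns the expected (soft) frame index in video_u after the round trip.
    """
    v_tilde = soft_nearest_neighbor(video_u[i], video_v, temperature)
    beta = softmax([-sq_dist(v_tilde, u) / temperature for u in video_u])
    return sum(k * b for k, b in enumerate(beta))


def cycle_back_regression_loss(video_u, video_v, temperature=0.1):
    """Mean squared gap between each start index and its cycled-back index."""
    errors = [(cycle_back_index(i, video_u, video_v, temperature) - i) ** 2
              for i in range(len(video_u))]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy embeddings (hypothetical): two videos with identical frame features.
    u = [[0.0], [1.0], [2.0]]
    v = [[0.0], [1.0], [2.0]]
    print(cycle_back_regression_loss(u, v, temperature=0.01))
```

With a single soft nearest neighbor, a frame from a repeated action has several equally good matches, so the cycled-back index can land between them. That limitation is the one the abstract's multi-neighbor formulation is described as addressing.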

Cite

Text

Donahue and Elhamifar. "Learning to Predict Activity Progress by Self-Supervised Video Alignment." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01766

Markdown

[Donahue and Elhamifar. "Learning to Predict Activity Progress by Self-Supervised Video Alignment." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/donahue2024cvpr-learning/) doi:10.1109/CVPR52733.2024.01766

BibTeX

@inproceedings{donahue2024cvpr-learning,
  title     = {{Learning to Predict Activity Progress by Self-Supervised Video Alignment}},
  author    = {Donahue, Gerard and Elhamifar, Ehsan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {18667-18677},
  doi       = {10.1109/CVPR52733.2024.01766},
  url       = {https://mlanthology.org/cvpr/2024/donahue2024cvpr-learning/}
}