EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding

Abstract

Recent advances in egocentric video understanding models are promising, but their heavy computational expense is a barrier for many real-world applications. To address this challenge, we propose EgoDistill, a distillation-based approach that learns to reconstruct heavy egocentric video clip features by combining the semantics from a sparse set of video frames with head motion from lightweight IMU readings. We further devise a novel IMU-based self-supervised pretraining strategy. Our method leads to significant improvements in efficiency, requiring 200× fewer GFLOPs than equivalent video models. We demonstrate its effectiveness on the Ego4D and EPIC-Kitchens datasets, where our method outperforms state-of-the-art efficient video understanding methods.
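
To make the core idea concrete, below is a minimal PyTorch sketch of the feature-distillation setup the abstract describes: a lightweight student fuses encodings of a few RGB frames with the IMU head-motion signal and is trained to regress the clip feature of a frozen, heavy teacher. All module names, toy dimensions, the mean-pooled frame fusion, and the plain MSE objective are illustrative assumptions rather than the paper's exact architecture or loss, and the IMU-based self-supervised pretraining stage is omitted.

```python
# Minimal sketch of the distillation idea described above (PyTorch).
# All names, dimensions, and the MSE objective are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 768          # hypothetical clip-feature dimension
IMG = 3 * 32 * 32       # toy frame size to keep the example tiny

class LightweightStudent(nn.Module):
    """Fuses a few RGB frames with an IMU head-motion encoding."""
    def __init__(self):
        super().__init__()
        # Stand-ins for a small image backbone and a 1D IMU encoder.
        self.frame_encoder = nn.Linear(IMG, FEAT_DIM)
        self.imu_encoder = nn.Linear(100 * 6, FEAT_DIM)
        self.fusion = nn.Linear(2 * FEAT_DIM, FEAT_DIM)

    def forward(self, frames, imu):
        # frames: (B, k, C, H, W) sparse frames; imu: (B, T, 6) accel + gyro.
        f = self.frame_encoder(frames.mean(dim=1).flatten(1))  # pool k frames
        m = self.imu_encoder(imu.flatten(1))
        return self.fusion(torch.cat([f, m], dim=-1))

# Frozen "heavy" teacher stand-in; in practice this is a full video model
# run over the dense clip. Only its output feature is used as the target.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(8 * IMG, FEAT_DIM)).eval()

student = LightweightStudent()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

# One illustrative training step on dummy data.
dense_clip = torch.randn(2, 8, 3, 32, 32)   # 8-frame clip for the teacher
sparse_frames = dense_clip[:, ::4]          # k=2 frames for the student
imu = torch.randn(2, 100, 6)                # synchronized IMU readings

with torch.no_grad():
    target = teacher(dense_clip)            # heavy clip feature to mimic
loss = nn.functional.mse_loss(student(sparse_frames, imu), target)
opt.zero_grad()
loss.backward()
opt.step()
```

At inference time only the student runs; the dense-clip teacher is needed only during training, which is where the reported GFLOP savings would come from in a setup like this.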

Cite

Text

Tan et al. "EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding." Neural Information Processing Systems, 2023.

Markdown

[Tan et al. "EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/tan2023neurips-egodistill/)

BibTeX

@inproceedings{tan2023neurips-egodistill,
  title     = {{EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding}},
  author    = {Tan, Shuhan and Nagarajan, Tushar and Grauman, Kristen},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/tan2023neurips-egodistill/}
}