Embedding Task Structure for Action Detection

Abstract

We present a straightforward, flexible method for improving the accuracy and quality of action detection by expressing the temporal and structural relationships among actions directly in the loss function of a deep network. We describe ways to represent structure that is otherwise left implicit in video data and show how these structures act as natural biases that improve network training. Our experiments show that this approach improves both the accuracy and the edit distance of action recognition and detection models over a baseline, outperforms prior work, and achieves state-of-the-art results on multiple benchmarks.
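As a rough illustration of the general idea, the sketch below adds a structure-aware term to a standard per-frame classification loss. It is a minimal, hypothetical example, not the paper's formulation: the function name structured_detection_loss, the truncated temporal-smoothing penalty (a common choice in the action-segmentation literature), and the structure_weight hyperparameter are all assumptions, used only to show how a structural bias can be expressed directly in a training loss.

# Hypothetical sketch: per-frame cross-entropy plus a simple temporal-structure
# penalty. The names and the exact penalty are assumptions, not the paper's loss.
import torch
import torch.nn.functional as F


def structured_detection_loss(logits, labels, structure_weight=0.1):
    """Combine per-frame classification loss with a temporal-smoothness term.

    logits: (batch, time, num_classes) raw network outputs
    labels: (batch, time) integer action labels per frame
    """
    b, t, c = logits.shape

    # Standard per-frame cross-entropy over all frames.
    ce = F.cross_entropy(logits.reshape(b * t, c), labels.reshape(b * t))

    # Simple structural bias: penalize abrupt changes in the predicted class
    # distribution between consecutive frames (a truncated squared difference
    # on log-probabilities).
    log_probs = F.log_softmax(logits, dim=-1)
    diff = log_probs[:, 1:, :] - log_probs[:, :-1, :].detach()
    smooth = torch.clamp(diff ** 2, max=16.0).mean()

    return ce + structure_weight * smooth


# Example usage (random tensors, for illustration only):
# logits = torch.randn(2, 50, 10)
# labels = torch.randint(0, 10, (2, 50))
# loss = structured_detection_loss(logits, labels)

In practice, the penalty term is where domain structure enters: richer encodings of ordering or task-level constraints between actions would take the place of the simple smoothness term used here.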

Cite

Text

Peven and Hager. "Embedding Task Structure for Action Detection." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Peven and Hager. "Embedding Task Structure for Action Detection." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/peven2024wacv-embedding/)

BibTeX

@inproceedings{peven2024wacv-embedding,
  title     = {{Embedding Task Structure for Action Detection}},
  author    = {Peven, Michael and Hager, Gregory D.},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {6604--6613},
  url       = {https://mlanthology.org/wacv/2024/peven2024wacv-embedding/}
}