Efficient Temporal Action Segmentation via Boundary-Aware Query Voting

Abstract

Although the performance of Temporal Action Segmentation (TAS) has improved in recent years, strong results often come at a high computational cost due to dense inputs, complex model structures, and resource-intensive post-processing. To improve efficiency while maintaining high performance, we present a novel perspective centered on per-segment classification. By harnessing the capabilities of Transformers, we tokenize each video segment as an instance token, endowed with intrinsic instance segmentation. To realize efficient action segmentation, we introduce BaFormer, a boundary-aware Transformer network. It employs instance queries for instance segmentation and a global query for class-agnostic boundary prediction, yielding continuous segment proposals. During inference, BaFormer employs a simple yet effective voting strategy to classify boundary-wise segments based on instance segmentation. Remarkably, as a single-stage approach, BaFormer significantly reduces computational cost, using only 6% of the running time of the state-of-the-art method DiffAct, while achieving better or comparable accuracy on several popular benchmarks. The code for this project is publicly available at https://github.com/peiyao-w/BaFormer.
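
The inference-time voting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it assumes each instance query outputs a per-frame mask probability and a class distribution, and that the boundary branch yields a list of frame indices where a new segment starts. The function name `vote_segments`, its parameters, and the overlap-weighted scoring rule are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def vote_segments(
    masks: Sequence[Sequence[float]],        # [num_queries][num_frames] mask probabilities (assumed format)
    class_probs: Sequence[Sequence[float]],  # [num_queries][num_classes] class distributions (assumed format)
    boundaries: Sequence[int],               # frame indices where a new segment starts (assumed format)
    num_frames: int,
) -> List[int]:
    """Assign one class label per boundary-delimited segment by query voting."""
    # Turn boundary indices into half-open spans [start, end) covering the video.
    inner = sorted({b for b in boundaries if 0 < b < num_frames})
    cuts = [0] + inner + [num_frames]

    num_classes = len(class_probs[0])
    labels = [0] * num_frames

    for start, end in zip(cuts[:-1], cuts[1:]):
        scores = [0.0] * num_classes
        for mask, probs in zip(masks, class_probs):
            # A query's vote is weighted by how much of its mask falls in this segment.
            support = sum(mask[start:end])
            for c in range(num_classes):
                scores[c] += support * probs[c]
        best = max(range(num_classes), key=scores.__getitem__)
        # Every frame in the segment receives the winning class.
        for t in range(start, end):
            labels[t] = best
    return labels


# Toy input: 2 queries, 6 frames, 2 classes, one predicted boundary at frame 3.
toy_masks = [
    [0.9, 0.8, 0.4, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.6, 0.9, 1.0, 1.0],
]
toy_class_probs = [
    [0.9, 0.1],
    [0.2, 0.8],
]
toy_labels = vote_segments(toy_masks, toy_class_probs, boundaries=[3], num_frames=6)
```

In this toy input, frame 2 taken alone leans toward the second query, but the segment-level vote labels the whole span between boundaries consistently. That is the appeal of per-segment classification over per-frame classification: labels cannot flicker inside a segment, so no separate smoothing step is required.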

Cite

Text

Wang et al. "Efficient Temporal Action Segmentation via Boundary-Aware Query Voting." Neural Information Processing Systems, 2024. doi:10.52202/079017-1192

Markdown

[Wang et al. "Efficient Temporal Action Segmentation via Boundary-Aware Query Voting." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wang2024neurips-efficient/) doi:10.52202/079017-1192

BibTeX

@inproceedings{wang2024neurips-efficient,
  title     = {{Efficient Temporal Action Segmentation via Boundary-Aware Query Voting}},
  author    = {Wang, Peiyao and Lin, Yuewei and Blasch, Erik and Wei, Jie and Ling, Haibin},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1192},
  url       = {https://mlanthology.org/neurips/2024/wang2024neurips-efficient/}
}