Scene Adaptive Sparse Transformer for Event-Based Object Detection

Abstract

While recent Transformer-based approaches have shown impressive performance on event-based object detection tasks, their high computational costs still diminish the low-power-consumption advantage of event cameras. Image-based works attempt to reduce these costs by introducing sparse Transformers. However, they display inadequate sparsity and adaptability when applied to event-based object detection, since these approaches cannot balance the fine granularity of token-level sparsification with the efficiency of window-based Transformers, leading to reduced performance and efficiency. Furthermore, they lack scene-specific sparsity optimization, resulting in information loss and a lower recall rate. To overcome these limitations, we propose the Scene Adaptive Sparse Transformer (SAST). SAST enables window-token co-sparsification, significantly enhancing fault tolerance and reducing computational overhead. Leveraging innovative scoring and selection modules along with Masked Sparse Window Self-Attention, SAST showcases remarkable scene-aware adaptability: it focuses only on important objects and dynamically optimizes its sparsity level according to scene complexity, maintaining a strong balance between performance and computational cost. The evaluation results show that SAST outperforms all other dense and sparse networks in both performance and efficiency on two large-scale event-based object detection datasets (1Mpx and Gen1). Code: https://github.com/Peterande/SAST
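To make the idea of window-token co-sparsification more concrete, here is a toy sketch in pure Python. It is not the SAST implementation (see the linked repository for that). Several pieces are assumptions made for illustration: per-token importance scores are passed in directly as a stand-in for the learned scoring module, a single fixed `threshold` stands in for the selection module, and the hypothetical helpers `co_sparsify` and `masked_window_attention` use single-head attention without learned projections. The sketch shows two levels of sparsity. Whole windows with no important token are skipped entirely, and inside a processed window attention is masked so that only kept tokens attend to each other. Because selection is threshold-based rather than a fixed top-k, the number of kept tokens grows or shrinks with how many tokens score highly in a given scene.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_window_attention(tokens: List[Vector], keep: List[bool]) -> List[Vector]:
    """Single-head self-attention inside one window (hypothetical helper).

    Only kept tokens act as queries and keys/values; this is equivalent to
    setting masked logits to -inf. Dropped tokens pass through unchanged.
    No learned Q/K/V projections are used in this toy version.
    """
    idx = [i for i, k in enumerate(keep) if k]
    out = [list(t) for t in tokens]
    if not idx:
        return out
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    for i in idx:
        logits = [scale * sum(a * b for a, b in zip(tokens[i], tokens[j])) for j in idx]
        weights = softmax(logits)
        out[i] = [sum(w * tokens[j][c] for w, j in zip(weights, idx)) for c in range(d)]
    return out


def co_sparsify(
    windows: List[List[Vector]], scores: List[List[float]], threshold: float
) -> Tuple[List[List[Vector]], int]:
    """Toy window-token co-sparsification (hypothetical helper).

    A window is processed only if at least one token score exceeds the
    threshold; inside a processed window, only tokens above the threshold
    take part in attention. Returns the outputs and the number of kept tokens.
    """
    processed: List[List[Vector]] = []
    kept_total = 0
    for win, sc in zip(windows, scores):
        keep = [s > threshold for s in sc]
        if any(keep):
            kept_total += sum(keep)
            processed.append(masked_window_attention(win, keep))
        else:
            # Window-level sparsity: the whole window is skipped.
            processed.append([list(t) for t in win])
    return processed, kept_total


windows = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.1]]]
scores = [[0.9, 0.8], [0.1, 0.05]]
out, kept = co_sparsify(windows, scores, threshold=0.5)
```

With these inputs the first window keeps both tokens and the second window is skipped, so `kept` is 2. Lowering the threshold to 0.0 keeps all four tokens. This loosely mirrors the adaptivity described above: a busier scene with more high-scoring tokens would keep more tokens, and a sparse scene would keep fewer.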

Cite

Text

Peng et al. "Scene Adaptive Sparse Transformer for Event-Based Object Detection." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01589

Markdown

[Peng et al. "Scene Adaptive Sparse Transformer for Event-Based Object Detection." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/peng2024cvpr-scene/) doi:10.1109/CVPR52733.2024.01589

BibTeX

@inproceedings{peng2024cvpr-scene,
  title     = {{Scene Adaptive Sparse Transformer for Event-Based Object Detection}},
  author    = {Peng, Yansong and Li, Hebei and Zhang, Yueyi and Sun, Xiaoyan and Wu, Feng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {16794--16804},
  doi       = {10.1109/CVPR52733.2024.01589},
  url       = {https://mlanthology.org/cvpr/2024/peng2024cvpr-scene/}
}