Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection

Abstract

In frame-based vision, object detection faces substantial performance degradation under challenging conditions due to the limited sensing capability of conventional cameras. Event cameras output sparse and asynchronous events, offering a potential solution to these problems. However, effectively fusing the two heterogeneous modalities remains an open issue. In this work, we propose a novel hierarchical feature refinement network for event-frame fusion. The core concept is a coarse-to-fine fusion module, denoted the cross-modality adaptive feature refinement (CAFR) module. In the initial phase, the bidirectional cross-modality interaction (BCI) part bridges information between the two distinct sources. Subsequently, the features are further refined by aligning the channel-level mean and variance in the two-fold adaptive feature refinement (TAFR) part. We conducted extensive experiments on two benchmarks: the low-resolution PKU-DDD17-Car dataset and the high-resolution DSEC dataset. Experimental results show that our method surpasses the state of the art by a margin of 8.0% on the DSEC dataset. In addition, our method exhibits significantly better robustness (69.5% versus 38.7%) when 15 different corruption types are applied to the frame images. The code is available at https://github.com/HuCaoFighting/FRN.
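The abstract does not say how the TAFR part computes or applies its statistics, so the following is only a minimal sketch of what aligning channel-level mean and variance between two feature maps can look like. It uses a generic normalize-then-rescale step in the style of adaptive instance normalization, which is an assumption here and not necessarily the authors' formulation. The function name `align_channel_stats`, the `(C, H, W)` layout, and the random stand-in features are all hypothetical.

```python
import numpy as np


def align_channel_stats(content, reference, eps=1e-5):
    """Rescale each channel of `content` so that its spatial mean and
    variance match those of the same channel in `reference`.

    Both inputs are arrays of shape (C, H, W). This is an illustrative,
    AdaIN-style operation, not the TAFR module from the paper.
    """
    # Per-channel statistics over the spatial dimensions (H, W).
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)
    r_mean = reference.mean(axis=(1, 2), keepdims=True)
    r_std = np.sqrt(reference.var(axis=(1, 2), keepdims=True) + eps)

    # Normalize `content`, then apply the reference statistics.
    return (content - c_mean) / c_std * r_std + r_mean


# Stand-in features with deliberately different statistics per modality.
rng = np.random.default_rng(0)
frame_feat = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
event_feat = rng.normal(loc=-1.0, scale=0.5, size=(4, 8, 8))

aligned = align_channel_stats(event_feat, frame_feat)
```

After the call, each channel of `aligned` keeps the spatial pattern of `event_feat` but has approximately the mean and variance of the matching channel in `frame_feat`. A learned module would typically predict or modulate these statistics instead of copying them directly.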

Cite

Text

Cao et al. "Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72907-2_10

Markdown

[Cao et al. "Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/cao2024eccv-embracing/) doi:10.1007/978-3-031-72907-2_10

BibTeX

@inproceedings{cao2024eccv-embracing,
  title     = {{Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection}},
  author    = {Cao, Hu and Zhang, Zehua and Xia, Yan and Li, Xinyi and Xia, Jiahao and Chen, Guang and Knoll, Alois C.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72907-2_10},
  url       = {https://mlanthology.org/eccv/2024/cao2024eccv-embracing/}
}