Data Association Between Event Streams and Intensity Frames Under Diverse Baselines
Abstract
This paper proposes a learning-based framework that associates event streams with intensity frames under diverse camera baselines, benefiting both camera pose estimation under large baselines and depth estimation under small baselines. Based on the observation that, in the spatial domain, event streams are globally sparse (only a small percentage of pixels in the full frame are triggered by events) and locally dense (a large percentage of pixels within local patches are triggered by events), we put forward a two-stage architecture for matching feature maps. LSparse-Net uses a large receptive field to find sparse matches, while SDense-Net uses a small receptive field to find dense matches. Both stages apply transformer modules with self-attention and cross-attention layers to effectively process multi-resolution features from the feature pyramid network backbone. Experimental results on public datasets show systematic performance improvement for both tasks compared to state-of-the-art methods.
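The "globally sparse, locally dense" observation can be illustrated with a small NumPy sketch. This is not the authors' code: the function names (`event_occupancy`, `global_density`, `local_density`), the 8-pixel patch size, and the synthetic events concentrated in a vertical band (a stand-in for a moving edge) are all assumptions made for illustration only.

```python
import numpy as np

def event_occupancy(events_xy, height, width):
    """Boolean mask of pixels that received at least one event.

    events_xy is an (N, 2) integer array of (x, y) pixel coordinates.
    """
    mask = np.zeros((height, width), dtype=bool)
    xs = events_xy[:, 0].astype(int)
    ys = events_xy[:, 1].astype(int)
    mask[ys, xs] = True
    return mask

def global_density(mask):
    """Fraction of triggered pixels over the whole frame."""
    return float(mask.mean())

def local_density(mask, patch=8):
    """Fraction of triggered pixels in each non-overlapping patch x patch tile."""
    h, w = mask.shape
    h_crop, w_crop = h - h % patch, w - w % patch  # drop incomplete border tiles
    tiles = mask[:h_crop, :w_crop].reshape(
        h_crop // patch, patch, w_crop // patch, patch
    )
    return tiles.mean(axis=(1, 3))

# Synthetic data (assumption): events fall only in an 8-pixel-wide vertical band.
rng = np.random.default_rng(0)
H, W = 64, 64
xs = rng.integers(24, 32, size=2000)
ys = rng.integers(0, H, size=2000)
events = np.stack([xs, ys], axis=1)

mask = event_occupancy(events, H, W)
g = global_density(mask)
l = local_density(mask, patch=8)
active = l[l > 0]  # tiles that contain at least one event

print(f"global density: {g:.3f}")
print(f"mean density inside active tiles: {active.mean():.3f}")
```

In this toy setting the band covers at most one eighth of the frame, so the global density is bounded by 0.125, while the tiles that do contain events are mostly filled. That gap is the property the abstract cites as motivation for pairing a large-receptive-field sparse stage with a small-receptive-field dense stage.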
Cite
Text
Zhang et al. "Data Association Between Event Streams and Intensity Frames Under Diverse Baselines." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20071-7_5
Markdown
[Zhang et al. "Data Association Between Event Streams and Intensity Frames Under Diverse Baselines." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/zhang2022eccv-data/) doi:10.1007/978-3-031-20071-7_5
BibTeX
@inproceedings{zhang2022eccv-data,
title = {{Data Association Between Event Streams and Intensity Frames Under Diverse Baselines}},
author = {Zhang, Dehao and Ding, Qiankun and Duan, Peiqi and Zhou, Chu and Shi, Boxin},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-20071-7_5},
url = {https://mlanthology.org/eccv/2022/zhang2022eccv-data/}
}