Efficient RGB-T Tracking via Cross-Modality Distillation

Abstract

Most current RGB-T trackers adopt a two-stream structure to extract unimodal RGB and thermal features and rely on complex fusion strategies to fuse them. Both require a large number of parameters, which hinders real-life deployment. A compact RGB-T tracker, by contrast, may be computationally efficient but suffers non-negligible performance degradation because its feature representation ability is weaker. To remedy this, we present a cross-modality distillation framework that bridges the performance gap between a compact tracker and a powerful one. Specifically, a specific-common feature distillation module transfers both modality-common and modality-specific information from a deeper two-stream network to a shallower single-stream network. In addition, a multi-path selection distillation module uses multiple paths to guide a simple fusion module in learning more accurate multi-modal information from a well-designed fusion mechanism. Extensive experiments on three RGB-T benchmarks validate the method, which achieves state-of-the-art performance while consuming far fewer computational resources.
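The general idea of distilling features from a two-stream teacher into a single-stream student can be illustrated with a generic feature-mimicking loss. The sketch below is not the paper's specific-common feature distillation module or its multi-path selection module, whose details the abstract does not give. The function names (`mse`, `feature_distillation_loss`, `total_loss`), the equal weighting of the two teacher streams, and the `weight` parameter are all assumptions made for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_distillation_loss(
    student: Sequence[float],
    teacher_rgb: Sequence[float],
    teacher_thermal: Sequence[float],
) -> float:
    """Hypothetical loss: the single-stream student feature mimics both
    teacher streams, weighted equally (an assumption, not the paper's rule)."""
    return 0.5 * (mse(student, teacher_rgb) + mse(student, teacher_thermal))


def total_loss(task_loss: float, distill_loss: float, weight: float = 0.1) -> float:
    """Combine the tracking loss with the distillation term.
    `weight` is an illustrative hyperparameter."""
    return task_loss + weight * distill_loss


# Toy features: the student matches the RGB stream exactly and
# differs from the thermal stream in one channel.
student_feat = [1.0, 2.0]
rgb_feat = [1.0, 2.0]
thermal_feat = [3.0, 2.0]

distill = feature_distillation_loss(student_feat, rgb_feat, thermal_feat)
loss = total_loss(task_loss=0.5, distill_loss=distill)
```

In a real tracker the features would be tensors from intermediate network layers, and the student would typically need a projection layer when its channel count differs from the teacher's.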

Cite

Text

Zhang et al. "Efficient RGB-T Tracking via Cross-Modality Distillation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00523

Markdown

[Zhang et al. "Efficient RGB-T Tracking via Cross-Modality Distillation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhang2023cvpr-efficient-a/) doi:10.1109/CVPR52729.2023.00523

BibTeX

@inproceedings{zhang2023cvpr-efficient-a,
  title     = {{Efficient RGB-T Tracking via Cross-Modality Distillation}},
  author    = {Zhang, Tianlu and Guo, Hongyuan and Jiao, Qiang and Zhang, Qiang and Han, Jungong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {5404-5413},
  doi       = {10.1109/CVPR52729.2023.00523},
  url       = {https://mlanthology.org/cvpr/2023/zhang2023cvpr-efficient-a/}
}