Attention in Multimodal Neural Networks for Person Re-Identification

Abstract

Despite increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding whether images of a person, captured by several cameras, form a true match requires extracting discriminative features that counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal one. Furthermore, additional modalities can be considered that are not affected by the same environmental changes as RGB images. In this work, we present a Multimodal ATtention network (MAT) based on RGB and depth modalities. We combine a Convolutional Neural Network with an attention module to extract local, discriminative features that are fused with globally extracted features. The attention is based on the correlation between the two modalities, and we finally fuse the RGB and depth features to generate a joint multilevel RGB-D feature. Experiments conducted on three datasets captured from an overhead view show the importance of attention, which increases accuracy by 3.43%, 2.01% and 2.13% on OPR, DPI-T and TVPR, respectively.
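The abstract describes three ideas: attention derived from the correlation between RGB and depth features, attended local features fused with global ones, and a final fusion of the two modalities into one RGB-D descriptor. The toy sketch below illustrates only that general idea and is not the MAT architecture. It assumes per-position cosine similarity as the correlation measure, a softmax over spatial positions as the attention map, average pooling for the global feature, and plain concatenation for every fusion step. The CNN backbone is omitted (feature maps are given directly as lists of per-position vectors), and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # one C-dimensional vector per spatial position


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_attention(rgb: FeatureMap, depth: FeatureMap) -> List[float]:
    """Hypothetical attention: positions where RGB and depth agree get more weight."""
    return softmax([cosine(r, d) for r, d in zip(rgb, depth)])


def attended_pool(fmap: FeatureMap, weights: List[float]) -> Vector:
    """Local feature: attention-weighted sum over spatial positions."""
    dim = len(fmap[0])
    return [sum(w * vec[c] for w, vec in zip(weights, fmap)) for c in range(dim)]


def global_pool(fmap: FeatureMap) -> Vector:
    """Global feature: plain average over spatial positions."""
    n = len(fmap)
    return [sum(vec[c] for vec in fmap) / n for c in range(len(fmap[0]))]


def multimodal_descriptor(rgb: FeatureMap, depth: FeatureMap) -> Vector:
    """Concatenate [global | attended] per modality, then concatenate RGB and depth."""
    weights = correlation_attention(rgb, depth)
    rgb_feat = global_pool(rgb) + attended_pool(rgb, weights)
    depth_feat = global_pool(depth) + attended_pool(depth, weights)
    return rgb_feat + depth_feat


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3 positions, C = 2
    depth = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
    print(correlation_attention(rgb, depth))
    print(len(multimodal_descriptor(rgb, depth)))  # 4 * C entries
```

In this sketch the same attention map is shared by both modalities because it is computed from their agreement; a learned attention module would instead produce the weights with trainable layers.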

Cite

Text

Lejbølle et al. "Attention in Multimodal Neural Networks for Person Re-Identification." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018. doi:10.1109/CVPRW.2018.00055

Markdown

[Lejbølle et al. "Attention in Multimodal Neural Networks for Person Re-Identification." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/lejblle2018cvprw-attention/) doi:10.1109/CVPRW.2018.00055

BibTeX

@inproceedings{lejblle2018cvprw-attention,
  title     = {{Attention in Multimodal Neural Networks for Person Re-Identification}},
  author    = {Lejbølle, Aske R. and Krogh, Benjamin and Nasrollahi, Kamal and Moeslund, Thomas B.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2018},
  pages     = {179--187},
  doi       = {10.1109/CVPRW.2018.00055},
  url       = {https://mlanthology.org/cvprw/2018/lejblle2018cvprw-attention/}
}