Attention Enhanced Single Stage Multimodal Reasoner
Abstract
In this paper, we propose an Attention Enhanced Single Stage Multimodal Reasoner (ASSMR) to tackle the object referral task in the self-driving car scenario. We extract features from each modality and establish attention mechanisms to jointly process them. The Key Words Extractor (KWE) extracts the attribute and position/scale information of the target from the command, which is used to score the corresponding features through the Position/Scale Attention Module (P/SAM) and the Object Attention Module (OAM). Based on the attention mechanism, the effective parts of the position/scale feature, the object-attribute feature, and the command's semantic feature are enhanced. Finally, we map the different features to a common embedding space to predict the final result. Our method is evaluated on the simplified version of the Talk2Car dataset and achieves 66.4 AP50 on the test set while using the official region proposals.
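The abstract describes scoring region proposals by attending over their features with keyword information from the command, then matching everything in a common embedding space. The sketch below is a hypothetical illustration of that general pattern, not the authors' implementation: the function name, projection matrices, and cosine-similarity scoring are all assumptions introduced for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_and_score(cmd_key, region_feats, W_txt, W_img):
    """Hypothetical sketch: score region proposals against a command keyword.

    cmd_key:      (d_t,)   keyword embedding extracted from the command
    region_feats: (n, d_v) visual features of n region proposals
    W_txt:        (d_e, d_t) projection of text into a common space
    W_img:        (d_e, d_v) projection of regions into the common space
    """
    q = W_txt @ cmd_key                 # command keyword in common space, (d_e,)
    r = region_feats @ W_img.T          # region features in common space, (n, d_e)
    attn = softmax(r @ q)               # attention weights over proposals, (n,)
    r_enh = r * attn[:, None]           # enhance the attended region features
    # cosine similarity between enhanced regions and the command embedding
    scores = r_enh @ q / (np.linalg.norm(r_enh, axis=1) * np.linalg.norm(q) + 1e-8)
    return scores                       # higher score = better referred-object match

# toy usage with random features: pick the best-scoring proposal
rng = np.random.default_rng(0)
scores = attend_and_score(rng.standard_normal(16),
                          rng.standard_normal((5, 32)),
                          rng.standard_normal((8, 16)),
                          rng.standard_normal((8, 32)))
best = int(np.argmax(scores))
```

In the paper's setting, several such attention scores (position/scale via P/SAM, object attributes via OAM) would be combined before the final prediction; the sketch shows only a single attention-and-match step.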
Cite
Text
Ou and Zhang. "Attention Enhanced Single Stage Multimodal Reasoner." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66096-3_5
Markdown
[Ou and Zhang. "Attention Enhanced Single Stage Multimodal Reasoner." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/ou2020eccvw-attention/) doi:10.1007/978-3-030-66096-3_5
BibTeX
@inproceedings{ou2020eccvw-attention,
title = {{Attention Enhanced Single Stage Multimodal Reasoner}},
author = {Ou, Jie and Zhang, Xinying},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {51-61},
doi = {10.1007/978-3-030-66096-3_5},
url = {https://mlanthology.org/eccvw/2020/ou2020eccvw-attention/}
}