Attention Enhanced Single Stage Multimodal Reasoner
Abstract
In this paper, we propose an Attention Enhanced Single Stage Multimodal Reasoner (ASSMR) to tackle the object referral task in the self-driving car scenario. We extract features from each modality and establish attention mechanisms to jointly process them. The Key Words Extractor (KWE) extracts the attribute and position/scale information of the target from the command, which is used to score the corresponding features through the Position/Scale Attention Module (P/SAM) and the Object Attention Module (OAM). Based on the attention mechanism, the effective parts of the position/scale feature, the object-attribute feature, and the command's semantic feature are enhanced. Finally, we map the different features to a common embedding space to predict the final result. Our method is evaluated on the simplified version of the Talk2Car dataset and achieves 66.4 AP50 on the test set while using the official region proposals.
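The abstract describes scoring region proposals by attending over their features with keyword information from the command, then matching everything in a common embedding space. The sketch below is a hypothetical illustration of that general pattern, not the authors' implementation: the function name, projection matrices, and cosine-similarity scoring are all assumptions introduced for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_and_score(cmd_key, region_feats, W_txt, W_img):
    """Hypothetical sketch: score region proposals against a command keyword.

    cmd_key:      (d_t,)   keyword embedding extracted from the command
    region_feats: (n, d_v) visual features of n region proposals
    W_txt:        (d_e, d_t) projection of text into a common space
    W_img:        (d_e, d_v) projection of regions into the common space
    """
    q = W_txt @ cmd_key                 # command keyword in common space, (d_e,)
    r = region_feats @ W_img.T          # region features in common space, (n, d_e)
    attn = softmax(r @ q)               # attention weights over proposals, (n,)
    r_enh = r * attn[:, None]           # enhance the attended region features
    # cosine similarity between enhanced regions and the command embedding
    scores = r_enh @ q / (np.linalg.norm(r_enh, axis=1) * np.linalg.norm(q) + 1e-8)
    return scores                       # higher score = better referred-object match

# toy usage with random features: pick the best-scoring proposal
rng = np.random.default_rng(0)
scores = attend_and_score(rng.standard_normal(16),
                          rng.standard_normal((5, 32)),
                          rng.standard_normal((8, 16)),
                          rng.standard_normal((8, 32)))
best = int(np.argmax(scores))
```

In the paper's setting, several such attention scores (position/scale via P/SAM, object attributes via OAM) would be combined before the final prediction; the sketch shows only a single attention-and-match step.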
Cite
Text
Ou and Zhang. "Attention Enhanced Single Stage Multimodal Reasoner." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66096-3_5
Markdown
[Ou and Zhang. "Attention Enhanced Single Stage Multimodal Reasoner." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/ou2020eccvw-attention/) doi:10.1007/978-3-030-66096-3_5
BibTeX
@inproceedings{ou2020eccvw-attention,
title = {{Attention Enhanced Single Stage Multimodal Reasoner}},
author = {Ou, Jie and Zhang, Xinying},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {51-61},
doi = {10.1007/978-3-030-66096-3_5},
url = {https://mlanthology.org/eccvw/2020/ou2020eccvw-attention/}
}