Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations
Abstract
In this work, we focus on the object referral problem in the autonomous driving setting. We use a stacked visual-linguistic BERT model to learn a generic visual-linguistic representation. Each element of the input is either a word or a region of interest from the input image. To train the deep model efficiently, we use a stacking algorithm to transfer knowledge from a shallow BERT model to a deep BERT model.
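The progressive stacking step the abstract mentions can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes stacking means initializing a deep model by duplicating the trained layers of a shallow one (in the style of progressive BERT stacking), and the `progressive_stack` function and dict-based layer parameters are hypothetical stand-ins for real transformer blocks.

```python
import copy

def progressive_stack(layers):
    """Double model depth by appending a copy of the trained shallow layers.

    `layers` is a list of per-layer parameter dicts; the deep model's
    layer i + L is initialized from the shallow model's layer i, so
    training the deep model starts from transferred knowledge rather
    than from scratch.
    """
    return layers + copy.deepcopy(layers)

# Toy example: a 3-layer "model" whose parameters are plain dicts.
shallow = [{"w": float(i)} for i in range(3)]
deep = progressive_stack(shallow)
```

After stacking, the deep model is fine-tuned as a whole; the deep copy ensures the duplicated layers can then diverge from the originals during further training.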
Cite
Text
Dai et al. "Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66096-3_2
Markdown
[Dai et al. "Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/dai2020eccvw-commands/) doi:10.1007/978-3-030-66096-3_2
BibTeX
@inproceedings{dai2020eccvw-commands,
title = {{Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations}},
author = {Dai, Hang and Luo, Shujie and Ding, Yong and Shao, Ling},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {27--32},
doi = {10.1007/978-3-030-66096-3_2},
url = {https://mlanthology.org/eccvw/2020/dai2020eccvw-commands/}
}