Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations

Abstract

In this work, we address the object referral problem in the autonomous driving setting: grounding a natural-language command in a region of the scene. We use a stacked visual-linguistic BERT model to learn a generic visual-linguistic representation, where each input element is either a word from the command or a region of interest from the input image. To train the deep model efficiently, we use a stacking algorithm that transfers knowledge from a shallow BERT model to a deep one.
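The abstract's stacking idea can be illustrated with a minimal sketch: a deeper model is initialized by duplicating the trained layers of a shallower one, so training resumes from learned weights rather than from scratch. This is a hypothetical simplification (plain Python, layers as parameter dicts, `progressive_stack` is an illustrative name), not the paper's implementation:

```python
import copy

def progressive_stack(layers, target_depth):
    """Grow a layer stack by repeated doubling.

    `layers` is a list of per-layer parameter dicts from a trained
    shallow model. Each doubling copies the current stack on top of
    itself, so the deeper model inherits the shallow model's weights.
    """
    stacked = list(layers)
    while len(stacked) < target_depth:
        # Deep-copy so the two halves can be fine-tuned independently.
        stacked = stacked + copy.deepcopy(stacked)
    return stacked[:target_depth]

# Example: a trained 3-layer stack grown to 12 layers (two doublings).
shallow = [{"w": i} for i in range(3)]
deep = progressive_stack(shallow, 12)
```

After each doubling, the stacked model would be fine-tuned before the next growth step; the sketch only shows the initialization.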

Cite

Text

Dai et al. "Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66096-3_2

Markdown

[Dai et al. "Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/dai2020eccvw-commands/) doi:10.1007/978-3-030-66096-3_2

BibTeX

@inproceedings{dai2020eccvw-commands,
  title     = {{Commands for Autonomous Vehicles by Progressively Stacking Visual-Linguistic Representations}},
  author    = {Dai, Hang and Luo, Shujie and Ding, Yong and Shao, Ling},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2020},
  pages     = {27--32},
  doi       = {10.1007/978-3-030-66096-3_2},
  url       = {https://mlanthology.org/eccvw/2020/dai2020eccvw-commands/}
}