End-to-End Learning Improves Static Object Geo-Localization from Video

Abstract

Accurately estimating the position of static objects, such as traffic lights, from the moving camera of a self-driving car is a challenging problem. In this work, we present a system that improves the localization of static objects by jointly optimizing the components of the system via learning. Our system consists of networks that perform: 1) 5DoF object pose estimation from a single image, 2) association of objects between pairs of frames, and 3) multi-object tracking to produce the final geo-localization of the static objects within the scene. We evaluate our approach using a publicly available dataset, focusing on traffic lights due to data availability. For each component, we compare against contemporary alternatives and show significantly improved performance. We also show that end-to-end system performance is further improved by jointly training the constituent models.
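The abstract does not describe how per-frame estimates of a tracked object become a single geo-location. As a purely illustrative sketch, and not the authors' method, assume each observation in a track gives the object's offset from the vehicle on the ground plane (forward and left, in metres) and that the vehicle's world pose (x, y, yaw) is known. A simple baseline then transforms every observation into world coordinates and averages them over the track. The names `EgoPose`, `to_world`, and `fuse_track` are hypothetical, and the 2D ground-plane simplification ignores height and the object orientation that a 5DoF pose would carry.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EgoPose:
    """Hypothetical vehicle pose in a world (map) frame: metres and radians."""
    x: float
    y: float
    yaw: float


def to_world(pose: EgoPose, fwd: float, left: float) -> Tuple[float, float]:
    """Rotate a vehicle-frame offset (forward, left) by the ego yaw,
    then translate it by the ego position to get world coordinates."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return (pose.x + c * fwd - s * left, pose.y + s * fwd + c * left)


def fuse_track(observations: List[Tuple[EgoPose, float, float]]) -> Tuple[float, float]:
    """Average the world-frame estimates of one static object over its track.
    Each observation is (ego pose, forward offset, left offset)."""
    if not observations:
        raise ValueError("a track needs at least one observation")
    points = [to_world(pose, fwd, left) for pose, fwd, left in observations]
    n = len(points)
    return (sum(px for px, _ in points) / n, sum(py for _, py in points) / n)


if __name__ == "__main__":
    # Vehicle drives along the world x-axis past a traffic light near (20, 3);
    # the per-frame offsets are noisy, made-up values for illustration.
    track = [
        (EgoPose(0.0, 0.0, 0.0), 20.0, 3.0),
        (EgoPose(10.0, 0.0, 0.0), 10.2, 2.8),
        (EgoPose(15.0, 0.0, 0.0), 4.8, 3.2),
    ]
    print(fuse_track(track))  # a point close to (20.0, 3.0)
```

Plain averaging treats every frame equally; a real system would more likely weight observations by their uncertainty (for example, trusting nearby detections more than distant ones), which is one reason to learn the components jointly rather than hand-tune such a fusion rule.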

Cite

Text

Chaabane et al. "End-to-End Learning Improves Static Object Geo-Localization from Video." Winter Conference on Applications of Computer Vision, 2021.

Markdown

[Chaabane et al. "End-to-End Learning Improves Static Object Geo-Localization from Video." Winter Conference on Applications of Computer Vision, 2021.](https://mlanthology.org/wacv/2021/chaabane2021wacv-endtoend/)

BibTeX

@inproceedings{chaabane2021wacv-endtoend,
  title     = {{End-to-End Learning Improves Static Object Geo-Localization from Video}},
  author    = {Chaabane, Mohamed and Gueguen, Lionel and Trabelsi, Ameni and Beveridge, Ross and O'Hara, Stephen},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2021},
  pages     = {2063--2072},
  url       = {https://mlanthology.org/wacv/2021/chaabane2021wacv-endtoend/}
}