ODAM: Object Detection, Association, and Mapping Using Posed RGB Video

Abstract

Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep-learning-based front-end to detect 3D objects from a given RGB frame and associate them with a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and an object scale prior. We validate the proposed system on ScanNet, where we show a significant improvement over existing RGB-only methods.
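
For readers unfamiliar with the bounding-volume representation mentioned in the abstract, a super-quadric is conventionally defined by its inside-outside function. The formulation below is the standard one from the super-quadric literature, not necessarily the exact parameterization used in the paper:

$$
F(x, y, z) = \left( \left(\tfrac{x}{a_1}\right)^{\tfrac{2}{\epsilon_2}} + \left(\tfrac{y}{a_2}\right)^{\tfrac{2}{\epsilon_2}} \right)^{\tfrac{\epsilon_2}{\epsilon_1}} + \left(\tfrac{z}{a_3}\right)^{\tfrac{2}{\epsilon_1}}
$$

Here $(a_1, a_2, a_3)$ are the extents along the three object axes and $(\epsilon_1, \epsilon_2)$ are shape exponents interpolating between box-like and ellipsoidal volumes; a point expressed in the object's local frame lies inside the volume when $F \le 1$ and outside when $F > 1$.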

Cite

Text

Li et al. "ODAM: Object Detection, Association, and Mapping Using Posed RGB Video." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00594

Markdown

[Li et al. "ODAM: Object Detection, Association, and Mapping Using Posed RGB Video." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/li2021iccv-odam/) doi:10.1109/ICCV48922.2021.00594

BibTeX

@inproceedings{li2021iccv-odam,
  title     = {{ODAM: Object Detection, Association, and Mapping Using Posed RGB Video}},
  author    = {Li, Kejie and DeTone, Daniel and Chen, Yu Fan and Vo, Minh and Reid, Ian and Rezatofighi, Hamid and Sweeney, Chris and Straub, Julian and Newcombe, Richard},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {5998--6008},
  doi       = {10.1109/ICCV48922.2021.00594},
  url       = {https://mlanthology.org/iccv/2021/li2021iccv-odam/}
}