ODAM: Object Detection, Association, and Mapping Using Posed RGB Video
Abstract
Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep-learning-based front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and the object scale prior. We validate the proposed system on ScanNet where we show a significant improvement over existing RGB-only methods.
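As an illustration of the super-quadric representation mentioned above, here is a minimal sketch of the standard superquadric inside-outside function in an object's canonical frame. The abstract does not give ODAM's exact parameterization or optimization, so the function name `superquadric_inside_outside` and the parameters `scale`, `eps1`, and `eps2` are assumptions made for this example only.

```python
import numpy as np


def superquadric_inside_outside(points, scale, eps1, eps2):
    """Standard superquadric inside-outside function (hypothetical helper).

    points: (..., 3) array of coordinates in the object's canonical frame.
    scale:  (3,) semi-axis lengths along x, y, z.
    eps1, eps2: positive shape exponents (1, 1 gives an ellipsoid;
                values near 0 give a box-like shape).

    Returns F, where F < 1 is inside, F == 1 is on the surface,
    and F > 1 is outside.
    """
    # Absolute values keep fractional powers real for negative coordinates.
    p = np.abs(np.asarray(points, dtype=float)) / np.asarray(scale, dtype=float)
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    xy_term = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + z ** (2.0 / eps1)


# The same point lies outside a unit ellipsoid but inside a box-like shape.
corner = np.array([0.9, 0.9, 0.9])
f_ellipsoid = superquadric_inside_outside(corner, [1.0, 1.0, 1.0], 1.0, 1.0)
f_boxlike = superquadric_inside_outside(corner, [1.0, 1.0, 1.0], 0.1, 0.1)
```

With both exponents equal to 1 the function reduces to the ellipsoid equation, while small exponents push the surface toward a cuboid, which is one reason a single super-quadric family can approximate both rounded and box-shaped objects.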
Cite
Text
Li et al. "ODAM: Object Detection, Association, and Mapping Using Posed RGB Video." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00594
Markdown
[Li et al. "ODAM: Object Detection, Association, and Mapping Using Posed RGB Video." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/li2021iccv-odam/) doi:10.1109/ICCV48922.2021.00594
BibTeX
@inproceedings{li2021iccv-odam,
title = {{ODAM: Object Detection, Association, and Mapping Using Posed RGB Video}},
author = {Li, Kejie and DeTone, Daniel and Chen, Yu Fan and Vo, Minh and Reid, Ian and Rezatofighi, Hamid and Sweeney, Chris and Straub, Julian and Newcombe, Richard},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {5998-6008},
doi = {10.1109/ICCV48922.2021.00594},
url = {https://mlanthology.org/iccv/2021/li2021iccv-odam/}
}