DeTra: A Unified Model for Object Detection and Trajectory Forecasting

Abstract

The tasks of object detection and trajectory forecasting play a crucial role in understanding the scene for autonomous driving. These tasks are typically executed in a cascading manner, making them prone to compounding errors. Furthermore, there is usually a very thin interface between the two tasks, creating a lossy information bottleneck. To address these challenges, our approach formulates the union of the two tasks as a trajectory refinement problem, where the first pose is the detection (current time), and the subsequent poses are the waypoints of the multiple forecasts (future time). To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects directly from LiDAR point clouds and high-definition maps. We call this model DeTra, short for object Detection and Trajectory forecasting. In our experiments, we observe that DeTra outperforms the state-of-the-art on Argoverse 2 Sensor and Waymo Open Dataset by a large margin, across a broad range of metrics. Finally, we perform extensive ablation studies that show the value of refinement for this task and validate the key design choices.

Cite

Text

Casas et al. "DeTra: A Unified Model for Object Detection and Trajectory Forecasting." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73223-2_19

Markdown

[Casas et al. "DeTra: A Unified Model for Object Detection and Trajectory Forecasting." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/casas2024eccv-detra/) doi:10.1007/978-3-031-73223-2_19

BibTeX

@inproceedings{casas2024eccv-detra,
  title     = {{DeTra: A Unified Model for Object Detection and Trajectory Forecasting}},
  author    = {Casas, Sergio and Agro, Ben T and Mao, Jiageng and Gilles, Thomas and Cui, Alexander Y and Li, Enxu and Urtasun, Raquel},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73223-2_19},
  url       = {https://mlanthology.org/eccv/2024/casas2024eccv-detra/}
}