Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene

Abstract

Self-driving cars that rely solely on ego-centric perception face sensing limitations and often fail to detect occluded or faraway objects. Collaborative autonomous driving (CAV) is a promising direction, but collecting data for its development is non-trivial: it requires placing multiple sensor-equipped agents in a real-world driving scene simultaneously. As a result, existing datasets are limited in both locations and agents. We introduce a novel surrogate: generating realistic perception from different viewpoints in a driving scene, conditioned on a real-world sample, namely the ego-car's sensory data. This surrogate could turn any ego-car dataset into a collaborative driving one, scaling up the development of CAV. We present the first solution, using a combination of synthetic collaborative data and real ego-car data. Our method, Transfer Your Perspective (TYP), learns a conditioned diffusion model whose output samples are not only realistic but also consistent in both semantics and layout with the given ego-car data. Empirical results demonstrate TYP's effectiveness in CAV settings. In particular, TYP enables us to (pre-)train collaborative perception algorithms such as early and late fusion with little or no real-world collaborative data, greatly facilitating downstream CAV applications.

Cite

Text

Pan et al. "Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01123

Markdown

[Pan et al. "Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/pan2025cvpr-transfer/) doi:10.1109/CVPR52734.2025.01123

BibTeX

@inproceedings{pan2025cvpr-transfer,
  title     = {{Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene}},
  author    = {Pan, Tai-Yu and Jeon, Sooyoung and Fan, Mengdi and Yoo, Jinsu and Feng, Zhenyang and Campbell, Mark and Weinberger, Kilian Q. and Hariharan, Bharath and Chao, Wei-Lun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {12027--12036},
  doi       = {10.1109/CVPR52734.2025.01123},
  url       = {https://mlanthology.org/cvpr/2025/pan2025cvpr-transfer/}
}