Wang et al. "VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.
Markdown
[Wang et al. "VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/wang2025cvprw-visioncube/)
BibTeX
@inproceedings{wang2025cvprw-visioncube,
title = {{VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning}},
author = {Wang, Feiyang and Luo, Nan and Wu, Wangyu},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2025},
pages = {3270--3279},
url = {https://mlanthology.org/cvprw/2025/wang2025cvprw-visioncube/}
}