VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning

Cite

Text

Wang et al. "VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Wang et al. "VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/wang2025cvprw-visioncube/)

BibTeX

@inproceedings{wang2025cvprw-visioncube,
  title     = {{VisionCube: 3D-Aware Vision-Language Model for Multi-Step Spatial Reasoning}},
  author    = {Wang, Feiyang and Luo, Nan and Wu, Wangyu},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3270--3279},
  url       = {https://mlanthology.org/cvprw/2025/wang2025cvprw-visioncube/}
}