HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering

Abstract

Traditional 3D scene understanding techniques are generally predicated on hand-annotated label sets, but in recent years a new class of open-vocabulary 3D scene understanding techniques has emerged. Despite the success of this paradigm on small scenes, existing approaches cannot scale efficiently to city-scale 3D datasets. In this paper, we present Hiearchical vocab-Agnostic Expert Clustering (HAEC), after the latin word for `this', a superpoint graph clustering based approach which utilizes a novel mixture of experts graph transformer for its backbone. We administer this highly scalable approach to the first application of open-vocabulary scene understanding on the SensatUrban city-scale dataset. We also demonstrate a synthetic labeling pipeline which is derived entirely from the raw point clouds with no hand-annotation. Our technique can help unlock complex operations on dense urban 3D scenes and open a new path forward in the processing of digital twins.

Cite

Text

Rusnak and Kaplan. "HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Rusnak and Kaplan. "HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/rusnak2025cvprw-haeccity/)

BibTeX

@inproceedings{rusnak2025cvprw-haeccity,
  title     = {{HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering}},
  author    = {Rusnak, Alexander and Kaplan, Frédéric},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {5256-5265},
  url       = {https://mlanthology.org/cvprw/2025/rusnak2025cvprw-haeccity/}
}