LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation

Abstract

We propose LangHOPS, the first Multimodal Large Language Model (MLLM)-based framework for open-vocabulary object–part instance segmentation. Given an image, LangHOPS can jointly detect and segment hierarchical object and part instances from open-vocabulary candidate categories. Unlike prior approaches that rely on heuristic or learnable visual grouping, our approach grounds object–part hierarchies in language space. It integrates the MLLM into the object–part parsing pipeline to leverage its rich knowledge and reasoning capabilities and to link multi-granularity concepts within the hierarchies. We evaluate LangHOPS across multiple challenging scenarios, including in-domain and cross-dataset object–part instance segmentation and zero-shot semantic segmentation. LangHOPS achieves state-of-the-art results, surpassing previous methods by 5.5% Average Precision (AP) in-domain and 4.8% AP cross-dataset on the PartImageNet dataset, and by 2.5% mIoU on unseen object parts in ADE20K in the zero-shot setting. Ablation studies further validate the effectiveness of the language-grounded hierarchy and the MLLM-driven part query refinement strategy.
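
For intuition, the following is a minimal sketch of what a language-grounded object–part hierarchy could look like as a data structure: candidate categories organized as text-only trees that link object-level and part-level concepts, with per-instance masks attached to nodes at prediction time. All names here (PartNode, InstancePrediction, build_hierarchy) and their fields are illustrative assumptions for exposition, not LangHOPS's actual interface.

# Illustrative sketch only: a hypothetical representation of an
# open-vocabulary object–part hierarchy and its instance predictions.
# Names and fields are assumptions, not the paper's actual API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np


@dataclass
class PartNode:
    """A category in the language-grounded hierarchy, e.g. 'dog' -> 'dog head'."""
    name: str                               # open-vocabulary category name
    parent: Optional[str] = None            # parent object category, None for top-level objects
    children: List["PartNode"] = field(default_factory=list)


@dataclass
class InstancePrediction:
    """One detected object or part instance."""
    category: str                           # node name from the hierarchy
    mask: np.ndarray                        # boolean segmentation mask, H x W
    score: float                            # confidence
    parent_instance: Optional[int] = None   # index of the enclosing object instance, if any


def build_hierarchy(object_names: List[str],
                    parts_per_object: Dict[str, List[str]]) -> List[PartNode]:
    """Assemble the candidate vocabulary into object–part trees (text only)."""
    nodes = []
    for obj in object_names:
        node = PartNode(name=obj)
        node.children = [PartNode(name=p, parent=obj)
                         for p in parts_per_object.get(obj, [])]
        nodes.append(node)
    return nodes


# Example candidate vocabulary, as it might be passed to an open-vocabulary segmenter.
hierarchy = build_hierarchy(
    object_names=["dog", "car"],
    parts_per_object={"dog": ["dog head", "dog torso", "dog leg"],
                      "car": ["car wheel", "car door"]},
)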

Cite

Text

Miao et al. "LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Miao et al. "LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/miao2025neurips-langhops/)

BibTeX

@inproceedings{miao2025neurips-langhops,
  title     = {{LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation}},
  author    = {Miao, Yang and Zaech, Jan-Nico and Wang, Xi and Despinoy, Fabien and Paudel, Danda Pani and Van Gool, Luc},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/miao2025neurips-langhops/}
}