Hierarchical Semantic Segmentation with Autoregressive Language Modeling

Abstract

Hierarchical semantic segmentation entails progressively decomposing objects into smaller, nested parts. Existing approaches require either multiple inference passes or multiple fixed decoders. We instead introduce HALLUMI, an autoregressive language modeling framework that performs the task in a single inference pass, relying on special tokens to indicate parent-child relationships so that the hierarchy can be recovered from the generated text. Experiments on SPIN, a hierarchical semantic segmentation dataset that extends to the subpart level, show that HALLUMI achieves state-of-the-art results.
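
To illustrate how a hierarchy might be recovered from generated text, the following is a minimal sketch that parses a flat token sequence into a tree. It is not HALLUMI's actual format: the abstract does not specify the special tokens, so the markers `<child>` and `</child>`, the `Node` class, and the functions `parse_hierarchy` and `to_dict` are all hypothetical names chosen for this example. The assumption is that an opening marker after a part name starts that part's list of children, and a closing marker ends it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical markers for illustration only; the abstract does not
# specify the special tokens that HALLUMI actually uses.
OPEN, CLOSE = "<child>", "</child>"


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def parse_hierarchy(tokens: list[str]) -> list[Node]:
    """Rebuild a forest of nested parts from a flat token sequence."""
    root = Node("ROOT")
    stack = [root]  # stack[-1] is the parent currently receiving children
    last = None     # most recent node that an OPEN marker may descend into
    for tok in tokens:
        if tok == OPEN:
            if last is None:
                raise ValueError("OPEN marker without a preceding part name")
            stack.append(last)
            last = None
        elif tok == CLOSE:
            if len(stack) == 1:
                raise ValueError("unbalanced CLOSE marker")
            last = stack.pop()
        else:
            # Any other token is treated as a part name under the current parent.
            last = Node(tok)
            stack[-1].children.append(last)
    if len(stack) != 1:
        raise ValueError("unclosed OPEN marker")
    return root.children


def to_dict(node: Node) -> dict:
    """Convert a node into a nested dict for easy inspection."""
    return {node.name: [to_dict(child) for child in node.children]}
```

For example, calling `parse_hierarchy` on the tokens `dog <child> head <child> ear eye </child> leg </child> car` yields two top-level objects, `dog` and `car`, where `dog` has the parts `head` and `leg`, and `head` has the subparts `ear` and `eye`. A single left-to-right pass with a stack is enough because nesting is fully determined by the order of the markers.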

Cite

Text

Myers-Dean et al. "Hierarchical Semantic Segmentation with Autoregressive Language Modeling." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Myers-Dean et al. "Hierarchical Semantic Segmentation with Autoregressive Language Modeling." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/myersdean2025cvprw-hierarchical/)

BibTeX

@inproceedings{myersdean2025cvprw-hierarchical,
  title     = {{Hierarchical Semantic Segmentation with Autoregressive Language Modeling}},
  author    = {Myers-Dean, Josh and Price, Brian L. and Fan, Yifei and Gurari, Danna},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {4120--4130},
  url       = {https://mlanthology.org/cvprw/2025/myersdean2025cvprw-hierarchical/}
}