Progressive Gaussian Transformer with Anisotropy-Aware Sampling for Open Vocabulary Occupancy Prediction
Abstract
The 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, there exists a trade-off in text-aligned scene modeling: sparse Gaussian representation struggles to capture small objects in the scene, while dense representation incurs significant computational overhead. To address these limitations, we present **PG-Occ**, an innovative **P**rogressive **G**aussian Transformer Framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is the introduction of an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that **PG-Occ** achieves *state-of-the-art* performance with a relative **14.3% mIoU improvement** over the previous best-performing method. Code and pretrained models are available at: https://yanchi-3dv.github.io/PG-Occ.
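To make the idea of an anisotropy-aware receptive field concrete, the following is a minimal sketch, not the authors' implementation. It assumes each Gaussian is described by a mean, three per-axis scales, and a unit quaternion, and that sampling locations are obtained by stretching a fixed offset pattern along the Gaussian's own axes. The names `quat_to_rotmat`, `UNIT_OFFSETS`, `anisotropic_sample_points`, and the `extent` parameter are hypothetical and introduced only for illustration; the abstract does not specify how PG-Occ places its samples or how spatio-temporal fusion is performed.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (normalized first)."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

# Hypothetical fixed pattern: the center plus one point along each +/- local axis.
UNIT_OFFSETS = [
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]

def anisotropic_sample_points(mean, scales, quat, extent=1.0):
    """Return world-space sampling points shaped by one Gaussian's anisotropy.

    Each unit offset is scaled per axis by the Gaussian's scales (so elongated
    Gaussians sample farther along their long axis), rotated into world space,
    and translated to the Gaussian's mean.
    """
    rot = quat_to_rotmat(quat)
    points = []
    for u in UNIT_OFFSETS:
        local = [extent * scales[i] * u[i] for i in range(3)]
        world = tuple(
            mean[r] + sum(rot[r][c] * local[c] for c in range(3))
            for r in range(3)
        )
        points.append(world)
    return points

if __name__ == "__main__":
    # An elongated Gaussian rotated 90 degrees about the z axis.
    q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    for p in anisotropic_sample_points((0.0, 0.0, 0.0), (4.0, 0.5, 0.5), q):
        print(tuple(round(v, 3) for v in p))
```

In this sketch, a Gaussian with scales (4.0, 0.5, 0.5) rotated 90 degrees about z places its two long-axis samples about 4 units from the center along the world y axis, while the short-axis samples stay within 0.5 units. A real pipeline would then project such points into image feature maps and aggregate the sampled features, which is outside the scope of this illustration.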
Cite
Text
Yan and Xu. "Progressive Gaussian Transformer with Anisotropy-Aware Sampling for Open Vocabulary Occupancy Prediction." International Conference on Learning Representations, 2026.
Markdown
[Yan and Xu. "Progressive Gaussian Transformer with Anisotropy-Aware Sampling for Open Vocabulary Occupancy Prediction." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yan2026iclr-progressive/)
BibTeX
@inproceedings{yan2026iclr-progressive,
title = {{Progressive Gaussian Transformer with Anisotropy-Aware Sampling for Open Vocabulary Occupancy Prediction}},
author = {Yan, Chi and Xu, Dan},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/yan2026iclr-progressive/}
}