3D Segmenter: 3D Transformer Based Semantic Segmentation via 2D Panoramic Distillation

Abstract

Recently, 2D semantic segmentation has witnessed significant advances thanks to the huge amount of 2D image data available. Motivated by this, we propose the first 2D-to-3D knowledge distillation strategy to enhance a 3D semantic segmentation model with knowledge embedded in the latent space of powerful 2D models. Specifically, unlike standard knowledge distillation, where the teacher and student models take the same data as input, we use 2D panoramas properly aligned with the corresponding 3D rooms to train the teacher network, and use the knowledge learned by the 2D teacher to guide the 3D student. To facilitate our research, we create a large-scale, finely annotated 3D semantic segmentation benchmark containing voxel-wise semantic labels and aligned panoramas of 5175 scenes. Based on this benchmark, we propose a 3D volumetric semantic segmentation network that adapts the Video Swin Transformer as its backbone and introduces a skip-connected linear decoder. Our 3D Segmenter achieves state-of-the-art performance while remaining computationally efficient, requiring only $3.8\%$ of the parameters of the prior art. Our code and data will be released upon acceptance.
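
To make the cross-modal distillation idea concrete, below is a minimal PyTorch sketch of a feature-space distillation loss between a frozen 2D panorama teacher and a 3D voxel student. It assumes global pooling of both feature maps plus a learned linear projection into the teacher's embedding space; the class name, dimensions, and the cosine-matching objective are illustrative assumptions, not the paper's exact formulation or alignment scheme.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PanoramicDistillationLoss(nn.Module):
    """Sketch: match pooled 3D student features to frozen 2D teacher features.

    Hypothetical example. The paper aligns 2D panoramas with 3D rooms;
    here we simply pool both feature maps globally and compare them in
    the teacher's latent space, which is one common distillation choice.
    """

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Linear projection from student to teacher embedding space (assumption).
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # student_feat: (B, C_s, D, H, W) voxel features from the 3D student.
        # teacher_feat: (B, C_t, H2, W2) panorama features from the 2D teacher.
        s = student_feat.flatten(2).mean(dim=2)           # (B, C_s) global pool
        t = teacher_feat.flatten(2).mean(dim=2).detach()  # (B, C_t); teacher is frozen
        s = self.proj(s)                                  # (B, C_t)
        # Cosine-style feature matching; the loss is 0 when directions agree.
        return 1.0 - F.cosine_similarity(s, t, dim=1).mean()

if __name__ == "__main__":
    loss_fn = PanoramicDistillationLoss(student_dim=96, teacher_dim=768)
    student = torch.randn(2, 96, 16, 16, 16)   # toy 3D student features
    teacher = torch.randn(2, 768, 32, 64)      # toy 2D teacher features
    print(loss_fn(student, teacher))           # scalar distillation loss

In practice this loss would be added to the student's standard voxel-wise segmentation loss with a weighting coefficient; the pooling granularity and matching objective are the main design choices to revisit against the paper's actual method.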

Cite

Text

Wu et al. "3D Segmenter: 3D Transformer Based Semantic Segmentation via 2D Panoramic Distillation." International Conference on Learning Representations, 2023.

Markdown

[Wu et al. "3D Segmenter: 3D Transformer Based Semantic Segmentation via 2D Panoramic Distillation." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/wu2023iclr-3d/)

BibTeX

@inproceedings{wu2023iclr-3d,
  title     = {{3D Segmenter: 3D Transformer Based Semantic Segmentation via 2D Panoramic Distillation}},
  author    = {Wu, Zhennan and Li, Yang and Huang, Yifei and Gu, Lin and Harada, Tatsuya and Sato, Hiroyuki},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/wu2023iclr-3d/}
}