Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Abstract
Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture fine details. However, this comes at the cost of considerable computational complexity, hindering deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefine, a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. Based on coarse low-resolution outputs, SparseRefine first uses an entropy selector to identify a sparse set of pixels with high entropy. It then employs a sparse feature extractor to efficiently generate the refinements for those pixels of interest. Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions. SparseRefine can be seamlessly integrated into any existing semantic segmentation model, whether CNN- or ViT-based. SparseRefine achieves significant speedup: 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes, with negligible to no loss of accuracy. Our “dense+sparse” paradigm paves the way for efficient high-resolution visual computing.
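The three stages named in the abstract (entropy selection, sparse refinement, gated ensembling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the fixed entropy threshold, and the scalar `gate` are assumptions made for illustration, and the sparse feature extractor is replaced by a hard-coded placeholder array, whereas the paper describes a dedicated network for that step.

```python
import numpy as np

def pixel_entropy(probs):
    """Shannon entropy per pixel; probs has shape (C, H, W) and sums to 1 over C."""
    eps = 1e-12  # avoids log(0)
    return -(probs * np.log(probs + eps)).sum(axis=0)

def select_uncertain_pixels(probs, threshold):
    """Return (ys, xs) index arrays for pixels whose entropy exceeds the threshold.

    The fixed threshold is a hypothetical choice for this sketch.
    """
    return np.nonzero(pixel_entropy(probs) > threshold)

def gated_ensemble(coarse, refined_sparse, ys, xs, gate):
    """Blend sparse refinements into the coarse prediction at selected pixels only.

    coarse: (C, H, W); refined_sparse: (C, N) for the N selected pixels.
    gate: scalar or (N,) weights in [0, 1]; an assumption standing in for a learned gate.
    """
    out = coarse.copy()
    out[:, ys, xs] = gate * refined_sparse + (1.0 - gate) * coarse[:, ys, xs]
    return out

# Toy input: 3 classes on a 2x2 image, uniform (uncertain) everywhere
# except one confident pixel at (0, 0).
coarse = np.full((3, 2, 2), 1.0 / 3.0)
coarse[:, 0, 0] = [0.98, 0.01, 0.01]

ys, xs = select_uncertain_pixels(coarse, threshold=0.5)

# Placeholder for the sparse feature extractor: predict class 1 at every selected pixel.
refined = np.tile(np.array([[0.0], [1.0], [0.0]]), (1, len(ys)))

fused = gated_ensemble(coarse, refined, ys, xs, gate=1.0)
labels = fused.argmax(axis=0)
```

Only the selected pixels are touched by the refinement step, so its cost scales with the number of high-entropy pixels rather than with the full image resolution, which is the intuition behind the "dense+sparse" paradigm.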
Cite
Text
Liu et al. "Sparse Refinement for Efficient High-Resolution Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72855-6_7
Markdown
[Liu et al. "Sparse Refinement for Efficient High-Resolution Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/liu2024eccv-sparse/) doi:10.1007/978-3-031-72855-6_7
BibTeX
@inproceedings{liu2024eccv-sparse,
title = {{Sparse Refinement for Efficient High-Resolution Semantic Segmentation}},
author = {Liu, Zhijian and Zhang, Zhuoyang and Khaki, Samir and Yang, Shang and Tang, Haotian and Xu, Chenfeng and Keutzer, Kurt and Han, Song},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72855-6_7},
url = {https://mlanthology.org/eccv/2024/liu2024eccv-sparse/}
}