GRIB: Combining Global Reception and Inductive Bias for Human Segmentation and Matting

Abstract

Human video segmentation and matting are challenging computer vision tasks with many applications, such as background replacement and background editing. Numerous methods have been proposed for human segmentation and matting in either portrait-view or first-person-view videos. In this paper, we propose a real-time network that performs both first-person-view hand and manipulated-object segmentation and second-person-view human video matting. We introduce a global reception inductive bias block in the network’s encoder that aggregates pixel features at short, medium, and long ranges. Furthermore, we propose a multi-target optimization method that fully leverages segmentation and matting labels to accelerate training. Our model outperforms existing real-time methods, achieving 93.9% mIoU on the HP-Portrait dataset, 95.1% mIoU on VideoMatte, and 72.7% mIoU on EgoHOS, while running faster.

Cite

Text

Shen et al. "GRIB: Combining Global Reception and Inductive Bias for Human Segmentation and Matting." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00567

Markdown

[Shen et al. "GRIB: Combining Global Reception and Inductive Bias for Human Segmentation and Matting." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/shen2024cvprw-grib/) doi:10.1109/CVPRW63382.2024.00567

BibTeX

@inproceedings{shen2024cvprw-grib,
  title     = {{GRIB: Combining Global Reception and Inductive Bias for Human Segmentation and Matting}},
  author    = {Shen, Yezhi and Xu, Weichen and Lin, Qian and Allebach, Jan P. and Zhu, Fengqing},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {5576--5585},
  doi       = {10.1109/CVPRW63382.2024.00567},
  url       = {https://mlanthology.org/cvprw/2024/shen2024cvprw-grib/}
}