Human-Centric Visual Relation Segmentation Using Mask R-CNN and VTransE

Abstract

In this paper, we propose a novel human-centric visual relation segmentation method based on the Mask R-CNN and VTransE models. We first retrain the Mask R-CNN model and segment both human and object instances. Because Mask R-CNN may omit some human instances during instance segmentation, we additionally detect the omitted faces and extend them to localize the corresponding human instances. Finally, we retrain the last layer of the VTransE model and detect the visual relations between each pair of a human instance and a human/object instance. The experimental results show that our method obtains 0.4799, 0.4069, and 0.2681 on the R@100 criterion at m-IoU thresholds of 0.25, 0.50, and 0.75, respectively, outperforming other methods in the Person in Context Challenge.
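The R@100 scores above measure the fraction of ground-truth relations recovered among the top 100 predictions, with instances matched by mask IoU. The following is a minimal sketch of such a metric; the greedy matching and the `mask_iou` helper are illustrative assumptions, not the challenge's official evaluation code.

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean instance masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def recall_at_k(gt_masks, pred_masks, scores, k=100, iou_thresh=0.25):
    """Fraction of ground-truth instances matched by one of the top-k
    scoring predictions at the given mask-IoU threshold.
    Greedy one-to-one matching; illustrative, not the official PIC evaluator."""
    if not gt_masks:
        return 0.0
    order = np.argsort(scores)[::-1][:k]   # indices of top-k predictions
    matched = [False] * len(gt_masks)
    for idx in order:
        for g, gt in enumerate(gt_masks):
            if not matched[g] and mask_iou(pred_masks[idx], gt) >= iou_thresh:
                matched[g] = True
                break
    return sum(matched) / len(gt_masks)

# Toy example: two ground-truth masks, two predictions.
gt = [np.array([[1, 1], [0, 0]], bool), np.array([[0, 0], [1, 1]], bool)]
pred = [np.array([[1, 0], [0, 0]], bool), np.array([[0, 0], [1, 1]], bool)]
print(recall_at_k(gt, pred, scores=[0.9, 0.8], k=100, iou_thresh=0.25))  # → 1.0
```

Running the same toy example with `iou_thresh=0.75` drops the first (half-overlapping) match, mirroring how the paper's scores fall as the m-IoU threshold tightens.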

Cite

Text

Yu et al. "Human-Centric Visual Relation Segmentation Using Mask R-CNN and VTransE." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11012-3_44

Markdown

[Yu et al. "Human-Centric Visual Relation Segmentation Using Mask R-CNN and VTransE." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/yu2018eccvw-humancentric/) doi:10.1007/978-3-030-11012-3_44

BibTeX

@inproceedings{yu2018eccvw-humancentric,
  title     = {{Human-Centric Visual Relation Segmentation Using Mask R-CNN and VTransE}},
  author    = {Yu, Fan and Tan, Xin and Ren, Tongwei and Wu, Gangshan},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {582--589},
  doi       = {10.1007/978-3-030-11012-3_44},
  url       = {https://mlanthology.org/eccvw/2018/yu2018eccvw-humancentric/}
}