Human Parsing with Contextualized Convolutional Neural Network

Abstract

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces a pixel-wise categorization in an end-to-end way. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines the global semantic structure and the local fine details across layers. Second, global image-level label prediction is used as an auxiliary objective in an intermediate layer of the Co-CNN, and its outputs are further used to guide feature learning in subsequent convolutional layers, thereby leveraging the global image-level context. Finally, to further utilize the local super-pixel contexts, within-super-pixel smoothing and cross-super-pixel neighborhood voting are formulated as natural sub-components of the Co-CNN to achieve local label consistency during both training and testing. Comprehensive evaluations on two public datasets demonstrate that our Co-CNN architecture significantly outperforms other state-of-the-art methods for human parsing. In particular, the F-1 score of Co-CNN on the large dataset reaches 76.95%, significantly higher than the 62.81% and 64.38% achieved by the state-of-the-art algorithms M-CNN and ATR, respectively.
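
To make the idea of within-super-pixel smoothing concrete, the following is a minimal NumPy sketch, not the paper's exact formulation. It assumes that smoothing means replacing each pixel's class confidences with the mean over its super-pixel; the function name `within_superpixel_smoothing` and the array layouts are hypothetical and chosen only for illustration.

```python
import numpy as np

def within_superpixel_smoothing(scores, segments):
    """Average per-pixel class scores inside each super-pixel (illustrative assumption).

    scores:   (H, W, C) float array of per-pixel class confidences.
    segments: (H, W) non-negative integer array of super-pixel ids.
    Returns an (H, W, C) array in which every pixel carries the mean
    score vector of the super-pixel it belongs to.
    """
    h, w, c = scores.shape
    flat_ids = segments.reshape(-1)
    flat_scores = scores.reshape(-1, c).astype(np.float64)

    n_segments = int(flat_ids.max()) + 1
    # Accumulate the score vectors of all pixels that share a super-pixel id.
    sums = np.zeros((n_segments, c), dtype=np.float64)
    np.add.at(sums, flat_ids, flat_scores)

    counts = np.bincount(flat_ids, minlength=n_segments).astype(np.float64)
    counts[counts == 0] = 1.0  # guard against unused ids
    means = sums / counts[:, None]

    # Broadcast each super-pixel mean back to its member pixels.
    return means[flat_ids].reshape(h, w, c)

# Toy example: a 2x2 image, 2 classes, two super-pixels (top row, bottom row).
scores = np.array([[[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [1.0, 0.0]]])
segments = np.array([[0, 0],
                     [1, 1]])
smoothed = within_superpixel_smoothing(scores, segments)
labels = smoothed.argmax(axis=-1)
```

In this toy example the two pixels of the top super-pixel disagree, so both receive the averaged scores, while the bottom super-pixel is already consistent and is left unchanged. In the Co-CNN described above, the smoothing is a sub-component of the network used in both training and testing, whereas this sketch only shows a standalone post-hoc version.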

Cite

Text

Liang et al. "Human Parsing with Contextualized Convolutional Neural Network." International Conference on Computer Vision, 2015. doi:10.1109/ICCV.2015.163

Markdown

[Liang et al. "Human Parsing with Contextualized Convolutional Neural Network." International Conference on Computer Vision, 2015.](https://mlanthology.org/iccv/2015/liang2015iccv-human/) doi:10.1109/ICCV.2015.163

BibTeX

@inproceedings{liang2015iccv-human,
  title     = {{Human Parsing with Contextualized Convolutional Neural Network}},
  author    = {Liang, Xiaodan and Xu, Chunyan and Shen, Xiaohui and Yang, Jianchao and Liu, Si and Tang, Jinhui and Lin, Liang and Yan, Shuicheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2015},
  doi       = {10.1109/ICCV.2015.163},
  url       = {https://mlanthology.org/iccv/2015/liang2015iccv-human/}
}