Human Attribute Recognition by Deep Hierarchical Contexts

Abstract

We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections, and combine them with the whole body as a pose-normalized deep representation. We further improve recognition by using deep hierarchical contexts ranging from the human-centric level to the scene level. Human-centric context captures human relations, which we compute from the nearest-neighbor parts of other people on a pyramid of CNN feature maps. The matched parts are then average-pooled and act as a similarity regularization. To utilize the scene context, we re-score human-centric predictions by the global scene classification score jointly learned in our CNN, yielding final scene-aware predictions. To facilitate our study, we introduce a large-scale WIDER Attribute dataset (dataset URL: http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) with human attribute and image event annotations. Our method surpasses competitive baselines on this dataset and on other popular ones.
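
The two context steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of cosine similarity for nearest-neighbor matching, and the softmax-weighted combination of per-scene attribute scores are all assumptions made for illustration, and the abstract does not specify these details.

```python
import numpy as np


def human_centric_context(target_parts, other_parts, eps=1e-12):
    """Average-pool the nearest-neighbour parts of other people.

    target_parts: (P, D) part features of the target person.
    other_parts:  (M, D) part features pooled from other people.
    Cosine similarity is an assumed matching criterion.
    """
    t = target_parts / (np.linalg.norm(target_parts, axis=1, keepdims=True) + eps)
    o = other_parts / (np.linalg.norm(other_parts, axis=1, keepdims=True) + eps)
    similarity = t @ o.T                  # (P, M) cosine similarities
    nearest = similarity.argmax(axis=1)   # best match for each target part
    return other_parts[nearest].mean(axis=0)  # (D,) average-pooled context


def scene_aware_scores(attr_scores_per_scene, scene_logits):
    """Re-score attribute predictions with scene classification scores.

    attr_scores_per_scene: (S, A) attribute scores, one row per scene class.
    scene_logits:          (S,) scene classification logits.
    The softmax weighting below is an assumed form of re-scoring.
    """
    z = scene_logits - scene_logits.max()        # numerically stable softmax
    scene_probs = np.exp(z) / np.exp(z).sum()
    return scene_probs @ attr_scores_per_scene   # (A,) scene-aware scores


# Toy usage with hypothetical 2-D features and two scene classes.
context = human_centric_context(
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]),
)
final = scene_aware_scores(np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.0, 0.0]))
```

In this toy run each target part matches the other person's part that points in the same direction, and equal scene logits weight the two rows of attribute scores equally.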

Cite

Text

Li et al. "Human Attribute Recognition by Deep Hierarchical Contexts." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46466-4_41

Markdown

[Li et al. "Human Attribute Recognition by Deep Hierarchical Contexts." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/li2016eccv-human/) doi:10.1007/978-3-319-46466-4_41

BibTeX

@inproceedings{li2016eccv-human,
  title     = {{Human Attribute Recognition by Deep Hierarchical Contexts}},
  author    = {Li, Yining and Huang, Chen and Loy, Chen Change and Tang, Xiaoou},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {684-700},
  doi       = {10.1007/978-3-319-46466-4_41},
  url       = {https://mlanthology.org/eccv/2016/li2016eccv-human/}
}