Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art

Abstract

A video dataset designed for studying fine-grained categorisation of pedestrians is introduced. Pedestrians were recorded "in the wild" from a moving vehicle. Annotations include bounding boxes, tracks, 14 keypoints with occlusion information, and the fine-grained categories of age (5 classes), sex (2 classes), weight (3 classes) and clothing style (4 classes). In total, there are 27,454 bounding-box and pose labels across 4,222 tracks. The dataset is designed to train and test algorithms for fine-grained categorisation of people; it is also useful for benchmarking tracking, detection and pose estimation of pedestrians. State-of-the-art algorithms for fine-grained classification and pose estimation were tested on the dataset, and the results are reported as a performance baseline.
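To make the annotation layout described in the abstract concrete, a single label could be represented roughly as follows. This is only a minimal sketch: the class counts and the number of keypoints come from the abstract, but every name (`PedestrianLabel`, `Keypoint`, `NUM_CLASSES`), the bounding-box layout and the integer encoding of the categories are assumptions made for illustration, not the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Class counts as stated in the abstract. The class names are not given there,
# so labels are kept as plain integer indices (an assumption).
NUM_CLASSES = {"age": 5, "sex": 2, "weight": 3, "clothing": 4}
NUM_KEYPOINTS = 14


@dataclass
class Keypoint:
    x: float
    y: float
    occluded: bool  # occlusion flag; the real encoding is an assumption


@dataclass
class PedestrianLabel:
    track_id: int  # which track this label belongs to
    frame: int     # frame index within the video
    box: Tuple[float, float, float, float]  # hypothetical (x, y, width, height)
    keypoints: List[Keypoint]
    age: int
    sex: int
    weight: int
    clothing: int

    def validate(self) -> None:
        """Raise ValueError if the label disagrees with the stated counts."""
        if len(self.keypoints) != NUM_KEYPOINTS:
            raise ValueError(f"expected {NUM_KEYPOINTS} keypoints")
        for name, count in NUM_CLASSES.items():
            value = getattr(self, name)
            if not 0 <= value < count:
                raise ValueError(f"{name}={value} outside [0, {count})")
```

Calling `validate()` on each loaded label is one cheap way to catch a mismatch between a parser and the stated annotation counts before training.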

Cite

Text

Hall and Perona. "Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7299187

Markdown

[Hall and Perona. "Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/hall2015cvpr-finegrained/) doi:10.1109/CVPR.2015.7299187

BibTeX

@inproceedings{hall2015cvpr-finegrained,
  title     = {{Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art}},
  author    = {Hall, David and Perona, Pietro},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2015},
  doi       = {10.1109/CVPR.2015.7299187},
  url       = {https://mlanthology.org/cvpr/2015/hall2015cvpr-finegrained/}
}