Count, Crop and Recognise: Fine-Grained Recognition in the Wild

Abstract

The goal of this paper is to label all the animal individuals present in every frame of a video. Unlike previous methods, which have principally concentrated on labelling face tracks, we aim to label individuals even when their faces are not visible. We make the following contributions: (i) we introduce a 'Count, Crop and Recognise' (CCR) multi-stage recognition process for frame-level labelling. The Count and Recognise stages involve specialised CNNs for the task, and we show that this simple staging gives a substantial boost in performance; (ii) we compare the recall of frame-based labelling with that of both face-track and body-track based labelling, and demonstrate the advantage of frame-based labelling with CCR for the specified goal; (iii) we introduce a new dataset for chimpanzee recognition in the wild; and (iv) we apply a high-granularity visualisation technique to further understand the learned CNN features for the recognition of chimpanzee individuals.

Cite

Text

Bain et al. "Count, Crop and Recognise: Fine-Grained Recognition in the Wild." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00032

Markdown

[Bain et al. "Count, Crop and Recognise: Fine-Grained Recognition in the Wild." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/bain2019iccvw-count/) doi:10.1109/ICCVW.2019.00032

BibTeX

@inproceedings{bain2019iccvw-count,
  title     = {{Count, Crop and Recognise: Fine-Grained Recognition in the Wild}},
  author    = {Bain, Max and Nagrani, Arsha and Schofield, Daniel and Zisserman, Andrew},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {236--246},
  doi       = {10.1109/ICCVW.2019.00032},
  url       = {https://mlanthology.org/iccvw/2019/bain2019iccvw-count/}
}