The Truth About Cats and Dogs

Abstract

Template-based object detectors such as the deformable parts model of Felzenszwalb et al. [11] achieve state-of-the-art performance for a variety of object categories, but are still outperformed by simpler bag-of-words models for highly flexible objects such as cats and dogs. In these cases we propose to use the template-based model to detect a distinctive part for the class, followed by detecting the rest of the object via segmentation on image specific information learnt from that part. This approach is motivated by two observations: (i) many object classes contain distinctive parts that can be detected very reliably by template-based detectors, whilst the entire object cannot; (ii) many classes (e.g. animals) have fairly homogeneous coloring and texture that can be used to segment the object once a sample is provided in an image. We show quantitatively that our method substantially outperforms whole-body template-based detectors for these highly deformable object categories, and indeed achieves accuracy comparable to the state-of-the-art on the PASCAL VOC competition, which includes other models such as bag-of-words.
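The two-stage idea in the abstract (detect a distinctive part, then segment the rest of the object from appearance learnt in that part) can be illustrated with a minimal sketch. The abstract does not say which segmentation algorithm is used, so this example swaps in a plain per-pixel color-histogram likelihood ratio. The names `quantize`, `color_model`, `segment_from_part`, and `part_box`, and the use of the image border as the background sample, are assumptions made for illustration and do not come from the paper. The part detector itself is not implemented; `part_box` stands in for its output.

```python
import math
from collections import Counter

def quantize(rgb, bins=8):
    """Map an (r, g, b) tuple with 0-255 channels to a coarse color bin."""
    r, g, b = rgb
    return (r * bins // 256, g * bins // 256, b * bins // 256)

def color_model(pixels, bins=8):
    """Build an add-one-smoothed histogram and return a log-probability function."""
    counts = Counter(quantize(p, bins) for p in pixels)
    total = sum(counts.values()) + bins ** 3
    return lambda rgb: math.log((counts[quantize(rgb, bins)] + 1) / total)

def segment_from_part(image, part_box, bins=8, border=4):
    """Label pixels as object when they look more like the part than the border.

    image:    list of rows, each a list of (r, g, b) tuples.
    part_box: (x0, y0, x1, y1), hypothetical output of a part detector.
    border:   width of the image frame assumed (for this sketch) to be background.
    Returns a list of rows of booleans, True where the pixel is labeled object.
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = part_box
    # Foreground appearance is learnt only from the detected part.
    part_pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    # Background appearance is learnt from a frame around the image edge.
    border_pixels = [image[y][x] for y in range(height) for x in range(width)
                     if min(x, y, width - 1 - x, height - 1 - y) < border]
    fg = color_model(part_pixels, bins)
    bg = color_model(border_pixels, bins)
    # Per-pixel likelihood-ratio test; no spatial smoothness is enforced.
    return [[fg(p) > bg(p) for p in row] for row in image]
```

Because the color models are built per image, the sketch relies on observation (ii) from the abstract: it only works when the object's coloring is fairly homogeneous and differs from the background. A practical system would presumably add a smoothness term between neighboring pixels rather than labeling each pixel independently.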

Cite

Text

Parkhi et al. "The Truth About Cats and Dogs." IEEE/CVF International Conference on Computer Vision, 2011. doi:10.1109/ICCV.2011.6126398

Markdown

[Parkhi et al. "The Truth About Cats and Dogs." IEEE/CVF International Conference on Computer Vision, 2011.](https://mlanthology.org/iccv/2011/parkhi2011iccv-truth/) doi:10.1109/ICCV.2011.6126398

BibTeX

@inproceedings{parkhi2011iccv-truth,
  title     = {{The Truth About Cats and Dogs}},
  author    = {Parkhi, Omkar M. and Vedaldi, Andrea and Jawahar, C. V. and Zisserman, Andrew},
  booktitle = {IEEE/CVF International Conference on Computer Vision},
  year      = {2011},
  pages     = {1427--1434},
  doi       = {10.1109/ICCV.2011.6126398},
  url       = {https://mlanthology.org/iccv/2011/parkhi2011iccv-truth/}
}