Best of Both Worlds: Human-Machine Collaboration for Object Annotation

Abstract

The long-standing goal of localizing every object in an image remains elusive. Manually annotating objects is quite expensive despite crowd-engineering innovations. Current state-of-the-art automatic object detectors can accurately detect at most a few objects per image. This paper brings together the latest advancements in object detection and in crowd engineering into a principled framework for accurately and efficiently localizing objects in images. The input to the system is an image to annotate and a set of annotation constraints: desired precision, utility, and/or human cost of the labeling. The output is a set of object annotations, informed by human feedback and computer vision. Our model seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process. We empirically validate the effectiveness of our human-in-the-loop labeling approach on the ILSVRC2014 object detection dataset.
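
Illustrative Sketch (Python)

The abstract's key mechanism is a Markov Decision Process that, under precision and budget constraints, decides whether to trust a detector's output or pay for human verification. The sketch below is not the authors' implementation; the state fields, cost constants, and greedy query policy are all illustrative assumptions meant only to show the shape of such a decision loop.

# Hypothetical sketch of a human-in-the-loop annotation MDP.
# Detector scores, cost constants, and the greedy policy are
# illustrative assumptions, not the method from the paper.
from dataclasses import dataclass

@dataclass
class Candidate:
    box: tuple           # (x1, y1, x2, y2) candidate object location
    score: float         # detector confidence in [0, 1]
    verified: bool = False

@dataclass
class State:
    candidates: list
    human_cost: float = 0.0  # accumulated cost of human queries

def expected_precision(state):
    """Verified boxes count as correct; unverified boxes
    contribute their detector confidence."""
    if not state.candidates:
        return 1.0
    return sum(1.0 if c.verified else c.score
               for c in state.candidates) / len(state.candidates)

def ask_human_to_verify(candidate):
    # Stand-in for a crowdsourcing query such as "is this box tight
    # around a single object?"; here we simulate the yes/no answer.
    return candidate.score > 0.5

def annotate(candidates, target_precision=0.9, budget=5.0, query_cost=1.0):
    """Greedy policy: while the precision constraint is unmet and
    budget remains, send the least certain box to a human."""
    state = State(candidates=list(candidates))
    while (expected_precision(state) < target_precision
           and state.human_cost + query_cost <= budget):
        pending = [c for c in state.candidates if not c.verified]
        if not pending:
            break
        # Action selection: query the candidate the detector is
        # least certain about (score closest to 0.5) first.
        target = min(pending, key=lambda c: abs(c.score - 0.5))
        if ask_human_to_verify(target):
            target.verified = True            # human confirmed the box
        else:
            state.candidates.remove(target)   # human rejected the box
        state.human_cost += query_cost
    return state

result = annotate([Candidate((10, 10, 50, 80), 0.92),
                   Candidate((60, 20, 90, 70), 0.40)])
print(expected_precision(result), result.human_cost)

Running the example, the low-confidence box is sent to the simulated human, rejected, and removed; the remaining box already satisfies the precision target, so annotation stops after one query. The paper's actual framework selects among multiple vision models and human input types rather than this single yes/no verification.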

Cite

Text

Russakovsky et al. "Best of Both Worlds: Human-Machine Collaboration for Object Annotation." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7298824

Markdown

[Russakovsky et al. "Best of Both Worlds: Human-Machine Collaboration for Object Annotation." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/russakovsky2015cvpr-best/) doi:10.1109/CVPR.2015.7298824

BibTeX

@inproceedings{russakovsky2015cvpr-best,
  title     = {{Best of Both Worlds: Human-Machine Collaboration for Object Annotation}},
  author    = {Russakovsky, Olga and Li, Li-Jia and Fei-Fei, Li},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2015},
  doi       = {10.1109/CVPR.2015.7298824},
  url       = {https://mlanthology.org/cvpr/2015/russakovsky2015cvpr-best/}
}