Learning by Tracking: Siamese CNN for Robust Target Association

Abstract

This paper introduces a novel approach to data association in pedestrian tracking, based on a two-stage learning scheme that matches pairs of detections. First, a Siamese convolutional neural network (CNN) is trained to learn descriptors encoding local spatio-temporal structure between the two input image patches, aggregating pixel values and optical flow information. Second, a set of contextual features derived from the position and size of the compared input patches is combined with the CNN output by means of a gradient boosting classifier to generate the final matching probability. This learning approach is validated with a linear programming based multi-person tracker, showing that even a simple and efficient tracker may outperform much more complex models when fed with our learned matching probabilities. Results on publicly available sequences show that our method meets state-of-the-art standards in multiple people tracking.
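
The snippet below is a minimal sketch of the two-stage matching idea described in the abstract, written in Python with PyTorch and scikit-learn. The network shape, patch dimensions, contextual cues, and random stand-in data are illustrative assumptions, not the configuration used in the paper.

# Sketch of the two-stage matching pipeline from the abstract.
# All sizes, cues, and hyper-parameters below are placeholder assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

class PairCNN(nn.Module):
    """Toy CNN over two image patches and their optical flow stacked as channels."""
    def __init__(self, in_channels=10, descriptor_dim=64):
        # in_channels assumption: 2 RGB patches (6) + 2 two-channel flow maps (4)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, descriptor_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def contextual_features(box_a, box_b):
    """Relative position/size cues between two detections given as (x, y, w, h)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    return np.array([xb - xa, yb - ya, wb / wa, hb / ha], dtype=np.float32)

# Stage 1: CNN descriptor for each detection pair (random data stands in for patches).
torch.manual_seed(0)
cnn = PairCNN()
n_pairs = 200
patches = torch.randn(n_pairs, 10, 121, 53)
with torch.no_grad():
    descriptors = cnn(patches).numpy()

# Stage 2: gradient boosting combines the CNN descriptor with contextual cues
# to output the final matching probability.
rng = np.random.default_rng(0)
boxes_a = rng.uniform(0, 100, size=(n_pairs, 4)) + 1.0
boxes_b = rng.uniform(0, 100, size=(n_pairs, 4)) + 1.0
context = np.stack([contextual_features(a, b) for a, b in zip(boxes_a, boxes_b)])
X = np.hstack([descriptors, context])
y = rng.integers(0, 2, size=n_pairs)  # 1 = same pedestrian, 0 = different

clf = GradientBoostingClassifier().fit(X, y)
match_probability = clf.predict_proba(X[:1])[0, 1]
print(f"matching probability for the first pair: {match_probability:.3f}")

In the paper these matching probabilities feed a linear programming based multi-person tracker; the classifier here is trained on random labels purely to show how the pieces connect.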

Cite

Text

Leal-Taixé et al. "Learning by Tracking: Siamese CNN for Robust Target Association." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2016. doi:10.1109/CVPRW.2016.59

Markdown

[Leal-Taixé et al. "Learning by Tracking: Siamese CNN for Robust Target Association." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2016.](https://mlanthology.org/cvprw/2016/lealtaixe2016cvprw-learning/) doi:10.1109/CVPRW.2016.59

BibTeX

@inproceedings{lealtaixe2016cvprw-learning,
  title     = {{Learning by Tracking: Siamese CNN for Robust Target Association}},
  author    = {Leal-Taixé, Laura and Canton-Ferrer, Cristian and Schindler, Konrad},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2016},
  pages     = {418--425},
  doi       = {10.1109/CVPRW.2016.59},
  url       = {https://mlanthology.org/cvprw/2016/lealtaixe2016cvprw-learning/}
}