Best Practices for Fine-Tuning Visual Classifiers to New Domains

Abstract

Recent studies have shown that features from deep convolutional neural networks learned using large labeled datasets, like ImageNet, provide effective representations for a variety of visual recognition tasks. They achieve strong performance as generic features and are even more effective when fine-tuned to target datasets. However, details of the fine-tuning procedure across datasets and with different amounts of labeled data are not well-studied, and choosing the best fine-tuning method is often left to trial and error. In this work we systematically explore the design space for fine-tuning and give recommendations based on two key characteristics of the target dataset: visual distance from the source dataset and the amount of available training data. Through a comprehensive experimental analysis, we conclude, with a few exceptions, that it is best to copy as many layers of a pre-trained network as possible, and then adjust the level of fine-tuning based on the visual distance from the source.
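The abstract's headline recommendation (copy all pre-trained layers, then set the depth of fine-tuning by the target dataset's visual distance from the source and the amount of labeled data) can be sketched as a simple decision rule. The function and its thresholds below are illustrative assumptions for exposition, not values taken from the paper:

```python
def layers_to_finetune(total_layers: int, visual_distance: float, num_labeled: int) -> int:
    """Illustrative decision rule for how many top layers to fine-tune.

    Follows the paper's qualitative advice: copy all pre-trained layers,
    then fine-tune more of them when the target domain is visually distant
    from the source and enough labeled data is available. The specific
    thresholds here are hypothetical, not reported in the paper.
    """
    if num_labeled < 1000:
        # Very little data: train only the newly added classifier layer,
        # keeping all copied layers frozen.
        return 1
    if visual_distance < 0.5:
        # Visually similar target: light fine-tuning of the top layers suffices.
        return max(1, total_layers // 4)
    # Visually distant target with ample data: fine-tune the whole network.
    return total_layers


# Example: an 8-layer network (e.g. AlexNet-scale) under three scenarios.
print(layers_to_finetune(8, visual_distance=0.9, num_labeled=50000))  # fine-tune all 8
print(layers_to_finetune(8, visual_distance=0.2, num_labeled=50000))  # fine-tune top 2
print(layers_to_finetune(8, visual_distance=0.9, num_labeled=500))    # classifier only
```

In a framework such as PyTorch, "freezing" the remaining layers would correspond to setting `requires_grad = False` on their parameters before fine-tuning.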

Cite

Text

Chu et al. "Best Practices for Fine-Tuning Visual Classifiers to New Domains." European Conference on Computer Vision Workshops, 2016. doi:10.1007/978-3-319-49409-8_34

Markdown

[Chu et al. "Best Practices for Fine-Tuning Visual Classifiers to New Domains." European Conference on Computer Vision Workshops, 2016.](https://mlanthology.org/eccvw/2016/chu2016eccvw-best/) doi:10.1007/978-3-319-49409-8_34

BibTeX

@inproceedings{chu2016eccvw-best,
  title     = {{Best Practices for Fine-Tuning Visual Classifiers to New Domains}},
  author    = {Chu, Brian and Madhavan, Vashisht and Beijbom, Oscar and Hoffman, Judy and Darrell, Trevor},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2016},
  pages     = {435--442},
  doi       = {10.1007/978-3-319-49409-8_34},
  url       = {https://mlanthology.org/eccvw/2016/chu2016eccvw-best/}
}