Best Practices for Fine-Tuning Visual Classifiers to New Domains
Abstract
Recent studies have shown that features from deep convolutional neural networks learned using large labeled datasets, like ImageNet, provide effective representations for a variety of visual recognition tasks. They achieve strong performance as generic features and are even more effective when fine-tuned to target datasets. However, details of the fine-tuning procedure across datasets and with different amounts of labeled data are not well-studied, and choosing the best fine-tuning method is often left to trial and error. In this work we systematically explore the design space for fine-tuning and give recommendations based on two key characteristics of the target dataset: visual distance from the source dataset and the amount of available training data. Through a comprehensive experimental analysis, we conclude, with a few exceptions, that it is best to copy as many layers of a pre-trained network as possible, and then adjust the level of fine-tuning based on the visual distance from the source.
Cite
Text
Chu et al. "Best Practices for Fine-Tuning Visual Classifiers to New Domains." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-49409-8_34
Markdown
[Chu et al. "Best Practices for Fine-Tuning Visual Classifiers to New Domains." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/chu2016eccv-best/) doi:10.1007/978-3-319-49409-8_34
BibTeX
@inproceedings{chu2016eccv-best,
title = {{Best Practices for Fine-Tuning Visual Classifiers to New Domains}},
author = {Chu, Brian and Madhavan, Vashisht and Beijbom, Oscar and Hoffman, Judy and Darrell, Trevor},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {435-442},
doi = {10.1007/978-3-319-49409-8_34},
url = {https://mlanthology.org/eccv/2016/chu2016eccv-best/}
}