Adapting Vision Foundation Models for Plant Phenotyping

Abstract

Foundation models are large models pre-trained on a tremendous amount of data. They can typically be adapted to diverse downstream tasks with minimal effort. However, because foundation models are usually pre-trained on images or text sourced from the Internet, their performance in specialized domains, such as plant phenotyping, comes into question. In addition, fully fine-tuning foundation models is time-consuming and requires high computational power. This paper investigates the efficient adaptation of foundation models to plant phenotyping settings and tasks. We perform extensive experiments on fine-tuning three foundation models (MAE, DINO, and DINOv2) on three essential plant phenotyping tasks: leaf counting, instance segmentation, and disease classification. In particular, the pre-trained backbones are kept frozen, while two distinct fine-tuning methods are evaluated, namely adapter tuning (using LoRA) and decoder tuning. The experimental results show that a foundation model can be efficiently adapted to multiple plant phenotyping tasks, yielding performance similar to that of the state-of-the-art (SoTA) models specifically designed or trained for each task. Despite exhibiting great transferability across different tasks, the fine-tuned foundation models perform slightly worse than the SoTA task-specific models in some scenarios, which requires further investigation.
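
To make the adapter-tuning idea concrete, the following is a minimal, dependency-free sketch of a LoRA-style linear layer: the pre-trained weight stays frozen and only a low-rank update is trained. It is an illustration under stated assumptions, not the authors' implementation. The class name `LoRALinear`, the helper `matmul`, the `alpha / rank` scaling, and the initialization (small Gaussian `A`, zero `B`) are choices made for this sketch, following how LoRA is commonly described.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, never updated
        # Assumed initialization: A small Gaussian, B zero, so the update starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a single input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Count only the adapter parameters; the frozen weight is excluded."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    frozen = [[1.0] * 6 for _ in range(4)]  # toy 4x6 "pre-trained" weight
    layer = LoRALinear(frozen, rank=2)
    print(layer.forward([1.0] * 6))   # equals the frozen layer's output at initialization
    print(layer.trainable_params())   # rank * (d_in + d_out) adapter parameters
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the number of trainable parameters grows with `rank * (d_in + d_out)` rather than `d_in * d_out`. Decoder tuning, the other method evaluated, instead trains only a task head on top of the frozen backbone.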

Cite

Text

Chen et al. "Adapting Vision Foundation Models for Plant Phenotyping." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00067

Markdown

[Chen et al. "Adapting Vision Foundation Models for Plant Phenotyping." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/chen2023iccvw-adapting/) doi:10.1109/ICCVW60793.2023.00067

BibTeX

@inproceedings{chen2023iccvw-adapting,
  title     = {{Adapting Vision Foundation Models for Plant Phenotyping}},
  author    = {Chen, Feng and Giuffrida, Mario Valerio and Tsaftaris, Sotirios A.},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {604-613},
  doi       = {10.1109/ICCVW60793.2023.00067},
  url       = {https://mlanthology.org/iccvw/2023/chen2023iccvw-adapting/}
}