Assisted Few-Shot Learning for Vision-Language Models in Agricultural Stress Phenotype Identification
Abstract
In the agricultural sector, labeled data for crop diseases and stresses are often scarce due to high annotation costs. We propose an Assisted Few-Shot Learning approach that enhances vision-language models (VLMs) on image classification tasks with limited annotated data by optimizing the selection of input examples. Our method employs one image encoder at a time, either a Vision Transformer (ViT), ResNet-50, or CLIP, to retrieve contextually similar examples via cosine similarity of embeddings, thereby providing relevant few-shot prompts to the VLM. We evaluate our approach on an agricultural benchmark for VLMs, focusing on stress phenotyping, where the proposed method improves performance on 6 of 7 tasks. Experimental results demonstrate that, using the ViT encoder, the average F1 score across seven agricultural classification tasks increased from 68.68% to 80.45%, highlighting the effectiveness of our method in improving model performance with limited data.
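A minimal sketch of the retrieval step described above, assuming a CLIP image encoder from the Hugging Face transformers library; the function names, model checkpoint, and top-k selection are illustrative assumptions, not the authors' exact pipeline:

import torch
from transformers import CLIPModel, CLIPProcessor

# Hypothetical sketch: embed a labeled support pool, then for each query
# image retrieve the k most similar support examples by cosine similarity
# of embeddings, to serve as few-shot exemplars in the VLM prompt.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def embed(images):
    # L2-normalized CLIP image embeddings for a list of PIL images.
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def retrieve_few_shot(query_img, support_imgs, support_labels, k=3):
    # Return the k support (image, label) pairs most similar to the query;
    # the dot product of unit vectors equals cosine similarity.
    support_emb = embed(support_imgs)      # shape (N, D)
    query_emb = embed([query_img])         # shape (1, D)
    sims = (query_emb @ support_emb.T).squeeze(0)
    top = sims.topk(k).indices.tolist()
    return [(support_imgs[i], support_labels[i]) for i in top]

The retrieved pairs would then be formatted as in-context examples ahead of the query image in the VLM prompt; any ViT or ResNet-50 encoder could be swapped in for CLIP in the embed step.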
Cite
Text
Arshad et al. "Assisted Few-Shot Learning for Vision-Language Models in Agricultural Stress Phenotype Identification." NeurIPS 2024 Workshops: AFM, 2024.

Markdown
[Arshad et al. "Assisted Few-Shot Learning for Vision-Language Models in Agricultural Stress Phenotype Identification." NeurIPS 2024 Workshops: AFM, 2024.](https://mlanthology.org/neuripsw/2024/arshad2024neuripsw-assisted/)

BibTeX
@inproceedings{arshad2024neuripsw-assisted,
title = {{Assisted Few-Shot Learning for Vision-Language Models in Agricultural Stress Phenotype Identification}},
author = {Arshad, Muhammad Arbab and Jubery, Talukder Zaki and Singh, Asheesh K and Singh, Arti and Hegde, Chinmay and Ganapathysubramanian, Baskar and Balu, Aditya and Krishnamurthy, Adarsh and Sarkar, Soumik},
booktitle = {NeurIPS 2024 Workshops: AFM},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/arshad2024neuripsw-assisted/}
}