PhytoSynth: Leveraging Multi-Modal Generative Model for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Abstract

Collecting large-scale crop disease images in the field is labor-intensive and time-consuming. Generative models (GMs) offer an alternative by creating synthetic samples that resemble real-world images. However, existing research relies primarily on Generative Adversarial Network (GAN)-based image-to-image translation and lacks a comprehensive analysis of computational requirements in agriculture. This research therefore explores a multi-modal text-to-image approach for generating synthetic crop disease images and is the first to provide computational benchmarking in this context. We trained three Stable Diffusion (SD) variants, namely SDXL, SD3.5M (medium), and SD3.5L (large), and fine-tuned them with the DreamBooth and Low-Rank Adaptation (LoRA) techniques to enhance generalization. SD3.5M outperformed the other variants, with an average memory usage of 18 GB, an average power consumption of 180 W, and a total energy use of 1.02 kWh per 500 images (approximately 0.002 kWh per image) during inference. Our results demonstrate that SD3.5M can generate 500 synthetic images from just 36 in-field samples in 1.5 hours. We recommend SD3.5M for efficient crop disease data generation.
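The per-image energy figure follows from dividing total inference energy by the number of generated images, and total energy itself is typically obtained by integrating sampled power over time. The sketch below illustrates both steps under stated assumptions: power is assumed to be sampled at a fixed interval (for example, from a GPU monitoring tool), and the helper names `energy_kwh` and `per_image_kwh` are hypothetical. This is not the authors' benchmarking code.

```python
"""Hypothetical helpers for energy benchmarking of an image-generation run."""

from typing import Sequence


def energy_kwh(power_watts: Sequence[float], interval_s: float) -> float:
    """Integrate evenly spaced power samples (W) into energy (kWh).

    Assumes samples are taken every `interval_s` seconds and uses the
    trapezoidal rule between consecutive samples.
    """
    if len(power_watts) < 2:
        return 0.0  # fewer than two samples span no time interval
    joules = sum(
        (p0 + p1) / 2.0 * interval_s
        for p0, p1 in zip(power_watts, power_watts[1:])
    )
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


def per_image_kwh(total_kwh: float, n_images: int) -> float:
    """Average energy per generated image."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return total_kwh / n_images


# Figures reported in the abstract: 1.02 kWh for 500 images.
print(f"{per_image_kwh(1.02, 500):.5f} kWh/image")  # 0.00204 kWh/image

# Illustrative check (not from the paper): a constant 250 W draw
# sampled once per second for one hour (3601 samples span 3600 s).
print(f"{energy_kwh([250.0] * 3601, 1.0):.2f} kWh")  # 0.25 kWh
```

The reported "approximately 0.002 kWh per image" is the rounded value of 1.02 / 500 = 0.00204 kWh.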

Cite

Text

Rai et al. "PhytoSynth: Leveraging Multi-Modal Generative Model for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Rai et al. "PhytoSynth: Leveraging Multi-Modal Generative Model for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/rai2025cvprw-phytosynth/)

BibTeX

@inproceedings{rai2025cvprw-phytosynth,
  title     = {{PhytoSynth: Leveraging Multi-Modal Generative Model for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach}},
  author    = {Rai, Nitin and Schumann, Arnold W. and Boyd, Nathan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {5371--5380},
  url       = {https://mlanthology.org/cvprw/2025/rai2025cvprw-phytosynth/}
}