LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-Tailed Problems

Abstract

The long-tailed distribution is the underlying nature of real-world data, and it presents unprecedented challenges for training deep learning models. Existing long-tailed learning paradigms based on re-balancing or data augmentation have partially alleviated the long-tailed problem, but they still suffer from limitations such as reliance on manually designed augmentation strategies, a limited search space, and fixed augmentation policies. To address these limitations, this paper proposes LLM-AutoDA, a novel LLM-based long-tailed data augmentation framework that leverages large-scale pretrained models to automatically search for augmentation strategies suited to long-tailed data distributions. LLM-AutoDA then applies the searched strategy to the original imbalanced data to create an augmented dataset and fine-tunes the underlying long-tailed learning model. The performance improvement on the validation set serves as a reward signal to update the generation model, enabling it to produce more effective augmentation strategies in the next iteration. We conducted extensive experiments on multiple mainstream long-tailed learning benchmarks. The results show that LLM-AutoDA significantly outperforms state-of-the-art data augmentation methods and other re-balancing methods.
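The abstract describes an iterative search loop: an LLM proposes an augmentation strategy, the strategy is applied to the imbalanced training set, the long-tailed model is fine-tuned on the augmented data, and the validation gain is fed back as a reward that steers the next proposal. Below is a minimal sketch of that loop, not the authors' implementation; `propose_strategy`, `apply_strategy`, `fine_tune_and_evaluate`, and `update_generator` are hypothetical placeholders standing in for the LLM call, the augmentation step, the training/evaluation step, and the reward-driven update.

```python
import random

# --- Hypothetical placeholders; the paper's actual components differ. ---

def propose_strategy(generator_state, history):
    """Stand-in for the LLM call that emits a candidate augmentation strategy."""
    op = random.choice(["rotate", "mixup", "cutmix", "color_jitter"])
    return {"op": op, "magnitude": round(random.uniform(0.1, 1.0), 2)}

def apply_strategy(train_data, strategy):
    """Stand-in for applying the strategy to the imbalanced training set."""
    return {"base": train_data, "augmentation": strategy}

def fine_tune_and_evaluate(augmented_data, val_data):
    """Stand-in for fine-tuning the long-tailed model and scoring it on validation."""
    return random.uniform(0.0, 1.0)  # placeholder validation accuracy

def update_generator(generator_state, strategy, reward):
    """Stand-in for the reward-driven update of the strategy generator."""
    generator_state.append((strategy, reward))
    return generator_state

# --- Iterative search loop outlined in the abstract (sketch only). ---
def llm_autoda_loop(train_data, val_data, iterations=5):
    generator_state, history = [], []
    baseline = fine_tune_and_evaluate(train_data, val_data)  # score without augmentation
    best_strategy, best_score = None, baseline
    for _ in range(iterations):
        strategy = propose_strategy(generator_state, history)
        augmented = apply_strategy(train_data, strategy)
        score = fine_tune_and_evaluate(augmented, val_data)
        reward = score - baseline  # validation improvement acts as the reward signal
        generator_state = update_generator(generator_state, strategy, reward)
        history.append((strategy, reward))
        if score > best_score:
            best_strategy, best_score = strategy, score
    return best_strategy, best_score

if __name__ == "__main__":
    strategy, score = llm_autoda_loop(train_data="imbalanced_train_set",
                                      val_data="balanced_val_set")
    print("best strategy:", strategy, "val score:", round(score, 3))
```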

Cite

Text

Wang et al. "LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-Tailed Problems." Neural Information Processing Systems, 2024. doi:10.52202/079017-2072

Markdown

[Wang et al. "LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-Tailed Problems." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wang2024neurips-llmautoda/) doi:10.52202/079017-2072

BibTeX

@inproceedings{wang2024neurips-llmautoda,
  title     = {{LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-Tailed Problems}},
  author    = {Wang, Pengkun and Zhao, Zhe and Wen, Haibin and Wang, Fanfu and Wang, Binwu and Zhang, Qingfu and Wang, Yang},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2072},
  url       = {https://mlanthology.org/neurips/2024/wang2024neurips-llmautoda/}
}