Do Not Have Enough Data? Deep Learning to the Rescue!
Abstract
Drawing on recent advances in natural language modeling and text generation, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to synthesize new labeled data for supervised learning, focusing on cases where labeled data is scarce. Our method, referred to as language-model-based data augmentation (LAMBADA), fine-tunes a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Given a class label, the fine-tuned model then generates new sentences for that class. Our process then filters these candidate sentences with a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifier performance on a variety of datasets. Moreover, LAMBADA significantly outperforms state-of-the-art data augmentation techniques, specifically those applicable to text classification tasks with little data.
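The abstract describes a three-step recipe: fine-tune a pre-trained generator on label-prefixed training sentences, sample new sentences conditioned on each class label, and keep only the candidates that a classifier trained on the original data assigns to the intended class with high confidence. The sketch below illustrates one possible reading of that recipe, not the authors' implementation: GPT-2 (via Hugging Face transformers) stands in for the generator, a TF-IDF plus logistic-regression model stands in for the filter classifier, and the `SEP`/`EOS` markers, helper names, and hyperparameters are all hypothetical.

```python
# Illustrative sketch of the LAMBADA recipe, under the assumptions stated
# above; not the paper's code. Requires: torch, transformers, scikit-learn.
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

SEP, EOS = " SEP ", " EOS"  # hypothetical label/text delimiters

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def fine_tune(labeled, epochs=3, lr=5e-5):
    """Continue causal-LM training on 'label SEP text EOS' strings."""
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for label, text in labeled:
            ids = tokenizer.encode(label + SEP + text + EOS, return_tensors="pt")
            loss = model(ids, labels=ids).loss  # standard LM loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

def generate(label, n=10):
    """Sample n candidate sentences conditioned on a class label."""
    model.eval()
    prompt = tokenizer.encode(label + SEP, return_tensors="pt")
    outputs = model.generate(prompt, do_sample=True, top_p=0.9, max_length=60,
                             num_return_sequences=n,
                             pad_token_id=tokenizer.eos_token_id)
    texts = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    # Strip the label prompt and anything after the end marker.
    return [t.split(SEP, 1)[1].split(EOS)[0].strip() for t in texts if SEP in t]

def filter_candidates(labeled, candidates, keep_per_class=5):
    """Keep the candidates that a classifier trained on the ORIGINAL data
    assigns to the intended class with the highest confidence."""
    texts, labels = zip(*labeled)
    vec = TfidfVectorizer().fit(texts)
    clf = LogisticRegression(max_iter=1000).fit(vec.transform(texts), labels)
    kept = []
    for label, cands in candidates.items():
        idx = list(clf.classes_).index(label)
        scored = sorted(cands, reverse=True,
                        key=lambda s: clf.predict_proba(vec.transform([s]))[0, idx])
        kept += [(label, s) for s in scored[:keep_per_class]]
    return kept
```

Filtering with a classifier trained only on the original data is what distinguishes this scheme from plain generation: low-confidence samples, which are more likely to be off-label or incoherent, never enter the augmented training set.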
Cite
Text
Anaby-Tavor et al. "Do Not Have Enough Data? Deep Learning to the Rescue!" AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6233
Markdown
[Anaby-Tavor et al. "Do Not Have Enough Data? Deep Learning to the Rescue!" AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/anabytavor2020aaai-enough/) doi:10.1609/AAAI.V34I05.6233
BibTeX
@inproceedings{anabytavor2020aaai-enough,
title = {{Do Not Have Enough Data? Deep Learning to the Rescue!}},
author = {Anaby-Tavor, Ateret and Carmeli, Boaz and Goldbraich, Esther and Kantor, Amir and Kour, George and Shlomov, Segev and Tepper, Naama and Zwerdling, Naama},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2020},
pages = {7383--7390},
doi = {10.1609/AAAI.V34I05.6233},
url = {https://mlanthology.org/aaai/2020/anabytavor2020aaai-enough/}
}