Vec2Node: Self-Training with Tensor Augmentation for Text Classification with Few Labels
Abstract
Recent advances in state-of-the-art machine learning models like deep neural networks heavily rely on large amounts of labeled training data which is difficult to obtain for many applications. To address label scarcity, recent work has focused on data augmentation techniques to create synthetic training data. In this work, we propose a novel approach of data augmentation leveraging tensor decomposition to generate synthetic samples by exploiting local and global information in text and reducing concept drift. We develop Vec2Node that leverages self-training from in-domain unlabeled data augmented with tensorized word embeddings that significantly improves over state-of-the-art models, particularly in low-resource settings. For instance, with only $1\%$ of labeled training data, Vec2Node improves the accuracy of a base model by $16.7\%$. Furthermore, Vec2Node generates explicable augmented data leveraging tensor embeddings.
Cite
Text
Abdali et al. "Vec2Node: Self-Training with Tensor Augmentation for Text Classification with Few Labels." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26390-3_33
Markdown
[Abdali et al. "Vec2Node: Self-Training with Tensor Augmentation for Text Classification with Few Labels." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/abdali2022ecmlpkdd-vec2node/) doi:10.1007/978-3-031-26390-3_33
BibTeX
@inproceedings{abdali2022ecmlpkdd-vec2node,
title = {{Vec2Node: Self-Training with Tensor Augmentation for Text Classification with Few Labels}},
author = {Abdali, Sara and Mukherjee, Subhabrata and Papalexakis, Evangelos E.},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {571-587},
doi = {10.1007/978-3-031-26390-3_33},
url = {https://mlanthology.org/ecmlpkdd/2022/abdali2022ecmlpkdd-vec2node/}
}