Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks

Abstract

Transformers have achieved state-of-the-art performance in processing text, images, audio and video. However, they present large computational requirements for both training and inference, and are prone to overfitting on small datasets. To address these challenges, we present Input Compression with Positional Consistency (ICPC), a new data augmentation method that simultaneously improves both generalization and training efficiency. The key insight behind ICPC is that input compression can be used as a data augmentation technique. ICPC applies varying levels of compression to each sample in each epoch. This leads to smaller input sequences being processed by the Transformer, and hence faster training, while also alleviating overfitting by presenting each input with different compression levels. We introduce a consistency-aware position selection method in ICPC that enables accurate processing of compressed inputs without any changes to the underlying Transformer architecture. We detail compression-based augmentation methods for four different modalities: insignificant word pruning for text, resolution modulation for images, spatio-temporal resolution modulation for videos, and spectrogram modulation for audio. In addition to faster training with reduced overfitting, we find that ICPC enhances resilience to input compression during inference. Therefore, we introduce variable-effort inference schemes for accurate and efficient inference. On 9 diverse tasks spanning 4 different modalities, ICPC improves accuracy by up to 1%, while also accelerating training and inference by up to 2.9× and 2.6×, respectively. Code is available at https://github.com/amrnag/ICPC.
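To make the efficiency argument concrete, the sketch below illustrates the general idea of resolution modulation for images as described in the abstract: sampling a random compression level per sample per epoch shortens the patch sequence a Vision Transformer must process. All names, the patch size, and the candidate factors are illustrative assumptions, not taken from the paper's actual implementation.

```python
import random

# Illustrative sketch of resolution-modulation-style augmentation.
# PATCH, the base resolution, and the factor set are assumptions.
PATCH = 16  # assumed ViT patch size

def sample_resolution(base=224, factors=(1.0, 0.75, 0.5), rng=random):
    """Pick a random downscaling factor (e.g. once per sample per epoch)
    and return the resized side length, snapped down to a multiple of
    the patch size so the image tiles evenly into patches."""
    f = rng.choice(factors)
    side = int(base * f)
    return max(PATCH, (side // PATCH) * PATCH)

def num_tokens(side):
    """Sequence length the Transformer sees: patches per side, squared."""
    return (side // PATCH) ** 2

# A 224x224 input yields 196 tokens; compressed to 112x112 it yields 49,
# so self-attention cost (quadratic in sequence length) drops sharply.
print(num_tokens(224), num_tokens(sample_resolution()))
```

Since self-attention cost grows quadratically with sequence length, halving the resolution cuts the token count by roughly 4x, which is the source of the training speedups the abstract reports.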

Cite

Text

Nagarajan and Raghunathan. "Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70362-1_5

Markdown

[Nagarajan and Raghunathan. "Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/nagarajan2024ecmlpkdd-input/) doi:10.1007/978-3-031-70362-1_5

BibTeX

@inproceedings{nagarajan2024ecmlpkdd-input,
  title     = {{Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks}},
  author    = {Nagarajan, Amrit and Raghunathan, Anand},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {73--88},
  doi       = {10.1007/978-3-031-70362-1_5},
  url       = {https://mlanthology.org/ecmlpkdd/2024/nagarajan2024ecmlpkdd-input/}
}