Over-Training with Mixup May Hurt Generalization
Abstract
Mixup, which creates synthetic training instances by linearly interpolating random sample pairs, is a simple yet effective regularization technique for boosting the performance of deep models trained with SGD. In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help understand this behavior, we show theoretically that Mixup training may introduce undesired data-dependent label noise into the synthesized data. By analyzing a least-squares regression problem with a random feature model, we explain why noisy labels give rise to the U-shaped curve: Mixup improves generalization by fitting the clean patterns in the early stage of training, but as training progresses, it over-fits the noise in the synthetic data.
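To make the interpolation mechanism concrete, below is a minimal NumPy sketch of standard Mixup (Zhang et al., 2018), the technique the abstract analyzes; the function name mixup_batch and the default alpha are illustrative choices, not taken from this paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Standard Mixup: convex-combine a batch with a shuffled copy of itself.

    x: (batch, ...) array of inputs; y: (batch, num_classes) one-hot labels.
    alpha: Beta-distribution parameter controlling interpolation strength.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # interpolation coefficient in [0, 1]
    perm = rng.permutation(len(x))         # random pairing of samples
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # soft labels: where the data-dependent label noise enters
    return x_mix, y_mix
```

As in the reference implementation of Zhang et al., a single lambda is drawn per batch. The label-mixing line is the step the abstract points to: the interpolated soft labels act as data-dependent label noise, which the paper argues the model eventually over-fits after many epochs.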
Cite
Text
Liu et al. "Over-Training with Mixup May Hurt Generalization." NeurIPS 2022 Workshops: INTERPOLATE, 2022.

Markdown

[Liu et al. "Over-Training with Mixup May Hurt Generalization." NeurIPS 2022 Workshops: INTERPOLATE, 2022.](https://mlanthology.org/neuripsw/2022/liu2022neuripsw-overtraining/)

BibTeX
@inproceedings{liu2022neuripsw-overtraining,
  title     = {{Over-Training with Mixup May Hurt Generalization}},
  author    = {Liu, Zixuan and Wang, Ziqiao and Guo, Hongyu and Mao, Yongyi},
  booktitle = {NeurIPS 2022 Workshops: INTERPOLATE},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/liu2022neuripsw-overtraining/}
}