Emergent Properties with Repeated Examples
Abstract
We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three mathematical problems: the greatest common divisor, modular multiplication, and matrix eigenvalues, we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that *two-set training* - repeated use of a small random subset of examples, alongside normal sampling of the rest of the training set - yields faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
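As a rough illustration of the two-set training scheme described in the abstract, the sketch below mixes a small fixed pool of repeated examples with freshly generated single-use examples. The repetition probability `p_repeat`, the pool size, and the GCD example generator are illustrative assumptions, not values taken from the paper.

```python
import random

def two_set_sampler(repeated_pool, fresh_example_fn, p_repeat=0.25):
    """Yield training examples under a two-set scheme: with probability
    p_repeat, draw from a small fixed pool of repeated examples; otherwise
    generate a fresh, single-use example. p_repeat and the pool size are
    illustrative choices, not the paper's settings."""
    while True:
        if random.random() < p_repeat:
            yield random.choice(repeated_pool)   # repeated subset
        else:
            yield fresh_example_fn()             # normal sampling

# Hypothetical GCD task: fresh pairs of random integers, plus a small repeated pool.
def fresh_gcd_example(max_n=10**6):
    a, b = random.randint(1, max_n), random.randint(1, max_n)
    return a, b  # the model would be trained to predict gcd(a, b)

repeated_pool = [fresh_gcd_example() for _ in range(1000)]  # small repeated set
sampler = two_set_sampler(repeated_pool, fresh_gcd_example)
batch = [next(sampler) for _ in range(32)]
```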
Cite
Text
Charton and Kempe. "Emergent Properties with Repeated Examples." NeurIPS 2024 Workshops: SciForDL, 2024.
Markdown
[Charton and Kempe. "Emergent Properties with Repeated Examples." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/charton2024neuripsw-emergent/)
BibTeX
@inproceedings{charton2024neuripsw-emergent,
title = {{Emergent Properties with Repeated Examples}},
author = {Charton, Francois and Kempe, Julia},
booktitle = {NeurIPS 2024 Workshops: SciForDL},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/charton2024neuripsw-emergent/}
}