Reawakening Knowledge: Anticipatory Recovery from Catastrophic Interference via Structured Training

Abstract

We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before seeing them again. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
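
As a rough illustration of the protocol the abstract describes, the sketch below fine-tunes a causal LM on documents in a fixed, repeated order and logs the loss on every document after each update, which is the measurement needed to see anticipatory recovery (a document's loss dropping before its next visit). The model choice, hyperparameters, placeholder corpus, and logging scheme are illustrative assumptions, not the authors' implementation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM suffices for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder corpus; the paper's setting presents real documents cyclically.
documents = ["document 0 text ...", "document 1 text ...", "document 2 text ..."]
batches = [tok(d, return_tensors="pt", truncation=True) for d in documents]

def doc_loss(batch):
    # Evaluation loss on one document, without tracking gradients.
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    return out.loss.item()

num_cycles = 5
loss_history = []  # loss_history[t][i] = loss on document i after update t
for cycle in range(num_cycles):
    for batch in batches:  # fixed, repeated document order
        model.train()
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        opt.step()
        opt.zero_grad()
        # Probe every document after each update; a drop in a document's
        # loss before its next visit is the anticipatory recovery effect.
        model.eval()
        loss_history.append([doc_loss(b) for b in batches])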

Cite

Text

Yang et al. "Reawakening Knowledge: Anticipatory Recovery from Catastrophic Interference via Structured Training." Neural Information Processing Systems, 2024. doi:10.52202/079017-2621

Markdown

[Yang et al. "Reawakening Knowledge: Anticipatory Recovery from Catastrophic Interference via Structured Training." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yang2024neurips-reawakening/) doi:10.52202/079017-2621

BibTeX

@inproceedings{yang2024neurips-reawakening,
  title     = {{Reawakening Knowledge: Anticipatory Recovery from Catastrophic Interference via Structured Training}},
  author    = {Yang, Yanlai and Jones, Matt and Mozer, Michael C. and Ren, Mengye},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2621},
  url       = {https://mlanthology.org/neurips/2024/yang2024neurips-reawakening/}
}