Learning Fast and Slow: Representations for In-Context Weight Modulation

Abstract

Most natural sequential processes involve a spectrum of time scales: from fast-changing variations responsible for local structure to slowly changing dynamics that, akin to memory, capture contextual information. Here we propose a method for learning such a disentangled slow-fast representation in the activations of a conventional Transformer model. We accomplish this by employing regularization techniques inspired by contrastive learning. The proposed approach can be further analyzed by adopting a Gaussian process prior, which yields a Variational Autoencoder interpretation of the Transformer model. We evaluate our techniques on synthetic in-context learning tasks and widely used text benchmarks, where we show the emergence of disentangled representations. We then propose a HyperNetwork-inspired approach in which the slow representations are used to modulate the weights of the Transformer layers that operate on the fast, short-range activations. We demonstrate that adding such modulation makes it possible to generate models specialized to a particular context on the fly.
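
To make the weight-modulation idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' implementation): a pooled slow context vector is mapped by a small HyperNetwork head to a low-rank additive update of a linear layer's weight, and the modulated layer is then applied to the fast, short-range activations. All names (`SlowFastModulatedLinear`, `d_slow`, `rank`, etc.) are illustrative assumptions, and the low-rank parameterization is just one common choice for generating context-specific weights on the fly.

```python
# Hypothetical sketch, not the paper's code: a slow "context" vector
# parameterizes a low-rank additive update to the weight of a layer
# acting on fast activations (HyperNetwork-style weight modulation).
import torch
import torch.nn as nn


class SlowFastModulatedLinear(nn.Module):
    """Linear layer whose weight is modulated per example by a slow context vector."""

    def __init__(self, d_model: int, d_slow: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(d_model, d_model)            # shared "fast-path" weights
        # HyperNetwork head: maps the slow representation to low-rank factors A, B.
        self.to_factors = nn.Linear(d_slow, 2 * rank * d_model)
        self.rank = rank
        self.d_model = d_model

    def forward(self, fast: torch.Tensor, slow: torch.Tensor) -> torch.Tensor:
        # fast: [batch, seq, d_model] short-range activations
        # slow: [batch, d_slow]       context summary (e.g. pooled slow channels)
        a, b = self.to_factors(slow).chunk(2, dim=-1)
        a = a.view(-1, self.rank, self.d_model)            # [batch, rank, d_model]
        b = b.view(-1, self.rank, self.d_model)
        base_out = self.base(fast)
        # Context-specific low-rank update: output += (x A^T) B, per example.
        delta = torch.einsum("bsd,brd,bre->bse", fast, a, b)
        return base_out + delta


# Usage: a layer specialized to each example's context, generated on the fly.
layer = SlowFastModulatedLinear(d_model=64, d_slow=16)
fast = torch.randn(2, 10, 64)
slow = torch.randn(2, 16)
out = layer(fast, slow)                                    # [2, 10, 64]
```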

Cite

Text

Zhmoginov et al. "Learning Fast and Slow: Representations for In-Context Weight Modulation." ICML 2024 Workshops: ICL, 2024.

Markdown

[Zhmoginov et al. "Learning Fast and Slow: Representations for In-Context Weight Modulation." ICML 2024 Workshops: ICL, 2024.](https://mlanthology.org/icmlw/2024/zhmoginov2024icmlw-learning/)

BibTeX

@inproceedings{zhmoginov2024icmlw-learning,
  title     = {{Learning Fast and Slow: Representations for In-Context Weight Modulation}},
  author    = {Zhmoginov, Andrey and Lee, Jihwan and Vladymyrov, Max and Sandler, Mark},
  booktitle = {ICML 2024 Workshops: ICL},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/zhmoginov2024icmlw-learning/}
}