Class-Aware Initialization of Early Exits for Pre-Training Large Language Models

Abstract

We propose a novel class-aware weight initialization technique for early-exit large language models that accelerates pre-training. Our design exploits the neural collapse phenomenon, modeling the distribution of feature vectors at a given layer as a Gaussian mixture. Specifically, we compute the average token representation per class at the early exit point and use the resulting class-mean vectors, together with the class probabilities, to initialize the early exit weights. The next-token prediction accuracy of our class-aware initialization is up to five times higher than other baselines at epoch zero, and it matches or surpasses them in later epochs throughout pre-training.
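The initialization described above can be sketched as follows. This is an illustrative reconstruction from the abstract, not the authors' reference implementation: the function name `class_aware_init`, the Laplace smoothing of the class probabilities, and the use of raw (unscaled) class means are assumptions for the sake of a runnable example.

```python
import numpy as np

def class_aware_init(features, labels, num_classes):
    """Initialize early-exit classifier weights from class-mean features.

    features: (N, d) token representations at the early exit layer
    labels:   (N,) next-token class ids observed for those positions
    Returns (W, b): exit weight matrix (num_classes, d) and bias (num_classes,).
    """
    d = features.shape[1]
    counts = np.bincount(labels, minlength=num_classes)

    # Each exit weight vector is set to the mean representation of its class,
    # following the neural-collapse intuition that class features concentrate
    # around their class means.
    W = np.zeros((num_classes, d))
    for c in range(num_classes):
        if counts[c] > 0:
            W[c] = features[labels == c].mean(axis=0)

    # Bias from empirical class probabilities (add-one smoothing is an
    # assumption here, to handle classes unseen in the sample).
    probs = (counts + 1) / (counts.sum() + num_classes)
    b = np.log(probs)
    return W, b
```

In practice the class means would be estimated from a small sample of pre-training tokens before training begins, so the exit head starts from an informed linear classifier rather than a random one.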

Cite

Text

Gormez and Koyuncu. "Class-Aware Initialization of Early Exits for Pre-Training Large Language Models." ICML 2024 Workshops: WANT, 2024.

Markdown

[Gormez and Koyuncu. "Class-Aware Initialization of Early Exits for Pre-Training Large Language Models." ICML 2024 Workshops: WANT, 2024.](https://mlanthology.org/icmlw/2024/gormez2024icmlw-classaware/)

BibTeX

@inproceedings{gormez2024icmlw-classaware,
  title     = {{Class-Aware Initialization of Early Exits for Pre-Training Large Language Models}},
  author    = {Gormez, Alperen and Koyuncu, Erdem},
  booktitle = {ICML 2024 Workshops: WANT},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/gormez2024icmlw-classaware/}
}