Exploring Monotonicity in Early-Exiting Language Models
Abstract
Large Language Models (LLMs) have shown impressive results across the board, but inference can be costly. A promising remedy is early-exiting methods, which assume that not all tokens need the same amount of computation and therefore exit the LLM at earlier layers. Several early-exiting methods have been proposed, relying on the implicit assumption that as the network does more computation, it becomes more confident in its prediction. We investigate this assumption for two early-exiting methods and propose three new confidence measures for early exiting based on these insights. We find early evidence that monotonicity benefits the quality of token generation.
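The core mechanism the abstract describes can be sketched as follows: at each transformer layer, a confidence score (here, the top softmax probability of the intermediate prediction) is compared against a threshold, and generation exits at the first layer that is confident enough. This is a minimal illustrative sketch, not the paper's method; the threshold value, the softmax-confidence measure, and the `early_exit_layer` helper are all assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_layer(per_layer_logits, threshold=0.9):
    """Hypothetical confidence-based early exit.

    Scans intermediate predictions layer by layer and returns
    (layer_index, predicted_token_index) for the first layer whose top
    softmax probability reaches `threshold`; falls back to the final layer.
    The monotonicity assumption the paper examines is that this top
    probability tends to increase with depth.
    """
    for i, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            return i, probs.index(conf)
    # No layer was confident enough: use the final layer's prediction.
    final_probs = softmax(per_layer_logits[-1])
    return len(per_layer_logits) - 1, final_probs.index(max(final_probs))

# Toy logits for one token across three layers: confidence rises with depth,
# so the model can exit at layer 1 instead of running all layers.
logits_by_layer = [
    [0.1, 0.2, 0.15],  # near-uniform, low confidence
    [0.2, 5.0, 0.1],   # confident in token 1
    [0.3, 6.0, 0.1],   # even more confident (monotone increase)
]
layer, token = early_exit_layer(logits_by_layer, threshold=0.9)
```

Monotonicity matters here because if confidence dipped and rose non-monotonically across layers, a threshold rule like this could exit on a spuriously confident intermediate layer.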
Cite
Text
Laitenberger et al. "Exploring Monotonicity in Early-Exiting Language Models." ICML 2024 Workshops: ES-FoMo-II, 2024.
Markdown
[Laitenberger et al. "Exploring Monotonicity in Early-Exiting Language Models." ICML 2024 Workshops: ES-FoMo-II, 2024.](https://mlanthology.org/icmlw/2024/laitenberger2024icmlw-exploring/)
BibTeX
@inproceedings{laitenberger2024icmlw-exploring,
title = {{Exploring Monotonicity in Early-Exiting Language Models}},
author = {Laitenberger, Filipe and Belitsky, Max and Sheremet, Denys},
booktitle = {ICML 2024 Workshops: ES-FoMo-II},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/laitenberger2024icmlw-exploring/}
}