Learned Meta-Tokens for Language Modeling

Shah, Alok; Gupta, Khush; Ramji, Keshav; Chaudhari, Pratik

ICLR 2026

Abstract

Transformer-based language models (LMs) notably struggle to reliably capture distant contextual information. This work introduces a novel approach using meta-tokens -- special tokens injected during pre-training -- paired with a dedicated meta-attention mechanism to guide LMs to use these tokens. We pre-train a language model equipped with meta-attention in addition to causal multi-head attention on <100B tokens, achieving strong performance on a suite of synthetic tasks. Our method facilitates length generalization up to 2× the context window after extension with YaRN. We provide an information-theoretic analysis which reveals that meta-tokens sharpen the positional encoding, allowing them to operate as content-based anchors that compress preceding context and “cache” it within the meta-token. We empirically confirm this by visualizing model internals to study the residual stream. Together, our findings demonstrate that meta-tokens and meta-attention provide a simple, data-efficient pre-training method, grounded by new mechanistic insights into their role in enabling length generalization behavior.

Cite

Text

Shah et al. "Learned Meta-Tokens for Language Modeling." International Conference on Learning Representations, 2026.

Markdown

[Shah et al. "Learned Meta-Tokens for Language Modeling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/shah2026iclr-learned/)

BibTeX

@inproceedings{shah2026iclr-learned,
  title     = {{Learned Meta-Tokens for Language Modeling}},
  author    = {Shah, Alok and Gupta, Khush and Ramji, Keshav and Chaudhari, Pratik},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/shah2026iclr-learned/}
}