Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability

Abstract

Translating the internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. While recent dictionary learning methods such as Sparse Autoencoders (SAEs) provide a promising route to discovering human-interpretable features, they often recover only token-specific, noisy, or highly local concepts. We argue that this limitation stems from neglecting the temporal structure of language, in which semantic content typically evolves smoothly over a sequence. Building on this insight, we introduce Temporal Sparse Autoencoders (T-SAEs), which incorporate a novel contrastive loss that encourages high-level features to activate consistently across adjacent tokens. This simple yet powerful modification enables SAEs to disentangle semantic from syntactic features in a self-supervised manner. Across multiple datasets and models, T-SAEs recover smoother, more coherent semantic concepts without sacrificing reconstruction quality. Strikingly, they exhibit clear semantic structure despite being trained without an explicit semantic signal, offering a new pathway for unsupervised interpretability in language models.
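The abstract names the ingredients of the T-SAE objective but not its exact form, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' formulation. It assumes a ReLU SAE with an L1 sparsity penalty. It treats the first `n_high` latents as the "high-level" block, which is a hypothetical split. The contrastive term is an InfoNCE-style loss that takes the high-level codes of adjacent tokens (t, t+1) as positive pairs and the other positions in the same sequence as negatives. The function names, the cosine similarity, the temperature `tau`, and the weights `l1_coef` and `temporal_coef` are all illustrative choices.

```python
import math
import random

# Hypothetical sketch: the abstract does not give the exact T-SAE objective.
# Assumptions: ReLU SAE + L1 sparsity; the first n_high latents are treated as
# "high-level" features; an InfoNCE-style loss pulls together the high-level
# codes of adjacent tokens (t, t+1) and pushes apart other positions.


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def encode(x, W_enc, b_enc):
    """z = ReLU(W_enc x + b_enc): non-negative feature activations."""
    return [max(0.0, a + b) for a, b in zip(matvec(W_enc, x), b_enc)]


def decode(z, W_dec, b_dec):
    """x_hat = W_dec z + b_dec: linear reconstruction of the activation."""
    return [a + b for a, b in zip(matvec(W_dec, z), b_dec)]


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps avoids division by zero for all-zero codes."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norms + eps)


def temporal_contrastive_loss(codes, tau=0.1):
    """InfoNCE over a sequence: anchor t, positive t+1, negatives all other s != t."""
    T = len(codes)
    if T < 2:
        return 0.0
    total = 0.0
    for t in range(T - 1):
        candidates = [s for s in range(T) if s != t]
        logits = [cosine(codes[t], codes[s]) / tau for s in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[candidates.index(t + 1)]
    return total / (T - 1)


def t_sae_loss(xs, params, n_high, l1_coef=1e-3, temporal_coef=0.1, tau=0.1):
    """Reconstruction + L1 sparsity + temporal contrastive term for one sequence."""
    W_enc, b_enc, W_dec, b_dec = params
    recon = sparsity = 0.0
    high_codes = []
    for x in xs:
        z = encode(x, W_enc, b_enc)
        x_hat = decode(z, W_dec, b_dec)
        recon += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity += sum(z)  # z >= 0 after ReLU, so this equals the L1 norm
        high_codes.append(z[:n_high])  # hypothetical "high-level" block
    T = len(xs)
    return (recon / T + l1_coef * sparsity / T
            + temporal_coef * temporal_contrastive_loss(high_codes, tau))


if __name__ == "__main__":
    random.seed(0)
    d_model, d_sae, n_high, T = 8, 16, 4, 6
    W_enc = [[random.gauss(0, 0.3) for _ in range(d_model)] for _ in range(d_sae)]
    W_dec = [[random.gauss(0, 0.3) for _ in range(d_sae)] for _ in range(d_model)]
    params = (W_enc, [0.0] * d_sae, W_dec, [0.0] * d_model)
    # Random vectors stand in for a language model's residual-stream activations.
    xs = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(T)]
    print("T-SAE loss:", t_sae_loss(xs, params, n_high))
```

The script only evaluates the objective for one sequence. Training would additionally need gradients, for example from an autodiff framework, which this sketch omits. Applying the temporal term to a subset of latents is one plausible way to leave the remaining features free to track token-level detail, which is how the sketch interprets the abstract's claim that the method separates semantic from syntactic features.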

Cite

Text

Bhalla et al. "Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{bhalla2026iclr-temporal,
  title     = {{Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability}},
  author    = {Bhalla, Usha and Oesterling, Alex and Verdun, Claudio Mayrink and Lakkaraju, Himabindu and Calmon, Flavio},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/bhalla2026iclr-temporal/}
}