LM2: Large Memory Models for Long Context Reasoning

Abstract

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation repository, interacting with input tokens via cross-attention and updating through gating mechanisms. To preserve the Transformer's general-purpose capabilities, LM2 maintains the original information flow while integrating a complementary memory pathway. Experimental results on the BABILong benchmark demonstrate that the LM2 model outperforms the memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3% on average across tasks. LM2 exhibits exceptional capabilities in multi-hop inference, numerical reasoning, and large-context question answering. On the MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model, demonstrating that its memory module does not degrade performance on general tasks. Furthermore, our analysis explores memory interpretability, the effectiveness of the memory module, and test-time behavior. Our findings emphasize the importance of explicit memory in enhancing Transformer architectures.
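To make the mechanism described above more concrete, the following PyTorch sketch shows one way a decoder block could combine a standard self-attention pathway with a memory bank that is read via cross-attention and updated through a learned gate. This is an illustrative toy under stated assumptions, not the authors' implementation; all class, method, and parameter names (MemoryAugmentedBlock, n_mem_slots, read_attn, write_attn, gate) are hypothetical.

import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    """Toy decoder block with an explicit memory bank (hypothetical design)."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_mem_slots: int = 16):
        super().__init__()
        # Learnable memory bank, expanded per batch element at run time.
        self.memory = nn.Parameter(torch.randn(n_mem_slots, d_model) * 0.02)
        # Original information flow: causal self-attention over tokens.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Memory read: token queries attend over memory slots (cross-attention).
        self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Memory write candidate: memory slots attend over the token states.
        self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate deciding how much of the candidate update enters the memory.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        mem = self.memory.unsqueeze(0).expand(b, -1, -1)

        # 1) Standard causal self-attention pathway (original information flow).
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        h, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + h)

        # 2) Complementary memory pathway: tokens read from the memory bank.
        read, _ = self.read_attn(x, mem, mem)
        x = self.norm2(x + read + self.ffn(x + read))

        # 3) Gated memory update: blend the write candidate with the old state.
        cand, _ = self.write_attn(mem, x, x)
        g = torch.sigmoid(self.gate(torch.cat([mem, cand], dim=-1)))
        mem = g * cand + (1 - g) * mem
        return x, mem

block = MemoryAugmentedBlock()
tokens = torch.randn(2, 10, 256)
out, new_mem = block(tokens)
print(out.shape, new_mem.shape)  # (2, 10, 256) and (2, 16, 256)

The gated update is what lets the memory accumulate information across inputs without being overwritten wholesale, which is the property the abstract attributes to the gating mechanism; how LM2 actually parameterizes the read, write, and gate operations is specified in the paper itself.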

Cite

Text

Kang et al. "LM2: Large Memory Models for Long Context Reasoning." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.

Markdown

[Kang et al. "LM2: Large Memory Models for Long Context Reasoning." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/kang2025iclrw-lm2/)

BibTeX

@inproceedings{kang2025iclrw-lm2,
  title     = {{LM2: Large Memory Models for Long Context Reasoning}},
  author    = {Kang, Jikun and Wu, Wenqi and Christianos, Filippos and Chan, Alex James and Greenlee, Fraser David and Thomas, George and Purtorab, Marvin and Toulis, Andrew},
  booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/kang2025iclrw-lm2/}
}