Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Abstract
Episodic memory -- the ability to recall specific events grounded in time and space -- is a cornerstone of human cognition, enabling not only coherent storytelling but also planning and decision-making. Despite their remarkable capabilities, Large Language Models (LLMs) lack a robust mechanism for episodic memory: we argue that integrating episodic memory capabilities into LLMs is essential for advancing AI towards human-like cognition, increasing their potential to reason consistently and ground their outputs in real-world episodic events, hence avoiding confabulations. To address this challenge, we introduce a comprehensive framework to model and evaluate LLM episodic memory capabilities. Drawing inspiration from cognitive science, we develop a structured approach to represent episodic events, encapsulating temporal and spatial contexts, involved entities, and detailed descriptions. We synthesize a unique episodic memory benchmark, free from contamination, and release open-source code and datasets to assess LLM performance across various recall and episodic reasoning tasks. Our evaluation of state-of-the-art models, including GPT-4 and Claude variants, Llama 3.1, and o1-mini, reveals that even the most advanced LLMs struggle with episodic memory tasks, particularly when dealing with multiple related events or complex spatio-temporal relationships -- even in contexts as short as 10k-100k tokens.
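To make the structured event representation concrete, the sketch below shows one plausible way to encode an episodic event as a record combining temporal context, spatial context, involved entities, and a free-text description. The class and field names are illustrative assumptions for this page, not the paper's actual schema or released code.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodicEvent:
    """Hypothetical record for a single episodic event: when and where it
    happened, who was involved, and what took place."""
    time: str                                           # temporal context, e.g. "2021-03-14, afternoon"
    location: str                                       # spatial context, e.g. "Riverside Cafe, Lyon"
    entities: list[str] = field(default_factory=list)   # people or objects involved in the event
    description: str = ""                               # detailed free-text account of the event


# Illustrative example: a recall question might ask which entities were present
# at a given place and time, requiring the model to bind all four fields together.
event = EpisodicEvent(
    time="2021-03-14, afternoon",
    location="Riverside Cafe, Lyon",
    entities=["Alice", "Dr. Mori"],
    description="Alice handed Dr. Mori the annotated manuscript.",
)
print(event)
```

Keeping the cue dimensions (time, place, entities) as separate fields is what makes it possible to probe recall along each dimension independently or in combination, which is the kind of evaluation the benchmark targets.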
Cite
Text
Huet et al. "Episodic Memories Generation and Evaluation Benchmark for Large Language Models." International Conference on Learning Representations, 2025.
Markdown
[Huet et al. "Episodic Memories Generation and Evaluation Benchmark for Large Language Models." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/huet2025iclr-episodic/)
BibTeX
@inproceedings{huet2025iclr-episodic,
  title = {{Episodic Memories Generation and Evaluation Benchmark for Large Language Models}},
  author = {Huet, Alexis and Houidi, Zied Ben and Rossi, Dario},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://mlanthology.org/iclr/2025/huet2025iclr-episodic/}
}