Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-Session Agents

Du, Yiming; Wang, Baojun; Xiang, Yifan; Wang, Zhaowei; Huang, Wenyu; Xue, Boyang; Liang, Bin; Zeng, Xingshan; Mi, Fei; Bai, Haoli; Shang, Lifeng; Pan, Jeff Z.; Jiang, Yuxin; Wong, Kam-Fai

ICLR 2026

Abstract

Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. As dialogue histories grow longer and accumulate noise, existing long-context models struggle to identify temporally pertinent information, which significantly impairs reasoning performance. To address this, we introduce **Memory-T1**, a framework that learns a time-aware memory selection policy using reinforcement learning (RL). It employs a coarse-to-fine strategy: temporal and retriever filters first prune the dialogue history into a candidate set, and an RL agent then selects the precise evidence from that set. RL training is guided by a multi-level reward function that optimizes (i) answer accuracy, (ii) evidence grounding, and (iii) temporal consistency. The temporal consistency reward provides a dense signal by evaluating alignment at both the session level (range proximity) and the utterance level (evidence density), enabling the agent to resolve subtle chronological ambiguities. On the Time-Dialog benchmark, Memory-T1 boosts a 7B model to an overall score of 67.0%, establishing a new state of the art for open-source models and outperforming a 14B baseline by 10.2%. Ablation studies show that the temporal consistency and evidence grounding rewards jointly contribute a 15.0% performance gain. Moreover, Memory-T1 remains robust up to 128k tokens, where baseline models collapse, demonstrating its effectiveness against noise in extensive dialogue histories.
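To make the three reward terms concrete, here is a minimal Python sketch of how such a multi-level reward could be combined. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the functional forms, so the function names, the F1-based grounding score, the inverse-distance decay for range proximity, the precision-style evidence density, and the weights are all hypothetical choices.

```python
from datetime import date


def range_proximity(selected_days, start, end, scale_days=30.0):
    """Session-level term (assumed form): 1.0 when every selected session
    lies inside [start, end], decaying with the mean distance in days."""
    if not selected_days:
        return 0.0

    def gap(day):
        if day < start:
            return (start - day).days
        if day > end:
            return (day - end).days
        return 0

    mean_gap = sum(gap(d) for d in selected_days) / len(selected_days)
    return 1.0 / (1.0 + mean_gap / scale_days)


def evidence_density(selected_utts, gold_utts):
    """Utterance-level term (assumed form): share of the selected
    utterances that are annotated as gold evidence."""
    selected = set(selected_utts)
    if not selected:
        return 0.0
    return len(selected & set(gold_utts)) / len(selected)


def grounding_f1(selected_ids, gold_ids):
    """Evidence grounding term (assumed form): F1 between the selected
    and gold session ids."""
    selected, gold = set(selected_ids), set(gold_ids)
    overlap = len(selected & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def multi_level_reward(answer_correct, selected_ids, gold_ids,
                       selected_days, start, end,
                       selected_utts, gold_utts,
                       weights=(1.0, 0.5, 0.5)):
    """Weighted sum of accuracy, grounding, and temporal consistency.
    The weights are placeholders, not values from the paper."""
    w_acc, w_ground, w_time = weights
    temporal = 0.5 * (range_proximity(selected_days, start, end)
                      + evidence_density(selected_utts, gold_utts))
    return (w_acc * float(answer_correct)
            + w_ground * grounding_f1(selected_ids, gold_ids)
            + w_time * temporal)
```

Under these assumptions, a rollout that answers correctly, selects exactly the gold session inside the queried date range, and cites only gold utterances receives the maximum reward of 2.0. Selecting additional out-of-range sessions lowers the grounding and temporal terms even when the final answer is still correct, which is the kind of dense signal the abstract describes.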

Cite

Text

Du et al. "Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-Session Agents." International Conference on Learning Representations, 2026.

Markdown

[Du et al. "Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-Session Agents." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/du2026iclr-memoryt1/)

BibTeX

@inproceedings{du2026iclr-memoryt1,
  title     = {{Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-Session Agents}},
  author    = {Du, Yiming and Wang, Baojun and Xiang, Yifan and Wang, Zhaowei and Huang, Wenyu and Xue, Boyang and Liang, Bin and Zeng, Xingshan and Mi, Fei and Bai, Haoli and Shang, Lifeng and Pan, Jeff Z. and Jiang, Yuxin and Wong, Kam-Fai},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/du2026iclr-memoryt1/}
}