Calibration Enhanced Decision Maker: Towards Trustworthy Sequential Decision-Making with Large Sequence Models
Abstract
Offline deep reinforcement learning (offline DRL) has attracted considerable attention across various domains due to its ability to learn effective policies without direct environmental interaction. Although offline DRL is highly effective, the trustworthiness of the resulting agents remains a concern for the community. Offline DRL can be categorized into three principal paradigms: model-based algorithms, model-free algorithms, and trajectory optimization. While existing research concentrates predominantly on calibration enhancement for model-based and model-free algorithms, calibration of trajectory optimization has received little attention. In this paper, we introduce the Expected Agent Calibration Error (EACE), a novel metric designed to assess agent calibration, and we rigorously prove its theoretical relationship to the state-action marginal distribution distance. We then introduce the Calibration Enhanced Decision Maker (CEDM), which employs a binning executor to process feature distribution histograms as input for the large sequence model, thereby minimizing the state-action marginal distribution distance and enhancing the agent's calibration. We carry out a series of in-depth case studies of CEDM applied to Decision Transformer, Decision ConvFormer, and Decision Mamba. Empirical results substantiate the robustness of EACE and demonstrate the effectiveness of CEDM in enhancing agent calibration, thereby offering valuable insights for future research on trustworthy sequential decision-making.
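For readers unfamiliar with calibration metrics, the following is a minimal sketch of the classic binned Expected Calibration Error (ECE) used for classifiers. It is not the EACE metric defined in the paper, whose exact form the abstract does not give; it only illustrates the general idea of comparing confidence with observed outcomes within bins. The function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Classic binned ECE (illustrative; NOT the paper's EACE).

    confidences: predicted probabilities in [0, 1]
    correct:     1/0 (or True/False) flags, one per prediction
    n_bins:      number of equal-width confidence bins (assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")

    # Group (confidence, outcome) pairs into equal-width bins.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to last bin
        bins[idx].append((conf, float(hit)))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece
```

A value near zero means stated confidence matches observed accuracy within each bin; larger values indicate over- or under-confidence.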
Cite
Text
Sun et al. "Calibration Enhanced Decision Maker: Towards Trustworthy Sequential Decision-Making with Large Sequence Models." Transactions on Machine Learning Research, 2026.
Markdown
[Sun et al. "Calibration Enhanced Decision Maker: Towards Trustworthy Sequential Decision-Making with Large Sequence Models." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/sun2026tmlr-calibration/)
BibTeX
@article{sun2026tmlr-calibration,
title = {{Calibration Enhanced Decision Maker: Towards Trustworthy Sequential Decision-Making with Large Sequence Models}},
author = {Sun, Haoyuan and Xia, Bo and Luo, Yifu and Zhang, Tiantian and Wang, Xueqian},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/sun2026tmlr-calibration/}
}