State Revisit and Re-Explore: Bridging Sim-to-Real Gaps in Offline-and-Online Reinforcement Learning with an Imperfect Simulator
Abstract
In reinforcement learning (RL) based robot skill acquisition, a high-fidelity simulator is usually indispensable yet unattainable, since real environment dynamics are difficult to model, leading to severe sim-to-real gaps. Existing methods address this problem by combining offline and online RL to jointly learn transferable policies from limited offline data and an imperfect simulator. However, due to unrestricted exploration in the imperfect simulator, these hybrid offline-and-online RL methods inevitably suffer from low sample efficiency and insufficient state-action space coverage during training. To solve this problem, we propose a State Revisit and Re-exploration (SR2) hybrid offline-and-online RL framework. In particular, the proposed algorithm employs a meta-policy and a sub-policy, where the meta-policy identifies high-quality states in the offline trajectories for online exploration, and the sub-policy learns the robot skill from mixed offline and online data. By introducing the state revisit and re-explore mechanism, our approach efficiently improves performance on a set of sim-to-real robotic tasks. Through extensive experiments on simulated and real-world tasks, we demonstrate the superior performance of our approach against other state-of-the-art methods.
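The abstract describes SR2 only at a high level, so the following is a minimal, hedged illustration of how a state-revisit-and-re-explore loop could look. Everything here, including the toy chain environment and the names `ImperfectSimulator`, `meta_select_state`, `sub_policy`, and `train`, is a hypothetical placeholder assumed for illustration; it is not the authors' implementation.

```python
# Hypothetical sketch of a state-revisit-and-re-explore training loop,
# not the SR2 authors' code: a meta-policy surrogate picks an offline
# state to revisit, a sub-policy re-explores from it in an imperfect
# simulator, and learning uses mixed offline/online samples.
import random
from collections import deque

ACTIONS = (-1, +1)

class ImperfectSimulator:
    """Toy 1-D chain whose dynamics randomly 'slip' to mimic model mismatch."""
    def __init__(self, length=20):
        self.length = length
        self.state = 0

    def reset(self, state=0):
        self.state = state
        return self.state

    def step(self, action):
        slip = random.choice((0, 0, 1))                 # imperfect dynamics
        self.state = max(0, min(self.length, self.state + action + slip))
        reward = 1.0 if self.state == self.length else 0.0
        return self.state, reward, reward > 0.0

def q_value(q, s, a):
    return q.get((s, a), 0.0)

def state_value(q, s):
    return max(q_value(q, s, a) for a in ACTIONS)

def meta_select_state(offline_traj, q):
    """Meta-policy surrogate: revisit the highest-value offline state."""
    return max((s for (s, *_) in offline_traj), key=lambda s: state_value(q, s))

def sub_policy(q, s, epsilon=0.2):
    """Epsilon-greedy sub-policy used for online re-exploration."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(q, s, a))

def train(offline_traj, episodes=300, horizon=30, alpha=0.1, gamma=0.99):
    sim, q = ImperfectSimulator(), {}
    replay = deque(offline_traj, maxlen=10_000)         # seed with offline data
    for _ in range(episodes):
        # State revisit: reset the simulator to a promising offline state,
        # then let the sub-policy re-explore from there.
        s = sim.reset(meta_select_state(offline_traj, q))
        for _ in range(horizon):
            a = sub_policy(q, s)
            s_next, r, done = sim.step(a)
            replay.append((s, a, r, s_next, done))
            # One-step Q-learning update on a mixed offline/online sample.
            bs, ba, br, bs_next, bdone = random.choice(replay)
            target = br if bdone else br + gamma * state_value(q, bs_next)
            q[(bs, ba)] = q_value(q, bs, ba) + alpha * (target - q_value(q, bs, ba))
            s = s_next
            if done:
                break
    return q

if __name__ == "__main__":
    # A short hand-made offline trajectory of (s, a, r, s_next, done) tuples.
    offline = [(i, +1, 0.0, i + 1, False) for i in range(10)]
    q = train(offline)
    print("Greedy action at state 5:", max(ACTIONS, key=lambda a: q_value(q, 5, a)))
```

The design choice sketched here mirrors the abstract's description: exploration episodes restart from offline states scored as high-quality rather than from the simulator's default initial state, so online data concentrates around regions the offline trajectories already cover.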
Cite
Text
Chen et al. "State Revisit and Re-Explore: Bridging Sim-to-Real Gaps in Offline-and-Online Reinforcement Learning with an Imperfect Simulator." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/970Markdown
[Chen et al. "State Revisit and Re-Explore: Bridging Sim-to-Real Gaps in Offline-and-Online Reinforcement Learning with an Imperfect Simulator." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/chen2025ijcai-state/) doi:10.24963/IJCAI.2025/970BibTeX
@inproceedings{chen2025ijcai-state,
title = {{State Revisit and Re-Explore: Bridging Sim-to-Real Gaps in Offline-and-Online Reinforcement Learning with an Imperfect Simulator}},
author = {Chen, Xingyu and Xie, Jiayi and Xu, Zhijian and Liu, Ruixun and Yang, Shuai and Liu, Zeyang and Wan, Lipeng and Lan, Xuguang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {8724--8732},
doi = {10.24963/IJCAI.2025/970},
url = {https://mlanthology.org/ijcai/2025/chen2025ijcai-state/}
}