All-Time Safety and Sample-Efficient Meta Update for Online Safe Meta Reinforcement Learning Under Markov Task Transition

Yuan, Zhenyuan; Xu, Siyuan; Zhu, Minghui

doi:10.1007/S10994-025-06810-4

MLJ 2025 pp. 173

Abstract

This paper studies how to ensure all-time safety and sample-efficient meta updates in online safe meta reinforcement learning (MRL) on physical agents (e.g., mobile robots). We propose a novel masked Follow-the-Last-Parameter-Policy (FTLPP) framework, which is composed of a policy masking framework and a sample-efficient online meta update method. The policy masking framework applies a masking function over the learned control policy and ensures all-time safety by suppressing the probability of executing unsafe actions to a sufficiently small value. To enhance sample efficiency, the problem of updating the meta parameter online is transformed into a policy optimization problem, in which the tasks are the states and the meta parameters for the next task are the actions; this problem is then solved using an off-policy reinforcement learning algorithm. We evaluate our method on Frozen Lake, Acrobot, Half Cheetah and Hopper from OpenAI Gym and compare it with the baseline method Meta SRL and with variants of FTML and SAILR.
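The policy masking idea in the abstract can be illustrated with a minimal sketch for a discrete action space. This is not the paper's masking function: the name `mask_policy`, the threshold `eps`, and the assumption that a boolean unsafe-action indicator is already available are all hypothetical choices made for illustration. The sketch caps the total probability of unsafe actions at `eps` and gives the remaining mass to the safe actions.

```python
import numpy as np


def mask_policy(probs, unsafe, eps=1e-3):
    """Cap the total probability of unsafe actions at eps (hypothetical sketch).

    probs  : action probabilities from the learned policy (sums to 1)
    unsafe : boolean array, True where an action is considered unsafe
    eps    : assumed bound on the total probability of unsafe actions
    """
    probs = np.asarray(probs, dtype=float)
    unsafe = np.asarray(unsafe, dtype=bool)
    if unsafe.all():
        raise ValueError("no safe action available to receive probability mass")

    unsafe_mass = probs[unsafe].sum()
    if unsafe_mass <= eps:
        # The policy already respects the bound; leave it unchanged.
        return probs.copy()

    safe_mass = probs[~unsafe].sum()
    masked = np.empty_like(probs)
    # Shrink unsafe actions proportionally so their total mass equals eps.
    masked[unsafe] = probs[unsafe] * (eps / unsafe_mass)
    if safe_mass > 0:
        # Scale safe actions up proportionally to absorb the freed mass.
        masked[~unsafe] = probs[~unsafe] * ((1.0 - eps) / safe_mass)
    else:
        # The policy put no mass on safe actions; spread it uniformly.
        masked[~unsafe] = (1.0 - eps) / (~unsafe).sum()
    return masked


masked = mask_policy([0.5, 0.3, 0.2], [False, True, False], eps=0.01)
```

In this example the unsafe middle action keeps a total probability of `eps`, while the two safe actions keep their original ratio to each other. How the unsafe set is identified, and how small the bound must be for an all-time safety guarantee, are questions addressed in the paper itself and not by this sketch.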

Cite

Text

Yuan et al. "All-Time Safety and Sample-Efficient Meta Update for Online Safe Meta Reinforcement Learning Under Markov Task Transition." Machine Learning, 2025. doi:10.1007/S10994-025-06810-4

Markdown

[Yuan et al. "All-Time Safety and Sample-Efficient Meta Update for Online Safe Meta Reinforcement Learning Under Markov Task Transition." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/yuan2025mlj-alltime/) doi:10.1007/S10994-025-06810-4

BibTeX

@article{yuan2025mlj-alltime,
  title     = {{All-Time Safety and Sample-Efficient Meta Update for Online Safe Meta Reinforcement Learning Under Markov Task Transition}},
  author    = {Yuan, Zhenyuan and Xu, Siyuan and Zhu, Minghui},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {173},
  doi       = {10.1007/S10994-025-06810-4},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/yuan2025mlj-alltime/}
}