Robust Adaptive Multi-Step Predictive Shielding
Abstract
Reinforcement learning for safety-critical tasks requires policies that are both high-performing and safe throughout the learning process. While model-predictive shielding is a promising approach, existing methods are often computationally intractable for the high-dimensional, nonlinear systems where deep RL excels, as they typically rely on a patchwork of local models. We introduce **RAMPS**, a scalable shielding framework that overcomes this limitation by leveraging a learned, linear representation of the environment's dynamics. This model can range from a linear regression in the original state space to a more complex operator learned in a high-dimensional feature space. The key is that this linear structure enables a robust, look-ahead safety technique based on a *multi-step Control Barrier Function (CBF)*. By moving beyond myopic one-step formulations, **RAMPS** accounts for model error and control delays to provide reliable, real-time interventions. The resulting framework is minimally invasive, computationally efficient, and built upon robust control-theoretic foundations. Our experiments demonstrate that **RAMPS** significantly reduces safety violations compared to existing safe RL methods while maintaining high task performance in complex control environments.
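As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below shows how a linear dynamics model makes a multi-step barrier check cheap to evaluate. Everything in it is an assumption made for the example: the half-space barrier `h(x) = c - a @ x`, holding the action constant over the horizon, the linearly growing error margin `eps * k`, the finite candidate set used in place of an optimization problem, and the names `multi_step_safe` and `shield`.

```python
import numpy as np


def multi_step_safe(A, B, x, u, a, c, horizon=3, gamma=0.5, eps=0.01):
    """Check a discrete-time CBF condition along an H-step rollout of x' = A x + B u.

    Illustrative assumptions (not taken from the paper):
      - barrier h(x) = c - a @ x, where h(x) >= 0 means "safe";
      - the action u is held constant over the whole horizon;
      - eps * k is a crude margin for model error accumulated after k steps.
    """
    x = np.asarray(x, dtype=float)
    h_prev = c - a @ x
    for k in range(1, horizon + 1):
        x = A @ x + B @ u  # predicted next state under the linear model
        h_next = c - a @ x
        # The barrier may shrink by at most a factor (1 - gamma) per step,
        # and must additionally clear the error margin.
        if h_next < (1.0 - gamma) * h_prev + eps * k:
            return False
        h_prev = h_next
    return True


def shield(A, B, x, u_policy, a, c, candidates, **kw):
    """Return (action, intervened). Keeps the policy action when it passes
    the multi-step check, otherwise picks the closest passing candidate."""
    if multi_step_safe(A, B, x, u_policy, a, c, **kw):
        return u_policy, False
    safe = [u for u in candidates if multi_step_safe(A, B, x, u, a, c, **kw)]
    if safe:
        # Minimally invasive: smallest deviation from the policy's action.
        return min(safe, key=lambda u: np.linalg.norm(u - u_policy)), True
    # Hypothetical fallback: no candidate passes, so take the one with the
    # largest one-step barrier value.
    return max(candidates, key=lambda u: c - a @ (A @ x + B @ u)), True


# Toy 1-D integrator x' = x + 0.1 u with the safe set x <= 1.
A = np.array([[1.0]])
B = np.array([[0.1]])
a = np.array([1.0])
c = 1.0
candidates = [np.array([v]) for v in np.linspace(-1.0, 1.0, 5)]
u_safe, intervened = shield(A, B, np.array([0.9]), np.array([1.0]), a, c, candidates)
```

In this toy setting the shield leaves the policy's action untouched far from the boundary and overrides it near the boundary. Lengthening the horizon can reject an action that a one-step check would accept, which is the motivation for looking beyond a myopic formulation.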
Cite
Text
Ambadkar et al. "Robust Adaptive Multi-Step Predictive Shielding." International Conference on Learning Representations, 2026.
Markdown
[Ambadkar et al. "Robust Adaptive Multi-Step Predictive Shielding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/ambadkar2026iclr-robust/)
BibTeX
@inproceedings{ambadkar2026iclr-robust,
title = {{Robust Adaptive Multi-Step Predictive Shielding}},
author = {Ambadkar, Tanmay and Chudiwal, Darshan and Anderson, Greg and Verma, Abhinav},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/ambadkar2026iclr-robust/}
}