SpOiLer: Offline Reinforcement Learning Using Scaled Penalties

Abstract

Offline Reinforcement Learning (RL) is a variant of off-policy learning in which an optimal policy must be learned from a static dataset of trajectories collected by an unknown behavior policy. In the offline setting, standard off-policy algorithms overestimate the values of out-of-distribution actions. A policy trained naively in this way performs poorly in the environment because of the distribution shift between the environment implied by the dataset and the real one; this is especially likely when modelling complex, multi-modal data distributions. We propose Scaled-penalty Offline Learning (SpOiLer), an offline RL algorithm that reduces the value of out-of-distribution actions relative to observed actions. The resulting pessimistic value function is a lower bound on the true value function and steers the policy towards selecting actions present in the dataset. Our method is a simple augmentation of the standard Bellman backup operator, and its implementation requires around 15 additional lines of code over soft actor-critic. We provide theoretical insights into how SpOiLer operates and show empirically that it achieves remarkable performance against prior methods on a range of tasks.
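
The abstract does not state the penalty itself, so the following is only a minimal sketch of the general idea of a pessimistic Bellman target, not the operator defined in the paper. It assumes the dataset provides the next action alongside each transition, and the function name `penalized_td_target`, the penalty form, and the scale `alpha` are all hypothetical choices made for illustration.

```python
def penalized_td_target(reward, gamma, q_next_policy, q_next_data,
                        alpha=1.0, done=False):
    """Illustrative pessimistic TD target (an assumption, not SpOiLer's operator).

    q_next_policy: target-critic value of the action the current policy picks
                   in the next state (possibly out-of-distribution).
    q_next_data:   target-critic value of the action recorded in the dataset
                   for that next state.
    alpha:         hypothetical penalty scale; alpha = 0 recovers the
                   standard Bellman target.
    """
    # Penalize only the amount by which the policy's action is valued
    # above the observed dataset action.
    gap = max(q_next_policy - q_next_data, 0.0)
    pessimistic_next = q_next_policy - alpha * gap
    # Terminal transitions do not bootstrap from the next state.
    return reward + (0.0 if done else gamma * pessimistic_next)


# Standard target versus penalized target for one transition.
standard = penalized_td_target(1.0, 0.9, q_next_policy=10.0, q_next_data=4.0, alpha=0.0)
penalized = penalized_td_target(1.0, 0.9, q_next_policy=10.0, q_next_data=4.0, alpha=0.5)
```

In this sketch, any non-negative `alpha` yields a target no larger than the standard one, which is the sense in which such a value estimate is pessimistic; how the actual method scales its penalty is described in the paper, not here.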

Cite

Text

Srinivasan and Knottenbelt. "SpOiLer: Offline Reinforcement Learning Using Scaled Penalties." Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.

Markdown

[Srinivasan and Knottenbelt. "SpOiLer: Offline Reinforcement Learning Using Scaled Penalties." Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.](https://mlanthology.org/l4dc/2024/srinivasan2024l4dc-spoiler/)

BibTeX

@inproceedings{srinivasan2024l4dc-spoiler,
  title     = {{SpOiLer: Offline Reinforcement Learning Using Scaled Penalties}},
  author    = {Srinivasan, Padmanaba and Knottenbelt, William J.},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  year      = {2024},
  pages     = {825--838},
  volume    = {242},
  url       = {https://mlanthology.org/l4dc/2024/srinivasan2024l4dc-spoiler/}
}