Bayesian Online Non-Stationary Detection for Robust Reinforcement Learning

Abstract

Reinforcement Learning (RL) has achieved state-of-the-art performance in stationary environments with effective simulators. However, in lifelong and open-world RL applications such as robotics, stock trading, and recommendation systems, the environment changes over time, sometimes in adversarial ways. Non-stationary environments challenge RL agents because the observed distribution continually shifts away from the training distribution, causing performance to deteriorate. We propose using a robust Bayesian online detector that tracks agent performance to detect non-stationarities in the environment. Additionally, we propose a new metric called hindsight approximate reward (HAR) that relies solely on state and action information to detect adversarial changes in the environment, making it well-suited for real-world settings with missing or delayed feedback. We demonstrate that the proposed Bayesian detector, combined with HAR or expected reward as a metric, detects a range of non-stationary changes in dynamic control tasks more effectively than baseline non-stationarity tests.
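To make the core idea concrete, the sketch below illustrates a standard Bayesian online changepoint detection (BOCPD) recursion run over a stream of per-episode performance metrics (e.g., expected reward or HAR). This is a minimal, assumed implementation for illustration only: the Gaussian (Normal-Gamma) observation model, constant hazard rate, the `bocpd_gaussian` name, and the detection threshold are my assumptions, not the authors' robust detector.

```python
import numpy as np
from scipy import stats


def bocpd_gaussian(rewards, hazard=1 / 100, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Bayesian online changepoint detection over a stream of reward metrics.

    Uses a Normal-Gamma conjugate model (Student-t predictive) for the
    per-episode reward and a constant hazard rate for changepoints.
    Returns the run-length posterior matrix, one row per time step.
    """
    T = len(rewards)
    # run_length_probs[t, r] = p(run length == r after observing rewards[:t])
    run_length_probs = np.zeros((T + 1, T + 1))
    run_length_probs[0, 0] = 1.0

    # Sufficient statistics, one entry per possible run length.
    mu = np.array([mu0])
    kappa = np.array([kappa0])
    alpha = np.array([alpha0])
    beta = np.array([beta0])

    for t, x in enumerate(rewards):
        # Student-t predictive probability of x under each run-length hypothesis.
        pred_scale = np.sqrt(beta * (kappa + 1) / (alpha * kappa))
        pred = stats.t.pdf(x, df=2 * alpha, loc=mu, scale=pred_scale)

        # Growth: the current run continues; changepoint: a new run starts at length 0.
        growth = run_length_probs[t, : t + 1] * pred * (1 - hazard)
        cp = np.sum(run_length_probs[t, : t + 1] * pred * hazard)

        run_length_probs[t + 1, 0] = cp
        run_length_probs[t + 1, 1 : t + 2] = growth
        run_length_probs[t + 1] /= run_length_probs[t + 1].sum()

        # Conjugate updates; prepend the prior for the new run-length-0 hypothesis.
        mu_new = (kappa * mu + x) / (kappa + 1)
        beta_new = beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1))
        mu = np.concatenate(([mu0], mu_new))
        kappa = np.concatenate(([kappa0], kappa + 1))
        alpha = np.concatenate(([alpha0], alpha + 0.5))
        beta = np.concatenate(([beta0], beta_new))

    return run_length_probs


# Example: a synthetic reward stream whose mean drops after step 200,
# emulating the kind of non-stationary change such a detector should flag.
rng = np.random.default_rng(0)
rewards = np.concatenate([rng.normal(1.0, 0.2, 200), rng.normal(0.2, 0.2, 200)])
rl = bocpd_gaussian(rewards, hazard=1 / 100)
flagged = np.where(rl[1:, 0] > 0.5)[0]  # steps where a fresh run is most probable
```

In this sketch, a non-stationarity is flagged whenever the posterior mass on run length zero crosses a threshold; the same recursion applies whether the monitored metric is the observed reward or a hindsight approximation such as HAR.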

Cite

Text

Shmakov et al. "Bayesian Online Non-Stationary Detection for Robust Reinforcement Learning." NeurIPS 2024 Workshops: IMOL, 2024.

Markdown

[Shmakov et al. "Bayesian Online Non-Stationary Detection for Robust Reinforcement Learning." NeurIPS 2024 Workshops: IMOL, 2024.](https://mlanthology.org/neuripsw/2024/shmakov2024neuripsw-bayesian/)

BibTeX

@inproceedings{shmakov2024neuripsw-bayesian,
  title     = {{Bayesian Online Non-Stationary Detection for Robust Reinforcement Learning}},
  author    = {Shmakov, Alexander and Rajak, Pankaj and Feng, Yuhao and Kowalinski, Wojciech and Wang, Fei},
  booktitle = {NeurIPS 2024 Workshops: IMOL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/shmakov2024neuripsw-bayesian/}
}