Tempo Adaptation in Non-Stationary Reinforcement Learning
Abstract
We first raise and tackle a "time synchronization" issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering real-world applications. In reality, environmental changes occur over wall-clock time ($t$) rather than episode progress ($k$), where wall-clock time denotes the actual elapsed time within a fixed duration $t \in [0, T]$. In existing work, at episode $k$ the agent rolls out a trajectory and trains a policy before transitioning to episode $k+1$. In a time-desynchronized environment, however, the agent at time $t_{k}$ allocates $\Delta t$ to trajectory generation and training, then moves to the next episode at $t_{k+1}=t_{k}+\Delta t$. Despite a fixed total number of episodes ($K$), the agent accumulates different trajectories depending on the choice of interaction times ($t_1, t_2, \ldots, t_K$), which significantly impacts the suboptimality gap of the policy. We propose a Proactively Synchronizing Tempo ($\texttt{ProST}$) framework that computes a suboptimal sequence $\{t_1, t_2, \ldots, t_K\}$ (denoted $\{t_{1:K}\}$) by minimizing an upper bound on its performance measure, the dynamic regret. Our main contribution is to show that a suboptimal $\{t_{1:K}\}$ trades off the policy training time (agent tempo) against how fast the environment changes (environment tempo). Theoretically, this work derives a suboptimal $\{t_{1:K}\}$ as a function of the degree of the environment's non-stationarity while also achieving sublinear dynamic regret. Our experimental evaluation on various high-dimensional non-stationary environments shows that the $\texttt{ProST}$ framework achieves a higher online return at the suboptimal $\{t_{1:K}\}$ than existing methods.
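For concreteness, the following minimal Python sketch illustrates the wall-clock bookkeeping the abstract describes; it is our illustration under stated assumptions, not the authors' implementation. The names `env_param`, `choose_delta_t`, and the uniform time split are hypothetical placeholders: ProST instead selects the schedule by minimizing an upper bound on the dynamic regret.

```python
import numpy as np

# Assumed setup: a fixed wall-clock horizon t in [0, T] and a fixed
# number of episodes K, as in the abstract.
T = 1.0
K = 10

def env_param(t):
    """Placeholder non-stationary environment parameter drifting over
    wall-clock time t (stand-in for drifting rewards/dynamics)."""
    return np.sin(2 * np.pi * t)

def choose_delta_t(k):
    """Interaction-time schedule. ProST would choose these budgets by
    minimizing a dynamic-regret upper bound; a uniform split is used
    here purely as a stub."""
    return T / K

t = 0.0
interaction_times = []
for k in range(1, K + 1):
    theta_k = env_param(t)       # the environment the agent actually faces at t_k
    delta_t = choose_delta_t(k)  # time budget for rollout + policy training
    # ... roll out a trajectory under theta_k and train the policy for delta_t ...
    t += delta_t                 # t_{k+1} = t_k + delta_t
    interaction_times.append(t)

print(interaction_times)  # the sequence {t_1, ..., t_K} that ProST optimizes
```

The point of the sketch is that the agent only ever observes the environment at the chosen times $t_{1:K}$: spending more time training per episode (larger $\Delta t$) means facing an environment that has drifted further, which is the agent-tempo versus environment-tempo trade-off.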
Cite
Text
Lee et al. "Tempo Adaptation in Non-Stationary Reinforcement Learning." Neural Information Processing Systems, 2023.
Markdown
[Lee et al. "Tempo Adaptation in Non-Stationary Reinforcement Learning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/lee2023neurips-tempo/)
BibTeX
@inproceedings{lee2023neurips-tempo,
title = {{Tempo Adaptation in Non-Stationary Reinforcement Learning}},
author = {Lee, Hyunin and Ding, Yuhao and Lee, Jongmin and Jin, Ming and Lavaei, Javad and Sojoudi, Somayeh},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/lee2023neurips-tempo/}
}