Stationary Deep Reinforcement Learning with Quantum K-Spin Hamiltonian Regularization

Abstract

Instability is a major issue in deep reinforcement learning (DRL): performance varies widely across multiple runs. It is mainly caused by the existence of many local minima and is worsened by the fact that Bellman's equation admits multiple fixed points. As a fix, we propose a quantum K-spin Hamiltonian regularization term (called the H-term) to help a policy network converge to a high-quality local minimum. First, we take a quantum perspective by modeling a policy as a K-spin Ising model and employ a Hamiltonian to measure the energy of a policy. Then, we derive a novel Hamiltonian policy gradient theorem and design a generic actor-critic algorithm that utilizes the H-term to regularize the policy network. Finally, the proposed method reduces the variance of cumulative rewards by 65.2% to 85.6% on six MuJoCo tasks, compared with existing algorithms over 20 runs.
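The abstract models a policy as a K-spin Ising model whose energy is measured by a Hamiltonian. As background, here is a minimal sketch of the standard K-spin Ising Hamiltonian, H(s) = -Σ J_{i1…iK} · s_{i1} ⋯ s_{iK}; the function name, coupling values, and spin configuration are illustrative and not taken from the paper:

```python
def k_spin_energy(spins, couplings):
    """Energy of a K-spin Ising model.

    spins:     list of +1/-1 values, one per spin site.
    couplings: dict mapping a K-tuple of site indices to its
               coupling strength J; each term contributes
               -J * s_{i1} * ... * s_{iK} to the energy.
    """
    energy = 0.0
    for indices, J in couplings.items():
        prod = 1
        for i in indices:
            prod *= spins[i]
        energy -= J * prod
    return energy

# Example: three spins with a single 3-body coupling (K = 3).
spins = [1, -1, 1]
couplings = {(0, 1, 2): 0.5}
print(k_spin_energy(spins, couplings))  # -0.5 * (1 * -1 * 1) = 0.5
```

In the paper's setting, this energy (computed over trajectories rather than a fixed spin configuration) is added to the actor's loss as a regularizer; lower-energy policies correspond to the more stable local minima the method targets.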

Cite

Text

Liu et al. "Stationary Deep Reinforcement Learning with Quantum K-Spin Hamiltonian Regularization." ICLR 2023 Workshops: Physics4ML, 2023.

Markdown

[Liu et al. "Stationary Deep Reinforcement Learning with Quantum K-Spin Hamiltonian Regularization." ICLR 2023 Workshops: Physics4ML, 2023.](https://mlanthology.org/iclrw/2023/liu2023iclrw-stationary/)

BibTeX

@inproceedings{liu2023iclrw-stationary,
  title     = {{Stationary Deep Reinforcement Learning with Quantum K-Spin Hamiltonian Regularization}},
  author    = {Liu, Xiao-Yang and Li, Zechu and Wu, Shixun and Wang, Xiaodong},
  booktitle = {ICLR 2023 Workshops: Physics4ML},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/liu2023iclrw-stationary/}
}