Revisiting Discrete Soft Actor-Critic

Abstract

We study the adaptation of Soft Actor-Critic (SAC), widely regarded as a state-of-the-art reinforcement learning (RL) algorithm, from continuous to discrete action spaces. We revisit vanilla discrete SAC and provide an in-depth analysis of its Q-value underestimation and performance instability issues in discrete settings. We then propose Stable Discrete SAC (SDSAC), an algorithm that leverages an entropy penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical discrete-action benchmarks, including Atari games and a large-scale MOBA game, demonstrate the efficacy of the proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
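For readers skimming the abstract, the sketch below illustrates what "double average Q-learning with Q-clip" could look like in PyTorch: the soft target uses the average of two target critics rather than their minimum, and the critic loss is clipped against the previous estimate. Function names, the clipping rule (PPO-style value clipping), and hyperparameters are illustrative assumptions, not the authors' reference implementation; see the linked repository for the exact formulation.

```python
# Minimal, hypothetical sketch of a discrete-SAC critic target with
# averaged double Q-learning and a clipped critic loss (PyTorch).
import torch


def soft_q_target(q1_t, q2_t, log_pi, reward, done, gamma=0.99, alpha=0.2):
    """Soft Q target using the *average* of two target critics.

    q1_t, q2_t : [B, A] target-network Q values at the next state
    log_pi     : [B, A] log pi(a | s') of the current policy at the next state
    reward     : [B]    immediate rewards
    done       : [B]    episode-termination flags (0/1)
    """
    pi = log_pi.exp()                                   # action probabilities
    q_avg = 0.5 * (q1_t + q2_t)                         # average, not min
    v_next = (pi * (q_avg - alpha * log_pi)).sum(-1)    # soft state value
    return reward + gamma * (1.0 - done) * v_next


def clipped_q_loss(q_pred, q_pred_old, q_target, clip_eps=0.5):
    """Critic loss with PPO-style value clipping (illustrative assumption):
    the new estimate may not move more than clip_eps from the old one."""
    q_clipped = q_pred_old + torch.clamp(q_pred - q_pred_old, -clip_eps, clip_eps)
    loss_unclipped = (q_pred - q_target.detach()) ** 2
    loss_clipped = (q_clipped - q_target.detach()) ** 2
    return torch.max(loss_unclipped, loss_clipped).mean()
```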

Cite

Text

Zhou et al. "Revisiting Discrete Soft Actor-Critic." Transactions on Machine Learning Research, 2024.

Markdown

[Zhou et al. "Revisiting Discrete Soft Actor-Critic." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/zhou2024tmlr-revisiting/)

BibTeX

@article{zhou2024tmlr-revisiting,
  title     = {{Revisiting Discrete Soft Actor-Critic}},
  author    = {Zhou, Haibin and Wei, Tong and Lin, Zichuan and Li, Junyou and Xing, Junliang and Shi, Yuanchun and Shen, Li and Yu, Chao and Ye, Deheng},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/zhou2024tmlr-revisiting/}
}