DeepMellow: Removing the Need for a Target Network in Deep Q-Learning

Abstract

Deep Q-Network (DQN) is an algorithm that achieves human-level performance in complex domains like Atari games. One of the important elements of DQN is its use of a target network, which is necessary to stabilize learning. We argue that using a target network is incompatible with online reinforcement learning, and that it is possible to achieve faster and more stable learning without a target network when Mellowmax, an alternative softmax operator, is used. We derive novel properties of Mellowmax, and empirically show that the combination of DQN and Mellowmax, but without a target network, outperforms DQN with a target network.
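
The Mellowmax operator referenced in the abstract is not spelled out on this page; the sketch below assumes the standard definition from the Mellowmax literature, mm_omega(x) = (1/omega) * log((1/n) * sum_i exp(omega * x_i)), with the function name and the example omega values chosen purely for illustration.

import numpy as np
from scipy.special import logsumexp

def mellowmax(values, omega=5.0):
    """Mellowmax of a vector of values: (1/omega) * log(mean(exp(omega * x)))."""
    values = np.asarray(values, dtype=np.float64)
    # logsumexp computes log(sum(exp(omega * values))) in a numerically stable way;
    # subtracting log(n) turns the sum inside the log into a mean.
    return (logsumexp(omega * values) - np.log(values.size)) / omega

# Behavior at the extremes: large omega approaches max, small omega approaches the mean.
q_values = [1.0, 2.0, 3.0]
print(mellowmax(q_values, omega=100.0))  # ~3.0 (close to max)
print(mellowmax(q_values, omega=0.01))   # ~2.0 (close to the mean)

In DeepMellow, this operator stands in for the hard max when computing the bootstrapped target value, with omega acting as a tunable temperature.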

Cite

Text

Kim et al. "DeepMellow: Removing the Need for a Target Network in Deep Q-Learning." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/379

Markdown

[Kim et al. "DeepMellow: Removing the Need for a Target Network in Deep Q-Learning." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/kim2019ijcai-deepmellow/) doi:10.24963/IJCAI.2019/379

BibTeX

@inproceedings{kim2019ijcai-deepmellow,
  title     = {{DeepMellow: Removing the Need for a Target Network in Deep Q-Learning}},
  author    = {Kim, Seungchan and Asadi, Kavosh and Littman, Michael L. and Konidaris, George Dimitri},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {2733--2739},
  doi       = {10.24963/IJCAI.2019/379},
  url       = {https://mlanthology.org/ijcai/2019/kim2019ijcai-deepmellow/}
}