Memory-Efficient Reinforcement Learning with Value-Based Knowledge Consolidation

Abstract

Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing a large number of past experiences and reusing them for later training. However, a large replay buffer imposes a heavy memory burden, especially on onboard and edge devices with limited memory capacity. We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm to alleviate this problem. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network into the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance in both feature-based and image-based tasks while easing the burden of a large experience replay buffer.
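
The sketch below illustrates one plausible reading of the consolidation idea described in the abstract: a standard DQN temporal-difference loss augmented with a term that keeps the current Q-network close to the target Q-network on auxiliary states, so that a smaller replay buffer suffices. It is not the authors' released code; the network architecture, the squared-error form of the consolidation term, the `aux_states` input, and the weight `lam` are all illustrative assumptions.

```python
# Hypothetical sketch of value-based knowledge consolidation on top of DQN.
# Assumes the consolidation term is a squared difference between the current
# and target Q-values on auxiliary states (an assumption, not the paper's spec).
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)


def consolidation_dqn_loss(q_net, target_net, batch, aux_states,
                           gamma=0.99, lam=1.0):
    """DQN TD loss plus a consolidation term that distills the target
    network's Q-values into the current network (illustrative form)."""
    obs, actions, rewards, next_obs, dones = batch

    # Ordinary DQN temporal-difference loss on a (possibly small) replay batch.
    q_pred = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * q_next
    td_loss = F.mse_loss(q_pred, td_target)

    # Consolidation: keep the current Q-values close to the target network's
    # Q-values on auxiliary states, reducing forgetting without a large buffer.
    with torch.no_grad():
        q_old = target_net(aux_states)
    consolidation_loss = F.mse_loss(q_net(aux_states), q_old)

    return td_loss + lam * consolidation_loss
```

In this reading, the target network plays a dual role: it supplies bootstrapped TD targets, as in standard DQN, and it serves as a frozen snapshot of previously learned values that the consolidation term preserves, which is how a much smaller replay buffer could retain sample efficiency.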

Cite

Text

Lan et al. "Memory-Efficient Reinforcement Learning with Value-Based Knowledge Consolidation." Transactions on Machine Learning Research, 2023.

Markdown

[Lan et al. "Memory-Efficient Reinforcement Learning with Value-Based Knowledge Consolidation." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/lan2023tmlr-memoryefficient/)

BibTeX

@article{lan2023tmlr-memoryefficient,
  title     = {{Memory-Efficient Reinforcement Learning with Value-Based Knowledge Consolidation}},
  author    = {Lan, Qingfeng and Pan, Yangchen and Luo, Jun and Mahmood, A. Rupam},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/lan2023tmlr-memoryefficient/}
}