CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
Abstract
Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, improve sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce CrossQ: a lightweight algorithm for continuous control tasks that makes careful use of Batch Normalization and removes target networks to surpass the current state of the art in sample efficiency while maintaining a low UTD ratio of 1. Notably, CrossQ does not rely on advanced bias-reduction schemes used in current methods. CrossQ's contributions are threefold: (1) it matches or surpasses current state-of-the-art methods in terms of sample efficiency, (2) it substantially reduces the computational cost compared to REDQ and DroQ, and (3) it is easy to implement, requiring just a few lines of code on top of SAC.
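For concreteness, the core change is small enough to sketch. The following is a minimal, illustrative PyTorch rendering of the CrossQ critic update, not the authors' implementation: the function name, the critic and actor objects, and the batch layout are assumptions. The key step is concatenating the current and next state-action batches into one joint forward pass through the same BatchNorm-equipped critic, and bootstrapping from the live network under a stop-gradient instead of a target network.

import torch

# Illustrative sketch of a CrossQ-style critic update on top of SAC.
# Assumptions: critic is an MLP containing BatchNorm1d layers; actor(s)
# returns (action, log_prob); r and done have shape (B, 1).
def crossq_critic_loss(critic, actor, batch, gamma=0.99, alpha=0.2):
    s, a, r, s_next, done = batch

    with torch.no_grad():
        a_next, logp_next = actor(s_next)  # sample next action and its log-prob

    # CrossQ's key step: one joint forward pass over (s, a) and (s', a'),
    # so BatchNorm statistics are computed over both input distributions.
    sa = torch.cat([s, a], dim=-1)
    sa_next = torch.cat([s_next, a_next], dim=-1)
    q_joint = critic(torch.cat([sa, sa_next], dim=0))
    q, q_next = torch.chunk(q_joint, 2, dim=0)

    # No target network: bootstrap from the live critic, with gradients
    # stopped through the target via detach().
    target = r + gamma * (1.0 - done) * (q_next - alpha * logp_next)
    return ((q - target.detach()) ** 2).mean()

Details omitted here for brevity (but present in the paper) include the twin critics inherited from SAC, a slow BatchNorm momentum, and running BatchNorm in evaluation mode with its running statistics at action-selection time.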
Cite
Text

Bhatt et al. "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity." International Conference on Learning Representations, 2024.

Markdown

[Bhatt et al. "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/bhatt2024iclr-crossq/)

BibTeX
@inproceedings{bhatt2024iclr-crossq,
  title     = {{CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity}},
  author    = {Bhatt, Aditya and Palenicek, Daniel and Belousov, Boris and Argus, Max and Amiranashvili, Artemij and Brox, Thomas and Peters, Jan},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/bhatt2024iclr-crossq/}
}