Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
Abstract
Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the highest trajectory returns across diverse offline RL benchmarks. QCS represents a breakthrough in offline RL, pushing the limits of what can be achieved and fostering further innovations.
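To make the adaptive weighting described in the abstract concrete, below is a minimal sketch of a return-adaptive, Q-aided supervised loss. It assumes a PyTorch-style policy conditioned on return-to-go and a separately learned Q-function; the names `qcs_style_loss`, `policy`, `q_fn`, and the linear weighting rule are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def qcs_style_loss(policy, q_fn, states, actions, returns_to_go,
                   traj_return, max_return, alpha=1.0):
    """Hypothetical sketch of a return-adaptive, Q-aided RCSL loss.

    The RCSL term clones dataset actions conditioned on return-to-go;
    the Q-aid term nudges predicted actions toward high Q-values. The
    weight on the Q-aid term shrinks as the trajectory return approaches
    the dataset maximum, so near-optimal trajectories rely mostly on
    supervised learning while low-return ones get more stitching help.
    """
    # Supervised RCSL term: reproduce dataset actions given (state, return-to-go).
    pred_actions = policy(states, returns_to_go)
    rcsl_loss = F.mse_loss(pred_actions, actions)

    # Q-aid term: encourage actions the learned Q-function scores highly.
    q_loss = -q_fn(states, pred_actions).mean()

    # Adaptive weight: more Q-aid for low-return trajectories (illustrative choice).
    w = alpha * (1.0 - traj_return / max_return)
    return rcsl_loss + w * q_loss
```

The specific weighting schedule here is only one plausible choice; the paper's point is that the amount of Q-aid is adapted per trajectory according to its return rather than applied uniformly.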
Cite
Text
Kim et al. "Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-2764
Markdown
[Kim et al. "Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/kim2024neurips-adaptive-a/) doi:10.52202/079017-2764
BibTeX
@inproceedings{kim2024neurips-adaptive-a,
title = {{Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning}},
author = {Kim, Jeonghye and Lee, Suyoung and Kim, Woojun and Sung, Youngchul},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2764},
url = {https://mlanthology.org/neurips/2024/kim2024neurips-adaptive-a/}
}