Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network

Abstract

Reinforcement learning (RL) for continuous control often requires large amounts of online interaction data. Value-based RL methods can mitigate this burden by offering relatively high sample efficiency. Some studies further enhance sample efficiency by incorporating offline demonstration data to “kick-start” training, achieving promising results in continuous control. However, they typically compute the Q-function independently for each action dimension, neglecting interdependencies among dimensions and making it harder to identify optimal actions when learning from suboptimal data, such as non-expert demonstrations and data collected online during training. To address these issues, we propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner. First, ARSQ decomposes the continuous action space into discrete spaces in a coarse-to-fine hierarchy, enhancing sample efficiency for fine-grained continuous control tasks. Next, it auto-regressively predicts dimensional action advantages within each decision step, enabling more effective decision-making in continuous control tasks. We evaluate ARSQ on two continuous control benchmarks, RLBench and D4RL, integrating demonstration data into online training. On D4RL, which includes non-expert demonstrations, ARSQ achieves an average 1.62$\times$ performance improvement over the state-of-the-art value-based baseline. On RLBench, which incorporates expert demonstrations, ARSQ surpasses various baselines, demonstrating its effectiveness in learning from suboptimal online-collected data.
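
The auto-regressive, per-dimension factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, network sizes, bin count, and greedy bin selection are illustrative assumptions, and the coarse-to-fine refinement across discretization levels is omitted for brevity. Each continuous action dimension is discretized into bins, and the advantage head for dimension d is conditioned on the state together with the bins already chosen for earlier dimensions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoRegressiveQSketch(nn.Module):
    """Minimal sketch (assumed, not the paper's code) of an auto-regressive,
    per-dimension advantage head over a discretized action space."""

    def __init__(self, state_dim, action_dims, num_bins, hidden=256):
        super().__init__()
        self.num_bins = num_bins
        self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Head for dimension d also sees the one-hot bins of the previous d dimensions.
        self.heads = nn.ModuleList(
            nn.Linear(hidden + d * num_bins, num_bins) for d in range(action_dims)
        )
        self.value = nn.Linear(hidden, 1)  # shared state value V(s)

    def forward(self, state):
        h = self.state_enc(state)                 # (B, hidden)
        prev = state.new_zeros(state.size(0), 0)  # one-hot bins chosen so far
        bins = []
        for head in self.heads:
            adv = head(torch.cat([h, prev], dim=-1))   # (B, num_bins) advantages
            idx = adv.argmax(dim=-1)                   # greedy bin; a soft variant
            bins.append(idx)                           # would sample from softmax(adv)
            prev = torch.cat([prev, F.one_hot(idx, self.num_bins).float()], dim=-1)
        # A Q-value estimate could combine V(s) with the selected advantages.
        return torch.stack(bins, dim=-1), self.value(h)

Conditioning each dimension on the ones already chosen is what lets the sketch capture interdependencies that an independent per-dimension Q-function would miss.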

Cite

Text

Liu et al. "Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Liu et al. "Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/liu2025icml-learning-c/)

BibTeX

@inproceedings{liu2025icml-learning-c,
  title     = {{Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network}},
  author    = {Liu, Jijia and Gao, Feng and Liao, Qingmin and Yu, Chao and Wang, Yu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {38467--38488},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/liu2025icml-learning-c/}
}