Q-Functionals for Value-Based Continuous Control

Abstract

We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte-Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
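For concreteness, below is a minimal sketch of the idea described in the abstract, not the authors' implementation: a network maps the state to coefficients over a fixed action basis, so Q(s, a) = w(s) · φ(a) can be evaluated for a large batch of sampled actions with a single matrix multiply. The class and function names, the choice of a polynomial basis, and all sizes here are illustrative assumptions.

import torch
import torch.nn as nn


def polynomial_basis(actions: torch.Tensor, degree: int) -> torch.Tensor:
    # Per-dimension polynomial features phi(a); output shape (N, action_dim * (degree + 1)).
    powers = torch.arange(degree + 1, dtype=actions.dtype)   # exponents 0..degree
    feats = actions.unsqueeze(-1) ** powers                  # (N, action_dim, degree + 1)
    return feats.flatten(start_dim=1)


class QFunctional(nn.Module):
    # Maps a state to basis-function weights; Q-values come from a dot product with phi(a).
    def __init__(self, state_dim: int, action_dim: int, degree: int = 3, hidden: int = 256):
        super().__init__()
        self.degree = degree
        num_coeffs = action_dim * (degree + 1)
        self.coeff_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_coeffs),
        )

    def q_values(self, state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # Evaluate Q(state, a) for all sampled actions in parallel.
        # state: (state_dim,), actions: (N, action_dim) -> returns (N,).
        weights = self.coeff_net(state)                       # (num_coeffs,)
        phi = polynomial_basis(actions, self.degree)          # (N, num_coeffs)
        return phi @ weights


# Action selection by sampling: draw many candidate actions, score them in parallel,
# and take the highest-valued one (no separate policy network required).
qf = QFunctional(state_dim=8, action_dim=2)
state = torch.randn(8)
candidates = torch.empty(512, 2).uniform_(-1.0, 1.0)          # Monte-Carlo action samples
best_action = candidates[qf.q_values(state, candidates).argmax()]

The same batched evaluation also supports the Monte-Carlo integration and alternative action-selection strategies mentioned in the abstract, since scoring hundreds of candidate actions costs roughly one forward pass plus a matrix product.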

Cite

Text

Lobel et al. "Q-Functionals for Value-Based Continuous Control." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I7.26073

Markdown

[Lobel et al. "Q-Functionals for Value-Based Continuous Control." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/lobel2023aaai-q/) doi:10.1609/AAAI.V37I7.26073

BibTeX

@inproceedings{lobel2023aaai-q,
  title     = {{Q-Functionals for Value-Based Continuous Control}},
  author    = {Lobel, Samuel and Rammohan, Sreehari and He, Bowen and Yu, Shangqun and Konidaris, George},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {8932--8939},
  doi       = {10.1609/AAAI.V37I7.26073},
  url       = {https://mlanthology.org/aaai/2023/lobel2023aaai-q/}
}