Quasipseudometric Value Functions with Dense Rewards

Abstract

As a generalization of reinforcement learning (RL) to parametrizable goals, goal-conditioned RL (GCRL) has a broad range of applications, particularly in challenging robotics tasks. Recent work has established that the optimal value function of GCRL, $Q^\ast(s, a, g)$, has a quasipseudometric structure, leading to targeted neural architectures that respect this structure. However, the relevant analyses assume a sparse reward setting, a known aggravating factor for sample complexity. We show that the key property underpinning a quasipseudometric, viz., the triangle inequality, is also preserved under a dense reward setting, and we identify the key condition necessary for the triangle inequality to hold. Contrary to earlier findings where dense rewards were shown to be detrimental to GCRL, we conjecture that dense reward functions that satisfy this condition can only improve, never worsen, sample complexity. We evaluate this proposal in 12 standard GCRL benchmark environments featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting either improves upon or preserves the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.

Cite

Text

Valieva and Banerjee. "Quasipseudometric Value Functions with Dense Rewards." Transactions on Machine Learning Research, 2025.

Markdown

[Valieva and Banerjee. "Quasipseudometric Value Functions with Dense Rewards." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/valieva2025tmlr-quasipseudometric/)

BibTeX

@article{valieva2025tmlr-quasipseudometric,
  title     = {{Quasipseudometric Value Functions with Dense Rewards}},
  author    = {Valieva, Khadichabonu and Banerjee, Bikramjit},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/valieva2025tmlr-quasipseudometric/}
}