Floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL

Agrawalla, Bhavya Kumar; Nauman, Michal; Agrawal, Khush; Kumar, Aviral

ICLR 2026

Abstract

A hallmark of modern large-scale machine learning is the use of training objectives that provide dense supervision to intermediate computations, such as teacher forcing the next token in language models or step-by-step denoising in diffusion models. This supervision enables models to learn complex functions that generalize. Motivated by this observation, we investigate the benefits of iterative computation for temporal difference (TD) methods in reinforcement learning (RL), which typically represent value functions monolithically, without iterative compute. We introduce floq (flow-matching Q-functions), an approach that parameterizes the Q-function with a velocity field and trains it using flow-matching techniques typically used in generative modeling. The velocity field underlying the flow is trained with a TD-learning objective that bootstraps from values produced by a target velocity field, which are computed by running multiple steps of numerical integration. Crucially, by setting the number of integration steps appropriately, floq allows more fine-grained control and scaling of Q-function capacity than monolithic architectures. Across a suite of challenging offline RL benchmarks and online fine-tuning tasks, floq improves performance by nearly 1.8x. floq also scales capacity far better than standard TD-learning architectures, highlighting the potential of iterative computation for value learning.
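The mechanism described in the abstract works as follows. A velocity field is integrated over several steps to produce a Q-value, and it is trained by flow matching toward a TD target computed with a target velocity field. The NumPy code below is a minimal, hypothetical sketch of that loop, not the authors' implementation. Several choices are assumptions made purely for illustration: the linear toy velocity field, the uniform noise range (`noise_low`, `noise_high`), the linear interpolant z_t = (1 - t) z0 + t y, plain Euler integration, and every function name. It computes the loss only and omits gradient updates and target-network syncing.

```python
import numpy as np

def velocity(params, z, t, state, action):
    """Hypothetical toy velocity field v(z, t | s, a): linear in [z, t, s, a, 1]."""
    feats = np.concatenate(
        [z[:, None], t[:, None], state, action, np.ones((z.shape[0], 1))], axis=1
    )
    return feats @ params  # shape (batch,)

def integrate_q(params, state, action, z0, num_steps):
    """Euler-integrate the velocity field from t=0 to t=1; the endpoint is the Q-value."""
    z = z0.astype(float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full_like(z, k * dt)
        z = z + dt * velocity(params, z, t, state, action)
    return z

def floq_style_loss(params, target_params, batch, gamma, num_steps, rng,
                    noise_low=-1.0, noise_high=1.0):
    """Flow-matching loss toward a TD target (assumed linear-interpolant variant)."""
    s, a, r, s_next, a_next, done = batch
    n = s.shape[0]
    # TD target: bootstrap from the *target* velocity field via multi-step integration.
    z0_next = rng.uniform(noise_low, noise_high, size=n)
    q_next = integrate_q(target_params, s_next, a_next, z0_next, num_steps)
    y = r + gamma * (1.0 - done) * q_next
    # Flow matching: regress the velocity at an interpolated point onto (y - z0).
    z0 = rng.uniform(noise_low, noise_high, size=n)
    t = rng.uniform(0.0, 1.0, size=n)
    z_t = (1.0 - t) * z0 + t * y
    v_pred = velocity(params, z_t, t, s, a)
    return float(np.mean((v_pred - (y - z0)) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, ds, da = 8, 3, 2
    dim = 1 + 1 + ds + da + 1  # features: z, t, state, action, bias
    params = rng.normal(scale=0.1, size=dim)
    target_params = params.copy()
    batch = (rng.normal(size=(n, ds)), rng.normal(size=(n, da)),
             rng.normal(size=n), rng.normal(size=(n, ds)),
             rng.normal(size=(n, da)), np.zeros(n))
    print(floq_style_loss(params, target_params, batch, 0.99, num_steps=4, rng=rng))
```

In this sketch, raising `num_steps` spends more compute per Q-value evaluation without adding parameters. That is the kind of capacity knob the abstract attributes to the number of integration steps.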

Cite

Text

Agrawalla et al. "Floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL." International Conference on Learning Representations, 2026.

Markdown

[Agrawalla et al. "Floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/agrawalla2026iclr-floq/)

BibTeX

@inproceedings{agrawalla2026iclr-floq,
  title     = {{Floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL}},
  author    = {Agrawalla, Bhavya Kumar and Nauman, Michal and Agrawal, Khush and Kumar, Aviral},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/agrawalla2026iclr-floq/}
}