Chunking the Critic: A Transformer-Based Soft Actor-Critic with N-Step Returns

Abstract

We introduce a sequence-conditioned critic for Soft Actor-Critic (SAC) that models trajectory context with a lightweight Transformer and trains on aggregated $N$-step targets. Unlike prior approaches that (i) score state-action pairs in isolation or (ii) rely on actor-side action chunking to handle long horizons, our method strengthens the critic itself by conditioning on short trajectory segments and integrating multi-step returns without the need for importance sampling (IS). The resulting sequence-aware value estimates capture the temporal structure that is critical for extended-horizon and sparse-reward problems. On multiple benchmarks, we further show that freezing critic parameters for several steps makes our update compatible with CrossQ's core idea, enabling stable training without a target network. Despite its simple configuration, namely a 2-layer Transformer with $128$--$256$ hidden units and a maximum update-to-data (UTD) ratio of $1$, the approach consistently outperforms standard SAC and strong off-policy baselines, with particularly large gains on long-trajectory control. These results highlight the value of sequence modeling and $N$-step bootstrapping on the critic side for long-horizon reinforcement learning.
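
To make the notion of an $N$-step target concrete, the following is a minimal sketch of the textbook entropy-regularized $N$-step return, not the paper's exact formulation. The function names, the temperature `alpha`, and the choice to compute one target per horizon $n = 1, \dots, N$ are illustrative assumptions; how the paper aggregates these targets is not specified in the abstract.

```python
from typing import List, Sequence


def n_step_soft_target(
    rewards: Sequence[float],
    bootstrap_q: float,
    bootstrap_log_prob: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
    done: bool = False,
) -> float:
    """Generic soft n-step target (an assumption, not the paper's definition).

    rewards            -- r_t, ..., r_{t+n-1} observed along the segment
    bootstrap_q        -- critic estimate Q(s_{t+n}, a') with a' sampled from the policy
    bootstrap_log_prob -- log pi(a' | s_{t+n}), used for the SAC entropy bonus
    done               -- True if the episode terminated inside the segment
    """
    n = len(rewards)
    # Discounted sum of the observed rewards.
    target = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the soft value at step t+n unless the episode ended.
    if not done:
        target += (gamma ** n) * (bootstrap_q - alpha * bootstrap_log_prob)
    return target


def all_horizon_targets(
    rewards: Sequence[float],
    next_qs: Sequence[float],
    next_log_probs: Sequence[float],
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> List[float]:
    """One target per horizon n = 1..N over a single segment (hypothetical helper).

    next_qs[n-1] and next_log_probs[n-1] refer to the state reached after n steps.
    """
    return [
        n_step_soft_target(rewards[:n], next_qs[n - 1], next_log_probs[n - 1], gamma, alpha)
        for n in range(1, len(rewards) + 1)
    ]
```

Under these assumptions, a two-step segment with rewards `[1.0, 1.0]` yields one target for $n = 1$ and one for $n = 2$. A sequence-conditioned critic could regress each position of its output onto the matching horizon, though that pairing is only one possible reading of "aggregated".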

Cite

Text

Tian et al. "Chunking the Critic: A Transformer-Based Soft Actor-Critic with N-Step Returns." International Conference on Learning Representations, 2026.

Markdown

[Tian et al. "Chunking the Critic: A Transformer-Based Soft Actor-Critic with N-Step Returns." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/tian2026iclr-chunking/)

BibTeX

@inproceedings{tian2026iclr-chunking,
  title     = {{Chunking the Critic: A Transformer-Based Soft Actor-Critic with N-Step Returns}},
  author    = {Tian, Dong and Celik, Onur and Neumann, Gerhard},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/tian2026iclr-chunking/}
}