Towards Bridging the Gap Between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control

Abstract

Reinforcement learning (RL) is widely used for humanoid control, with on-policy methods such as Proximal Policy Optimization (PPO) enabling robust training via large-scale parallel simulation and, in some cases, zero-shot deployment to real robots. However, the low sample efficiency of on-policy algorithms limits safe adaptation to new environments. Although off-policy RL and model-based RL have shown improved sample efficiency, a gap remains between large-scale pretraining and efficient finetuning on humanoids. In this paper, we find that off-policy Soft Actor-Critic (SAC), with large-batch updates and a high Update-To-Data (UTD) ratio, reliably supports large-scale pretraining of humanoid locomotion policies, achieving zero-shot deployment on real robots. For adaptation, we demonstrate that these SAC-pretrained policies can be finetuned in new environments and on out-of-distribution tasks using model-based methods. During finetuning, data is collected in the new environment with a deterministic policy, while stochastic exploration is confined to a physics-informed world model. This separation mitigates the risks of random exploration during adaptation while preserving exploratory coverage for improvement. Overall, the approach couples the wall-clock efficiency of large-scale simulation during pretraining with the sample efficiency of model-based learning during finetuning. Code and videos: https://lift-humanoid.github.io
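
The separation between deterministic data collection and model-confined exploration can be illustrated with a minimal Python sketch. This is not the authors' implementation: the one-dimensional dynamics, the linear policy, and every name below (`policy_mean`, `act`, `real_env_step`, `world_model_step`, `collect_real_data`, `imagine_rollouts`) are hypothetical stand-ins chosen only to show the data flow. In particular, the "world model" here simply copies the toy dynamics, whereas the abstract describes a learned, physics-informed model.

```python
import random

def policy_mean(state, gain=0.5):
    """Mean action of a toy linear policy (hypothetical stand-in for a SAC actor)."""
    return -gain * state

def act(state, rng=None, std=0.3):
    """Return the mean action when rng is None, otherwise a Gaussian sample around it."""
    mean = policy_mean(state)
    if rng is None:
        return mean  # deterministic action for the real environment
    return mean + rng.gauss(0.0, std)  # stochastic action for imagined rollouts

def real_env_step(state, action):
    """Toy 1-D dynamics standing in for the new environment (assumption)."""
    return state + 0.1 * action

def world_model_step(state, action):
    """Hypothetical world model; here it copies the toy dynamics for simplicity."""
    return state + 0.1 * action

def collect_real_data(start_state, horizon):
    """Roll out the deterministic policy in the real environment."""
    transitions = []
    state = start_state
    for _ in range(horizon):
        action = act(state)  # no noise is injected on the real system
        next_state = real_env_step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

def imagine_rollouts(start_states, horizon, rng):
    """Branch stochastic rollouts inside the world model from visited real states."""
    imagined = []
    for state in start_states:
        for _ in range(horizon):
            action = act(state, rng=rng)  # exploration happens only in the model
            next_state = world_model_step(state, action)
            imagined.append((state, action, next_state))
            state = next_state
    return imagined

real = collect_real_data(start_state=1.0, horizon=5)
rng = random.Random(0)
imagined = imagine_rollouts([s for s, _, _ in real], horizon=3, rng=rng)
```

In this sketch, `real` contains only mean actions, so repeated collection from the same start state is reproducible, while `imagined` contains noisy actions that widen state-action coverage without ever being executed on the real system. A policy update would then draw on the imagined transitions, which is omitted here.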

Cite

Text

Huang et al. "Towards Bridging the Gap Between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control." International Conference on Learning Representations, 2026.

Markdown

[Huang et al. "Towards Bridging the Gap Between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-bridging/)

BibTeX

@inproceedings{huang2026iclr-bridging,
  title     = {{Towards Bridging the Gap Between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control}},
  author    = {Huang, Weidong and Li, Zhehan and Liu, Hangxin and Hou, Biao and Su, Yao and Zhang, Jingwen},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/huang2026iclr-bridging/}
}