ControlVLA: Few-Shot Object-Centric Adaptation for Pre-Trained Vision-Language-Action Models

Abstract

Learning real-world robotic manipulation is challenging, particularly when only limited demonstrations are available. Existing methods for few-shot manipulation often rely on simulation-augmented data or pre-built modules such as grasping and pose estimation, which suffer from sim-to-real gaps and lack extensibility. While large-scale imitation pre-training shows promise, adapting these general-purpose policies to specific tasks in data-scarce settings remains unexplored. To address this gap, we propose ControlVLA, a novel framework that bridges pre-trained VLA models with object-centric representations via a ControlNet-style architecture for efficient fine-tuning. Specifically, to introduce object-centric conditions without overwriting prior knowledge, ControlVLA zero-initializes a set of projection layers, so that the conditions initially leave the pre-trained manipulation policy unchanged and then gradually adapt it during fine-tuning. In real-world experiments across 6 diverse tasks, including pouring cubes and folding clothes, our method achieves a 76.7% success rate while requiring only 10-20 demonstrations, a significant improvement over traditional approaches that require more than 100 demonstrations to achieve comparable success. Additional experiments highlight ControlVLA’s extensibility to long-horizon tasks and robustness to unseen objects and backgrounds.
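The zero-initialization idea in the abstract can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the authors' implementation: the class name `ZeroInitConditionedLayer`, the single linear layer, the dimensions, and the hand-written gradient step are all hypothetical and chosen only for illustration. It shows that a zero-initialized projection leaves the frozen pre-trained output unchanged at the start of fine-tuning, and that the object-centric condition only begins to influence the output once the projection weights are updated.

```python
# Illustrative sketch only: all names, shapes, and values are hypothetical
# and are not taken from the ControlVLA paper or code.

def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def zero_matrix(rows, cols):
    """Build zero-initialized projection weights."""
    return [[0.0] * cols for _ in range(rows)]


class ZeroInitConditionedLayer:
    """Frozen pre-trained linear map plus a zero-initialized condition branch."""

    def __init__(self, pretrained_weight, cond_dim):
        self.pretrained_weight = pretrained_weight  # treated as frozen
        out_dim = len(pretrained_weight)
        # Trainable projection for the object-centric condition, starts at zero.
        self.cond_proj = zero_matrix(out_dim, cond_dim)

    def forward(self, features, object_cond):
        base = matvec(self.pretrained_weight, features)
        delta = matvec(self.cond_proj, object_cond)  # all zeros at initialization
        return [b + d for b, d in zip(base, delta)]

    def sgd_step(self, object_cond, grad_output, lr=0.1):
        """Update only the projection: dL/dW[i][j] = grad_output[i] * object_cond[j]."""
        for i, g in enumerate(grad_output):
            for j, c in enumerate(object_cond):
                self.cond_proj[i][j] -= lr * g * c


layer = ZeroInitConditionedLayer([[1.0, 2.0], [0.5, -1.0]], cond_dim=3)
features = [1.0, 1.0]
cond = [0.2, 0.4, 0.6]

pretrained_only = matvec(layer.pretrained_weight, features)
before = layer.forward(features, cond)   # identical to the pre-trained output
layer.sgd_step(cond, grad_output=[1.0, -1.0])
after = layer.forward(features, cond)    # now shifted by the learned projection
```

Because the gradient with respect to the projection weights depends on the condition and the output gradient rather than on the weights themselves, the zero-initialized branch can still learn even though it contributes nothing at initialization. In a real VLA model the analogous layers would sit inside the policy network and be trained by an optimizer; the manual update here only stands in for that process.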

Cite

Text

Li et al. "ControlVLA: Few-Shot Object-Centric Adaptation for Pre-Trained Vision-Language-Action Models." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Li et al. "ControlVLA: Few-Shot Object-Centric Adaptation for Pre-Trained Vision-Language-Action Models." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/li2025corl-controlvla/)

BibTeX

@inproceedings{li2025corl-controlvla,
  title     = {{ControlVLA: Few-Shot Object-Centric Adaptation for Pre-Trained Vision-Language-Action Models}},
  author    = {Li, Puhao and Wu, Yingying and Xi, Ziheng and Li, Wanlin and Huang, Yuzhe and Zhang, Zhiyuan and Chen, Yinghan and Wang, Jianan and Zhu, Song-Chun and Liu, Tengyu and Huang, Siyuan},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {1898--1913},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/li2025corl-controlvla/}
}