FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Flow Models

Abstract

Developing efficient Vision-Language-Action (VLA) policies is crucial for practical robotics deployment, yet current approaches face prohibitive computational costs and resource requirements. Existing diffusion-based VLA policies require multi-billion-parameter models and massive datasets to achieve strong performance. We tackle this efficiency challenge with two contributions: intermediate-modality fusion, which reallocates capacity to the diffusion head by pruning up to 50% of LLM layers, and action-specific Global-AdaLN conditioning, which cuts parameters by 20% through modular adaptation. We integrate these advances into a novel 950M-parameter VLA called FLOWER. Pretrained in just 200 H100 GPU hours, FLOWER delivers a 25.9% improvement over state-of-the-art baselines across 190 tasks spanning ten simulation and real-world benchmarks and demonstrates robustness across diverse robotic embodiments. All code, pretrained weights, and training recipes are publicly released to democratize efficient VLA development.
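
To make the parameter-saving idea behind Global-AdaLN concrete, the following is a minimal counting sketch, not the FLOWER implementation. The abstract does not spell out the mechanism, so the sketch assumes that a standard AdaLN block learns its own linear projection from the conditioning vector to six modulation vectors per layer, while a global variant shares one such projection per action type across all layers. The function names and every dimension used are hypothetical, and the numbers are not meant to reproduce the 20% figure reported above.

```python
def adaln_modulation_params(hidden_dim: int, cond_dim: int, n_signals: int = 6) -> int:
    """Parameter count of one linear map (weights + bias) from a conditioning
    vector of size cond_dim to n_signals modulation vectors of size hidden_dim.
    n_signals=6 is an assumption (shift, scale, gate for two sub-blocks)."""
    out_dim = n_signals * hidden_dim
    return cond_dim * out_dim + out_dim


def per_layer_adaln_params(n_layers: int, hidden_dim: int, cond_dim: int) -> int:
    """Assumed baseline: every layer owns its own modulation projection."""
    return n_layers * adaln_modulation_params(hidden_dim, cond_dim)


def global_adaln_params(n_action_types: int, hidden_dim: int, cond_dim: int) -> int:
    """Assumed global variant: one projection per action type, shared by all layers."""
    return n_action_types * adaln_modulation_params(hidden_dim, cond_dim)


if __name__ == "__main__":
    # Hypothetical configuration chosen only for illustration.
    n_layers, hidden_dim, cond_dim, n_action_types = 18, 1024, 1024, 3
    baseline = per_layer_adaln_params(n_layers, hidden_dim, cond_dim)
    shared = global_adaln_params(n_action_types, hidden_dim, cond_dim)
    print(f"per-layer modulation params: {baseline:,}")
    print(f"global modulation params:    {shared:,}")
    print(f"saved:                       {baseline - shared:,}")
```

Under these assumptions the modulation cost scales with the number of action types rather than the number of layers, which is why sharing pays off whenever there are more layers than action types.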

Cite

Text

Reuss et al. "FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Flow Models." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Reuss et al. "FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Flow Models." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/reuss2025corl-flower/)

BibTeX

@inproceedings{reuss2025corl-flower,
  title     = {{FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Flow Models}},
  author    = {Reuss, Moritz and Zhou, Hongyi and Rühle, Marcel and Yağmurlu, Ömer Erdinç and Otto, Fabian and Lioutikov, Rudolf},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {3736--3761},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/reuss2025corl-flower/}
}