Multimodal LLM-Assisted Evolutionary Search for Programmatic Control Policies

Abstract

Deep reinforcement learning has achieved impressive success in control tasks. However, its policies, represented as opaque neural networks, are often difficult for humans to understand, verify, and debug, which undermines trust and hinders real-world deployment. This work addresses this challenge with a novel approach to programmatic control policy discovery called Multimodal Large Language Model-assisted Evolutionary Search (MLES). MLES uses multimodal large language models as programmatic policy generators and combines them with evolutionary search to automate policy generation. It integrates visual feedback-driven behavior analysis into the policy generation process to identify failure patterns and guide targeted improvements, which makes policy discovery more efficient and yields adaptable, human-aligned policies. Experimental results show that MLES achieves performance comparable to Proximal Policy Optimization (PPO) on two standard control tasks while providing transparent control logic and traceable design processes. The approach also overcomes the limitations of predefined domain-specific languages, facilitates knowledge transfer and reuse, and scales across tasks, showing promise as a new paradigm for developing transparent and verifiable control policies. Code is publicly available at https://github.com/QingL2000/MLES.
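The generate, evaluate, analyze, and select loop described above can be illustrated with a toy sketch. This is not the MLES implementation from the linked repository. Several pieces are assumptions made for illustration only. The multimodal LLM generator is replaced by a hypothetical rule-based `propose` function. The visual behavior analysis is replaced by a hypothetical `analyze` function that inspects a numeric trajectory rather than rendered frames. The task is an assumed 1-D point mass, not either of the paper's benchmarks. The (mu + lambda) elitist selection is also an assumed choice. Every name in the block (`Candidate`, `rollout`, `evaluate`, `analyze`, `propose`, `evolve`) is hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A programmatic policy: a readable PD rule with two tunable gains."""
    gains: Tuple[float, float]  # (kp, kd)
    fitness: float = float("-inf")

    def source(self) -> str:
        # Human-readable form of the policy (the transparent artifact being searched for).
        kp, kd = self.gains
        return f"action = -{kp:.2f} * x - {kd:.2f} * v"


def rollout(c: Candidate, steps: int = 200, dt: float = 0.05) -> List[float]:
    """Simulate a hypothetical 1-D point mass from x=1, v=0; return the position trace."""
    kp, kd = c.gains
    x, v, traj = 1.0, 0.0, []
    for _ in range(steps):
        a = max(-2.0, min(2.0, -kp * x - kd * v))  # assumed actuator limit of +/-2
        v += a * dt  # semi-implicit Euler: update velocity first,
        x += v * dt  # then position using the new velocity
        traj.append(x)
    return traj


def evaluate(traj: List[float]) -> float:
    # Higher is better: penalize accumulated distance from the target x = 0.
    return -sum(abs(x) for x in traj)


def analyze(traj: List[float]) -> str:
    """Hypothetical stand-in for visual behavior analysis: label the failure pattern."""
    if min(traj) < -0.05:
        return "overshoot"
    if abs(traj[-1]) > 0.05:
        return "too_slow"
    return "ok"


def propose(parent: Candidate, feedback: str, rng: random.Random) -> Candidate:
    """Hypothetical stand-in for the MLLM generator: a feedback-targeted edit."""
    kp, kd = parent.gains
    if feedback == "overshoot":
        kd *= 1.0 + rng.uniform(0.1, 0.5)  # add damping
    elif feedback == "too_slow":
        kp *= 1.0 + rng.uniform(0.1, 0.5)  # respond more aggressively
    else:
        kp *= rng.uniform(0.9, 1.1)  # small exploratory perturbation
        kd *= rng.uniform(0.9, 1.1)
    return Candidate((kp, kd))


def evolve(generations: int = 20, pop_size: int = 6, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    pop = [Candidate((rng.uniform(0.1, 2.0), rng.uniform(0.0, 1.0)))
           for _ in range(pop_size)]
    for c in pop:
        c.fitness = evaluate(rollout(c))
    for _ in range(generations):
        children = []
        for parent in pop:
            # Diagnose the parent's behavior, then generate a targeted child.
            child = propose(parent, analyze(rollout(parent)), rng)
            child.fitness = evaluate(rollout(child))
            children.append(child)
        # (mu + lambda) elitist selection: keep the best of parents and children.
        pop = sorted(pop + children, key=lambda c: c.fitness, reverse=True)[:pop_size]
    return max(pop, key=lambda c: c.fitness)


if __name__ == "__main__":
    best = evolve()
    print(best.source(), round(best.fitness, 2))
```

Because selection is elitist, the best fitness in this sketch never decreases from one generation to the next. The abstract indicates that in MLES the `propose` and `analyze` roles are played by a multimodal LLM using visual feedback, which this toy does not attempt to model.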

Cite

Text

Hu et al. "Multimodal LLM-Assisted Evolutionary Search for Programmatic Control Policies." International Conference on Learning Representations, 2026.

Markdown

[Hu et al. "Multimodal LLM-Assisted Evolutionary Search for Programmatic Control Policies." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hu2026iclr-multimodal/)

BibTeX

@inproceedings{hu2026iclr-multimodal,
  title     = {{Multimodal LLM-Assisted Evolutionary Search for Programmatic Control Policies}},
  author    = {Hu, Qinglong and Tong, Xialiang and Yuan, Mingxuan and Liu, Fei and Lu, Zhichao and Zhang, Qingfu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hu2026iclr-multimodal/}
}