Code-as-Monitor: Constraint-Aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Abstract

Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm that leverages a vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments across three simulators and a real-world setting show that, under severe disturbances, CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% compared to baselines. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
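
As a rough illustration of the idea of evaluating spatio-temporal constraints in code over compact geometric elements, the sketch below checks two constraints on tracked 3D points, frame by frame. It is not the authors' implementation or VLM-generated output: the names `Point`, `Frame`, `holding_constraint`, `clearance_constraint`, and `monitor`, the choice of constraints, and the thresholds are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Point:
    """A constraint element reduced to a single 3D point (hypothetical)."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Frame:
    """Tracked points for one time step (hypothetical layout)."""
    gripper: Point
    obj: Point
    obstacle: Point


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def holding_constraint(frame: Frame, max_gap: float) -> bool:
    """Satisfied while the object stays within max_gap of the gripper."""
    return distance(frame.gripper, frame.obj) <= max_gap


def clearance_constraint(frame: Frame, min_clearance: float) -> bool:
    """Satisfied while the object keeps min_clearance from the obstacle."""
    return distance(frame.obj, frame.obstacle) >= min_clearance


def monitor(
    frames: Sequence[Frame],
    max_gap: float = 0.05,        # assumed threshold, in meters
    min_clearance: float = 0.10,  # assumed threshold, in meters
) -> Optional[Tuple[int, str]]:
    """Return (frame index, failure kind) for the first violation, else None.

    A broken holding constraint is reported as "reactive" (the failure has
    already happened); a broken clearance constraint is reported as
    "proactive" (a collision is foreseeable but has not happened yet).
    """
    for index, frame in enumerate(frames):
        if not holding_constraint(frame, max_gap):
            return index, "reactive"
        if not clearance_constraint(frame, min_clearance):
            return index, "proactive"
    return None
```

Calling `monitor` on a sequence of `Frame` objects returns `None` when every frame satisfies both constraints. In a closed-loop setting, a non-`None` result would be the signal to stop and replan; how that recovery is done is outside the scope of this sketch.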

Cite

Text

Zhou et al. "Code-as-Monitor: Constraint-Aware Visual Programming for Reactive and Proactive Robotic Failure Detection." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00649

Markdown

[Zhou et al. "Code-as-Monitor: Constraint-Aware Visual Programming for Reactive and Proactive Robotic Failure Detection." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/zhou2025cvpr-codeasmonitor/) doi:10.1109/CVPR52734.2025.00649

BibTeX

@inproceedings{zhou2025cvpr-codeasmonitor,
  title     = {{Code-as-Monitor: Constraint-Aware Visual Programming for Reactive and Proactive Robotic Failure Detection}},
  author    = {Zhou, Enshen and Su, Qi and Chi, Cheng and Zhang, Zhizheng and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and Wang, He},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {6919--6929},
  doi       = {10.1109/CVPR52734.2025.00649},
  url       = {https://mlanthology.org/cvpr/2025/zhou2025cvpr-codeasmonitor/}
}