Xu, Haiyang
27 publications
NeurIPS
2025
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
ICML
2025
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
NeurIPS
2025
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
ICLR
2025
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
NeurIPS
2024
MaVEn: An Effective Multi-Granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
NeurIPS
2024
Mobile-Agent-V2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
ICCV
2023
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-up Patch Summarization