Xu, Haiyang

27 publications

ICCV 2025 DepR: Depth Guided Single-View Scene Reconstruction with Instance-Level Diffusion Qingcheng Zhao, Xiang Zhang, Haiyang Xu, Zeyuan Chen, Jianwen Xie, Yuan Gao, Zhuowen Tu
ICLR 2025 Endowing Visual Reprogramming with Adversarial Robustness Shengjie Zhou, Xin Cheng, Haiyang Xu, Ming Yan, Tao Xiang, Feng Liu, Lei Feng
ICML 2025 Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models Xin Cheng, Jiabo Ye, Haiyang Xu, Ming Yan, Ji Zhang, Feng Liu, Fei Huang, Lei Feng
NeurIPS 2025 Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Yuyang Wanyan, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Jiabo Ye, Yutong Kou, Ming Yan, Fei Huang, Xiaoshan Yang, Weiming Dong, Changsheng Xu
NeurIPS 2025 OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps Bingnan Li, Chen-Yu Wang, Haiyang Xu, Xiang Zhang, Ethan J. Armand, Divyansh Srivastava, Xiaojun Shan, Zeyuan Chen, Jianwen Xie, Zhuowen Tu
ICLRW 2025 PC-Agent: A Hierarchical Agentic Framework for Complex Task Automation on PC Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang
CVPR 2025 Science-T2I: Addressing Scientific Illusions in Image Synthesis Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie
CVPR 2025 SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization Hongrui Jia, Chaoya Jiang, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
ICML 2025 Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning Lang Feng, Weihao Tan, Zhiyi Lyu, Longtao Zheng, Haiyang Xu, Ming Yan, Fei Huang, Bo An
NeurIPS 2025 VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought Chaoya Jiang, Yongrui Heng, Wei Ye, Haiyang Xu, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
ICCV 2025 YOLO-Count: Differentiable Object Counting for Text-to-Image Generation Guanning Zeng, Xiang Zhang, Zirui Wang, Haiyang Xu, Zeyuan Chen, Bingnan Li, Zhuowen Tu
ICLR 2025 mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
CVPR 2024 Bayesian Diffusion Models for 3D Shape Reconstruction Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu
CVPR 2024 Hallucination Augmented Contrastive Learning for Multimodal Large Language Model Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang
NeurIPS 2024 MaVEn: An Effective Multi-Granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model Chaoya Jiang, Hongrui Jia, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
NeurIPS 2024 Mobile-Agent-V2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
ICLRW 2024 Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
CVPRW 2024 OmniControlNet: Dual-Stage Integration for Conditional Image Generation Yilin Wang, Haiyang Xu, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu
AAAI 2024 TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-Training Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang
CVPR 2024 mPLUG-Owl2: Revolutionizing Multi-Modal Large Language Model with Modality Collaboration Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang
ICCV 2023 BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-up Patch Summarization Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang
IJCAI 2023 Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation Shuodian Yu, Junqi Jin, Li Ma, Xiaofeng Gao, Xiaopeng Wu, Haiyang Xu, Jian Xu
ICCV 2023 HiTeA: Hierarchical Temporal-Aware Video-Language Pre-Training Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang
ICCV 2023 Learning Trajectory-Word Alignments for Video-Language Tasks Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang
ICML 2023 mPLUG-2: A Modularized Multi-Modal Foundation Model Across Text, Image and Video Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
CVPR 2022 EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
IJCAI 2016 Unsupervised Storyline Extraction from News Articles Deyu Zhou, Haiyang Xu, Xin-Yu Dai, Yulan He