Gao, Peng

74 publications

AAAI 2025 A Multi-Focus-Driven Multi-Branch Network for Robust Multimodal Sentiment Analysis Chuanqi Tao, Jiaming Li, Tianzi Zang, Peng Gao
CVPR 2025 Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding Han Xiao, Yina Xie, Guanxin Tan, Yinghao Chen, Rui Hu, Ke Wang, Aojun Zhou, Hao Li, Hao Shao, Xudong Lu, Peng Gao, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li
NeurIPS 2025 CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems Rui Liu, Yu Shen, Peng Gao, Pratap Tokekar, Ming Lin
ICLR 2025 Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li
NeurIPS 2025 EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Yue Liao, Zhengkai Jiang, Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren
ICCV 2025 FontAnimate: High Quality Few-Shot Font Generation via Animating Font Transfer Process Bin Fu, Zixuan Wang, Kainan Yan, Shitian Zhao, Qi Qin, Jie Wen, Junjun He, Peng Gao
ICCV 2025 From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning Le Zhuo, Liangbing Zhao, Sayak Paul, Yue Liao, Renrui Zhang, Yi Xin, Peng Gao, Mohamed Elhoseiny, Hongsheng Li
ICCV 2025 How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation? Yujian Lee, Peng Gao, Yongqi Xu, Wentao Fan
CVPR 2025 Let's Verify and Reinforce Image Generation Step by Step Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Ziyu Guo, Haoquan Zhang, Manyuan Zhang, Jiaming Liu, Peng Gao, Hongsheng Li
AAAI 2025 LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding Senqiao Yang, Jiaming Liu, Renrui Zhang, Mingjie Pan, Ziyu Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Hongsheng Li, Yandong Guo, Shanghang Zhang
ICCV 2025 Lumina-Image 2.0: A Unified and Efficient Image Generative Framework Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Yu Qiao, Bo Zhang, Xiaohong Liu, Hongsheng Li, Chang Xu, Peng Gao
ICLR 2025 Lumina-T2X: Scalable Flow-Based Large Diffusion Transformer for Flexible Resolution Generation Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xie, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, Tong He, Jingwen He, Junjun He, Yu Qiao, Hongsheng Li
ICLR 2025 MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Shanghang Zhang, Peng Gao, Hongsheng Li
ICML 2025 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanwei Li, Yu Qi, Xinyan Chen, Liuhui Wang, Jianhan Jin, Claire Guo, Shen Yan, Bo Zhang, Chaoyou Fu, Peng Gao, Hongsheng Li
ICLR 2025 MMSearch: Unveiling the Potential of Large Models as Multi-Modal Search Engines Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, Pengshuo Qiu, Pan Lu, Zehui Chen, Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li
ICLR 2025 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xie, Peng Gao, Hongsheng Li
ICCV 2025 Spatial Preference Rewarding for MLLMs Spatial Understanding Han Qiu, Peng Gao, Lewei Lu, Xiaoqin Zhang, Ling Shao, Shijian Lu
CoRL 2025 Subteaming and Adaptive Formation Control for Coordinated Multi-Robot Navigation Zihao Deng, Peng Gao, Williard Joshua Jose, Maggie Wigness, John G. Rogers III, Brian Reily, Christopher M. Reardon, Hao Zhang
ICCV 2025 TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction Xuying Zhang, Yutong Liu, Yangguang Li, Renrui Zhang, Yufei Liu, Kai Wang, Wanli Ouyang, Zhiwei Xiong, Peng Gao, Qibin Hou, Ming-Ming Cheng
ICCV 2025 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, Ming-Ming Cheng
CoRL 2024 A3VLM: Actionable Articulation-Aware Vision Language Model Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Abdeslam Boularias, Peng Gao, Hongsheng Li
ECCV 2024 Any2Point: Empowering Any-Modality Transformers for Efficient 3D Understanding Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Bin Zhao, Zhigang Wang, Dong Wang, Peng Gao, Hongsheng Li, Xuelong Li
ICLR 2024 BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo
CVPR 2024 Digital Life Project: Autonomous 3D Characters with Social Intelligence Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Liang Pan, Xiangyu Fan, Han Du, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu
WACV 2024 Efficient MAE Towards Large-Scale Vision Transformers Qiu Han, Gongjie Zhang, Jiaxing Huang, Peng Gao, Zhang Wei, Shijian Lu
ICML 2024 FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao
ICML 2024 InstructSpeech: Following Speech Editing Instructions via Large Language Models Rongjie Huang, Ruofan Hu, Yongqi Wang, Zehan Wang, Xize Cheng, Ziyue Jiang, Zhenhui Ye, Dongchao Yang, Luping Liu, Peng Gao, Zhou Zhao
ICLR 2024 LLaMA-Adapter: Efficient Fine-Tuning of Large Language Models with Zero-Initialized Attention Renrui Zhang, Jiaming Han, Chris Liu, Aojun Zhou, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao
NeurIPS 2024 Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Xiangyang Zhu, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Lirui Zhao, Si Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao, Hongsheng Li, Peng Gao
ICML 2024 MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao
CVPR 2024 Masked AutoDecoder Is Effective Multi-Task Vision Generalist Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu
ECCV 2024 MathVerse: Does Your Multi-Modal LLM Truly See the Diagrams in Visual Math Problems? Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li
CVPR 2024 No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao
ICLR 2024 OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo
CVPR 2024 OneLLM: One Framework to Align All Modalities with Language Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
ICLR 2024 Personalize Segment Anything Model with One Shot Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li
NeurIPS 2024 Phased Consistency Models Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Xiaogang Wang, Hongsheng Li
AAAI 2024 Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao
ICML 2024 SPHINX-X: Scaling Data and Parameters for a Family of Multi-Modal Large Language Models Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao
ECCV 2024 SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-Modal Large Language Models Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Yu Qiao, Hongsheng Li
ICML 2024 SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li
ECCV 2024 SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding Han Xiao, Wenzhao Zheng, Sicheng Zuo, Peng Gao, Jie Zhou, Jiwen Lu
ICML 2023 Auxiliary Modality Learning with Generalized Curriculum Distillation Yu Shen, Xijun Wang, Peng Gao, Ming Lin
CVPR 2023 Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li
ICCV 2023 MonoDETR: Depth-Guided Transformer for Monocular 3D Object Detection Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Ziteng Cui, Yu Qiao, Hongsheng Li, Peng Gao
ICCV 2023 Not All Features Matter: Enhancing Few-Shot CLIP with Adaptive Prior Refinement Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao
ICCV 2023 PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-World Learning Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao
CVPR 2023 Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Yu Qiao, Peng Gao, Hongsheng Li
CVPR 2023 Q-DETR: An Efficient Low-Bit Quantized Detection Transformer Sheng Xu, Yanjing Li, Mingbao Lin, Peng Gao, Guodong Guo, Jinhu Lü, Baochang Zhang
AAAI 2023 Resilient Binary Neural Network Sheng Xu, Yanjing Li, Teli Ma, Mingbao Lin, Hao Dong, Baochang Zhang, Peng Gao, Jinhu Lü
ICCV 2023 SparseMAE: Sparse Training Meets Masked Autoencoders Aojun Zhou, Yang Li, Zipeng Qin, Jianbo Liu, Junting Pan, Renrui Zhang, Rui Zhao, Peng Gao, Hongsheng Li
CVPR 2023 Stare at What You See: Masked Image Modeling Without Reconstruction Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo
CVPR 2023 Starting from Non-Parametric Networks for 3D Point Cloud Analysis Renrui Zhang, Liuhui Wang, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi
ECCV 2022 Frozen CLIP Models Are Efficient Video Learners Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li
ECCV 2022 IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Peng Gao, Jinhu Lü
NeurIPS 2022 MCMAE: Masked Convolution Meets Masked Autoencoders Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao
NeurIPS 2022 Point-M2AE: Multi-Scale Masked Autoencoders for Hierarchical Point Cloud Pre-Training Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li
CVPR 2022 PointCLIP: Point Cloud Understanding by CLIP Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li
ECCV 2022 Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation Zhengkai Jiang, Yuxi Li, Ceyuan Yang, Peng Gao, Yabiao Wang, Ying Tai, Chengjie Wang
NeurIPS 2022 Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo
ECCV 2022 Recurrent Bilinear Optimization for Binary Neural Networks Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lü, Guodong Guo
ECCV 2022 Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li
NeurIPS 2021 Container: Context Aggregation Networks Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi
NeurIPS 2021 Dual-Stream Network for Visual Recognition Mingyuan Mao, Peng Gao, Renrui Zhang, Honghui Zheng, Teli Ma, Yan Peng, Errui Ding, Baochang Zhang, Shumin Han
AAAI 2021 Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian
ICCV 2021 Fast Convergence of DETR with Spatially Modulated Co-Attention Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
IJCAI 2021 Pairwise Half-Graph Discrimination: A Simple Graph-Level Self-Supervised Strategy for Pre-Training Graph Neural Networks Pengyong Li, Jun Wang, Ziliang Li, Yixuan Qiao, Xianggen Liu, Fei Ma, Peng Gao, Sen Song, Guotong Xie
ECCV 2020 Learning Where to Focus for Efficient Video Object Detection Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan
AAAI 2020 Long-Term Loop Closure Detection Through Visual-Spatial Information Preserving Multi-Order Graph Matching Peng Gao, Hao Zhang
AAAI 2020 Region Focus Network for Joint Optic Disc and Cup Segmentation Ge Li, Changsheng Li, Chan Zeng, Peng Gao, Guotong Xie
AAAI 2019 Video Object Detection with Locally-Weighted Deformable Neighbors Zhengkai Jiang, Peng Gao, Chaoxu Guo, Qian Zhang, Shiming Xiang, Chunhong Pan
IJCAI 2018 Dynamic Bayesian Logistic Matrix Factorization for Recommendation with Implicit Feedback Yong Liu, Lifan Zhao, Guimei Liu, Xinyan Lu, Peng Gao, Xiao-Li Li, Zhihui Jin
ECCV 2018 Question-Guided Hybrid Convolution for Visual Question Answering Peng Gao, Hongsheng Li, Shuang Li, Pan Lu, Yikang Li, Steven C.H. Hoi, Xiaogang Wang
AAAI 2017 SCOPE: Scalable Composite Optimization for Learning on Spark Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li