Gan, Zhe

85 publications

ICML 2025 Contrastive Localized Language-Image Pre-Training Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan
ICLR 2025 Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeffrey Nichols, Yinfei Yang, Zhe Gan
CVPR 2025 From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons Andrew Szot, Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira, Alexander Toshev
ICLR 2025 MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan
ICLR 2025 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-Tuning Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang
ICLR 2025 MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang
CVPR 2025 Multimodal Autoregressive Pre-Training of Large Vision Encoders Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor G. Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua Susskind, Alaaeldin El-Nouby
ICLR 2025 Revisit Large-Scale Image-Caption Data in Pre-Training Multimodal Foundation Models Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Wenze Hu, Juan Lao Tebar, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang
ICCV 2025 UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing Tsu-Jui Fu, Yusu Qian, Chen Chen, Wenze Hu, Zhe Gan, Yinfei Yang
ICLR 2024 Compressing LLMs: The Truth Is Rarely Pure and Never Simple Ajay Kumar Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang
CVPRW 2024 Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal
ECCV 2024 Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeff Nichols, Yinfei Yang, Zhe Gan
ICLR 2024 Ferret: Refer and Ground Anything Anywhere at Any Granularity Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang
ECCV 2024 GRiT: A Generative Region-to-Text Transformer for Object Understanding Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang
ICLR 2024 Guiding Instruction-Based Image Editing via Multimodal Large Language Models Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan
NeurIPSW 2024 How Easy Is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan
ECCV 2024 MM1: Methods, Analysis & Insights from Multimodal LLM Pre-Training Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Samuel Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Futang Peng, Anton Belyi, Max A Schwarzer, Hongyu Hè, Xianzhi Du, Haotian Zhang, Karanjeet Singh, Doug Kang, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang
ECCV 2024 VeCLIP: Improving CLIP Training via Visual-Enriched Captions Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao
CVPR 2023 An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
CVPR 2023 Generalized Decoding for Pixel, Image, and Language Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao
CVPR 2023 LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang
CVPR 2023 Non-Contrastive Learning Meets Language-Image Pre-Training Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei
NeurIPSW 2023 Pre-Trained Language Models Do Not Help Auto-Regressive Text-to-Image Generation Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev
ICLR 2023 Prompting GPT-3 to Be Reliable Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Lee Boyd-Graber, Lijuan Wang
CVPR 2023 ReCo: Region-Controlled Text-to-Image Generation Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
TMLR 2022 Adversarial Feature Augmentation and Normalization for Visual Recognition Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Jingjing Liu, Zhangyang Wang
AAAI 2022 An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang
CVPR 2022 An Empirical Study of Training End-to-End Vision-and-Language Transformers Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
NeurIPS 2022 Coarse-to-Fine Vision-Language Pre-Training with Fusion in the Backbone Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang
AAAI 2022 Efficient Robust Training via Backward Smoothing Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu
TMLR 2022 GIT: A Generative Image-to-Text Transformer for Vision and Language Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
CVPR 2022 Injecting Semantic Concepts into End-to-End Image Captioning Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu
NeurIPS 2022 K-LITE: Learning Transferable Visual Models with External Knowledge Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao
NeurIPS 2022 NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis Jian Liang, Chenfei Wu, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
AAAI 2022 Playing Lottery Tickets with Vision and Language Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu
CVPR 2022 Scaling up Vision-Language Pre-Training for Image Captioning Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang
CVPR 2022 SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
ECCV 2022 UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang
ICCV 2021 Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu
NeurIPS 2021 Chasing Sparsity in Vision Transformers: An End-to-End Exploration Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
NeurIPS 2021 Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang
AAAI 2021 FILTER: An Enhanced Fusion Method for Cross-Lingual Language Understanding Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu
ICLR 2021 Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin
ICLR 2021 InfoBERT: Improving Robustness of Language Models from an Information Theoretic Perspective Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
CVPR 2021 Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu
ECML-PKDD 2021 MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein
WACV 2021 Meta Module Network for Compositional Visual Reasoning Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu
NeurIPS 2021 The Elastic Lottery Ticket Hypothesis Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang
CVPR 2021 Wasserstein Contrastive Representation Distillation Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin
ECCV 2020 Behind the Scene: Revealing the Secrets of Pre-Trained Vision-and-Language Models Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu
ICML 2020 CLUB: A Contrastive Log-Ratio Upper Bound of Mutual Information Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin
ICLR 2020 FreeLB: Enhanced Adversarial Training for Natural Language Understanding Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu
ICML 2020 Graph Optimal Transport for Cross-Domain Alignment Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu
AAAI 2020 Graph-Driven Generative Models for Heterogeneous Multi-Task Learning Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang, Lawrence Carin
NeurIPS 2020 Large-Scale Adversarial Training for Vision-and-Language Representation Learning Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu
AISTATS 2020 Nested-Wasserstein Self-Imitation Learning for Sequence Generation Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin
ECCV 2020 UNITER: UNiversal Image-TExt Representation Learning Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
AAAI 2020 What Makes a Good Story? Designing Composite Rewards for Visual Storytelling Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig
AAAI 2019 Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Oliver Wu, Jianfeng Wang, Xiaodong He
ICLR 2019 Improving Sequence-to-Sequence Learning via Optimal Transport Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin
NeurIPS 2019 Improving Textual Network Learning with Variational Homophilic Embeddings Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao, Lawrence Carin
AAAI 2018 Adaptive Feature Abstraction for Translating Video to Text Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin
NeurIPS 2018 Adversarial Text Generation via Feature-Mover's Distance Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, Lawrence Carin
NeurIPS 2018 Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, Bill Dolan
ICML 2018 JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets Yunchen Pu, Shuyang Dai, Zhe Gan, Weiyao Wang, Guoyin Wang, Yizhe Zhang, Ricardo Henao, Lawrence Carin
MLHC 2018 Multi-Label Learning from Medical Plain Text with Convolutional Residual Models Yinyuan Zhang, Ricardo Henao, Zhe Gan, Yitong Li, Lawrence Carin
AISTATS 2018 Topic Compositional Neural Language Model Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin
ICLR 2017 Adaptive Feature Abstraction for Translating Video to Language Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin
ICML 2017 Adversarial Feature Matching for Text Generation Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, Lawrence Carin
NeurIPS 2017 Adversarial Symmetric Variational Autoencoder Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, Lawrence Carin
NeurIPS 2017 Deconvolutional Paragraph Representation Learning Yizhe Zhang, Dinghan Shen, Guoyin Wang, Zhe Gan, Ricardo Henao, Lawrence Carin
CVPR 2017 Semantic Compositional Networks for Visual Captioning Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng
ICML 2017 Stochastic Gradient Monomial Gamma Sampler Yizhe Zhang, Changyou Chen, Zhe Gan, Ricardo Henao, Lawrence Carin
CVPR 2017 StyleNet: Generating Attractive Visual Captions with Styles Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, Li Deng
NeurIPS 2017 Triangle Generative Adversarial Networks Zhe Gan, Liqun Chen, Weiyao Wang, Yuchen Pu, Yizhe Zhang, Hao Liu, Chunyuan Li, Lawrence Carin
AAAI 2017 Unsupervised Learning with Truncated Gaussian Graphical Models Qinliang Su, Xuejun Liao, Chunyuan Li, Zhe Gan, Lawrence Carin
NeurIPS 2017 VAE Learning via Stein Variational Gradient Descent Yuchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin
AISTATS 2016 Bridging the Gap Between Stochastic Gradient MCMC and Stochastic Optimization Changyou Chen, David E. Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin
ICML 2016 Factored Temporal Sigmoid Belief Networks for Sequence Learning Jiaming Song, Zhe Gan, Lawrence Carin
CVPR 2016 Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
NeurIPS 2016 Variational Autoencoder for Deep Learning of Images, Labels and Captions Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin
NeurIPS 2015 Deep Poisson Factor Modeling Ricardo Henao, Zhe Gan, James Lu, Lawrence Carin
NeurIPS 2015 Deep Temporal Sigmoid Belief Networks for Sequence Modeling Zhe Gan, Chunyuan Li, Ricardo Henao, David E. Carlson, Lawrence Carin
AISTATS 2015 Learning Deep Sigmoid Belief Networks with Data Augmentation Zhe Gan, Ricardo Henao, David E. Carlson, Lawrence Carin
ICML 2015 Scalable Deep Poisson Factor Analysis for Topic Modeling Zhe Gan, Changyou Chen, Ricardo Henao, David Carlson, Lawrence Carin