Cai, Mu

18 publications

TMLR 2026 A Survey of Token Compression for Efficient Multimodal Large Language Models. Kele Shao, Keda Tao, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, Huan Wang
WACV 2025 An Investigation on LLMs' Visual Understanding Ability Using SVG for Image-Text Bridging. Mu Cai, Zeyi Huang, Yuheng Li, Utkarsh Ojha, Haohan Wang, Yong Jae Lee
ICLR 2025 LLaRA: Supercharging Robot Learning Data for Vision-Language Policy. Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan D Burgert, Mu Cai, Yong Jae Lee, Michael S Ryoo
ICCV 2025 LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models. Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
CVPR 2025 Magma: A Foundation Model for Multimodal AI Agents. Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Jianfeng Gao
ICLR 2025 Matryoshka Multimodal Models. Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee
CPAL 2024 Investigating the Catastrophic Forgetting in Multimodal Large Language Model Fine-Tuning. Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
NeurIPSW 2024 Matryoshka Multimodal Models. Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee
ECCV 2024 Removing Distributional Discrepancies in Captions Improves Image-Text Alignment. Mu Cai, Haotian Liu, Yuheng Li, Yijun Li, Eli Shechtman, Zhe Lin, Yong Jae Lee, Krishna Kumar Singh
NeurIPSW 2024 TemporalBench: Benchmarking Fine-Grained Temporal Understanding for Multimodal Video Models. Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Yao Feng, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, Yao Dou, Jaden Park, Jianfeng Gao, Yong Jae Lee, Jianwei Yang
CVPR 2024 ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts. Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee
NeurIPS 2024 Yo'LLaVA: Your Personalized Language and Vision Assistant. Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee
ICCV 2023 A Sentence Speaks a Thousand Images: Domain Generalization Through Distilling CLIP with Language Guidance. Zeyi Huang, Andy Zhou, Zijian Ling, Mu Cai, Haohan Wang, Yong Jae Lee
NeurIPSW 2023 Investigating the Catastrophic Forgetting in Multimodal Large Language Models. Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
WACV 2023 Out-of-Distribution Detection via Frequency-Regularized Generative Models. Mu Cai, Yixuan Li
ECCV 2022 Masked Discrimination for Self-Supervised Learning on Point Clouds. Haotian Liu, Mu Cai, Yong Jae Lee
ICLR 2022 VOS: Learning What You Don't Know by Virtual Outlier Synthesis. Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li
ICCV 2021 Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving. Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, Gao Huang