Cheng, Xize

15 publications

AAAI 2025 A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-Rank Sequence Multimodal Adapter Zirun Guo, Xize Cheng, Yangyang Wu, Tao Jin
NeurIPS 2025 AHa-Bench: Benchmarking Audio Hallucinations in Large Audio-Language Models Xize Cheng, Dongjie Fu, Chenyuhao Wen, Shannon Yu, Zehan Wang, Shengpeng Ji, Siddhant Arora, Tao Jin, Shinji Watanabe, Zhou Zhao
ICLR 2025 OmniBind: Large-Scale Omni Multimodal Representation via Binding Spaces Zehan Wang, Ziang Zhang, Minjie Hong, Hang Zhang, Luping Liu, Rongjie Huang, Xize Cheng, Shengpeng Ji, Tao Jin, Hengshuang Zhao, Zhou Zhao
ICLR 2025 OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup Xize Cheng, Siqi Zheng, Zehan Wang, Minghui Fang, Ziang Zhang, Rongjie Huang, Shengpeng Ji, Jialong Zuo, Tao Jin, Zhou Zhao
CVPR 2025 SpatialCLIP: Learning 3D-Aware Image Representations from Spatially Discriminative Language Zehan Wang, Sashuai Zhou, Shaoxuan He, Haifeng Huang, Lihe Yang, Ziang Zhang, Xize Cheng, Shengpeng Ji, Tao Jin, Hengshuang Zhao, Zhou Zhao
ICLR 2025 VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? Xize Cheng, Ruofan Hu, Xiaoda Yang, Jingyu Lu, Dongjie Fu, Zehan Wang, Shengpeng Ji, Rongjie Huang, Boyang Zhang, Tao Jin, Zhou Zhao
ICLR 2025 WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao
NeurIPS 2024 Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers Haifeng Huang, Yilun Chen, Zehan Wang, Rongjie Huang, Runsen Xu, Tai Wang, Luping Liu, Xize Cheng, Yang Zhao, Jiangmiao Pang, Zhou Zhao
NeurIPS 2024 Extending Multi-Modal Contrastive Representations Ziang Zhang, Zehan Wang, Luping Liu, Rongjie Huang, Xize Cheng, Zhenhui Ye, Wang Lin, Huadai Liu, Haifeng Huang, Yang Zhao, Tao Jin, Siqi Zheng, Zhou Zhao
ICML 2024 FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao
ICML 2024 InstructSpeech: Following Speech Editing Instructions via Large Language Models Rongjie Huang, Ruofan Hu, Yongqi Wang, Zehan Wang, Xize Cheng, Ziyue Jiang, Zhenhui Ye, Dongchao Yang, Luping Liu, Peng Gao, Zhou Zhao
NeurIPS 2023 Connecting Multi-Modal Contrastive Representations Zehan Wang, Yang Zhao, Xize 成, Haifeng Huang, Jiageng Liu, Aoxiong Yin, Li Tang, Linjun Li, Yongqi Wang, Ziang Zhang, Zhou Zhao
ICCV 2023 Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao
ICCV 2023 Exploring Group Video Captioning with Efficient Relational Approximation Wang Lin, Tao Jin, Ye Wang, Wenwen Pan, Linjun Li, Xize Cheng, Zhou Zhao
ICCV 2023 MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition Xize Cheng, Tao Jin, Rongjie Huang, Linjun Li, Wang Lin, Zehan Wang, Ye Wang, Huadai Liu, Aoxiong Yin, Zhou Zhao