Shi, Yang
16 publications
ICLR
2026
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen, Yue Ding, Weihong Lin, Jingyun Hua, Linli Yao, Yang Shi, Bozhou Li, Qiang Liu, Yuanxing Zhang, Pengfei Wan, Liang Wang ICLR
2026
BaseReward: A Strong Baseline for Multimodal Reward Model
YiFan Zhang, Haihua Yang, Huanyu Zhang, Yang Shi, Zezhou Chen, Haochen Tian, Chaoyou Fu, Kai Wu, Bo Cui, Xu Wang, Jianfei Pan, Haotian Wang, Zhang Zhang, Liang Wang ICLR
2026
VidBridge-R1: Bridging QA and Captioning for RL-Based Video Understanding Models with Intermediate Proxy Tasks
Xinlong Chen, Yuanxing Zhang, Yushuo Guan, Weihong Lin, Zekun Moore Wang, Bohan Zeng, Yang Shi, Sihan Yang, Qiang Liu, Pengfei Wan, Liang Wang ICML
2025
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Yifan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen, Tingting Gao, Zhang Zhang, Fan Yang, Di Zhang, Liang Wang, Rong Jin NeurIPS
2025
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
Yang Shi, Huanqian Wang, Wulin Xie, Huanyao Zhang, Lijie Zhao, YiFan Zhang, Xinfeng Li, Chaoyou Fu, Zhuoer Wen, Wenting Liu, Zhuoran Zhang, Xinlong Chen, Bohan Zeng, Sihan Yang, Yushuo Guan, Zhang Zhang, Liang Wang, Haoxuan Li, Zhouchen Lin, Yuanxing Zhang, Pengfei Wan, Haotian Wang, Wenjing Yang