Xu, Boshen

5 publications

ICLR 2025 Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin
NeurIPS 2025 EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining Boshen Xu, Yuting Mei, Xinbi Liu, Sipeng Zheng, Qin Jin
NeurIPS 2025 Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding Ye Wang, Ziheng Wang, Boshen Xu, Yang Du, Kejun Lin, Zihan Xiao, Zihao Yue, Jianzhong Ju, Liang Zhang, Dingyi Yang, Xiangnan Fang, Zewen He, Zhenbo Luo, Wenxuan Wang, Junqi Lin, Jian Luan, Qin Jin
ECCVW 2024 Unveiling Visual Biases in Audio-Visual Localization Benchmarks Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin
CVPR 2023 Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework Sipeng Zheng, Boshen Xu, Qin Jin