ML Anthology
Zheng, Sipeng
14 publications
ICLR
2025
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin
NeurIPS
2025
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu, Yuting Mei, Liu Xinbi, Sipeng Zheng, Qin Jin
ICLR
2025
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu
ICCV
2025
MotionCtrl: A Real-Time Controllable Vision-Language-Motion Model
Bin Cao, Sipeng Zheng, Ye Wang, Lujie Xia, Qianshan Wei, Qin Jin, Jing Liu, Zongqing Lu
NeurIPS
2025
OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data
Hao Luo, Zihao Yue, Wanpeng Zhang, Yicheng Feng, Sipeng Zheng, Deheng Ye, Zongqing Lu
ICML
2025
Scaling Large Motion Models with Million-Level Human Motions
Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Weishuai Zeng, Qin Jin, Zongqing Lu
ICCV
2025
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu
ICCV
2025
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang, Sipeng Zheng, Hao Luo, Zihao Yue, Zongqing Lu
ICLR
2024
Steve-Eye: Equipping LLM-Based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng, Jiazheng Liu, Yicheng Feng, Zongqing Lu
ECCV
2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu
AAAI
2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin
CVPR
2023
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
Sipeng Zheng, Boshen Xu, Qin Jin
ECCV
2022
Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning
Sipeng Zheng, Shizhe Chen, Qin Jin
CVPR
2022
VRDFormer: End-to-End Video Visual Relation Detection with Transformers
Sipeng Zheng, Shizhe Chen, Qin Jin