Yang, Zhenheng

23 publications

ICLR 2026 FOCUS: Efficient Keyframe Selection for Long Video Understanding Zirui Zhu, Hailun Xu, Yang Luo, Yong Liu, Kanchan Sarkar, Zhenheng Yang, Yang You
ICLR 2026 Mixture of Contexts for Long Video Generation Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein
ICLR 2026 MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Yipeng Du, Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Xiang Li, Jian Yang, Zhenheng Yang, Ying Tai
NeurIPS 2025 DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling Yuang Ai, Qihang Fan, Xuefeng Hu, Zhenheng Yang, Ran He, Huaibo Huang
CVPR 2025 InstanceCap: Improving Text-to-Video Generation via Instance-Aware Structured Caption Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai
ICCV 2025 Long Context Tuning for Video Generation Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang
ICLR 2025 OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-Video Generation Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai
CVPR 2025 Parallelized Autoregressive Visual Generation Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu
ICCV 2025 STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai
NeurIPS 2025 Show-o2: Improved Native Unified Multimodal Models Jinheng Xie, Zhenheng Yang, Mike Zheng Shou
ICLR 2025 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou
NeurIPSW 2024 InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You
CVPR 2021 Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang
ECCV 2020 SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia
CVPR 2019 Activity Driven Weakly Supervised Object Detection Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan
CVPR 2019 UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yi Yang, Wei Xu
ECCVW 2018 Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia
WACV 2018 Face and Body Association for Video-Based Face Recognition KangGeon Kim, Zhenheng Yang, Iacopo Masi, Ramakant Nevatia, Gérard G. Medioni
CVPR 2018 LEGO: Learning Edge with Geometry All at Once by Watching Videos Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia
CVPR 2018 Occlusion Aware Unsupervised Learning of Optical Flow Yang Wang, Yi Yang, Zhenheng Yang, Liang Zhao, Peng Wang, Wei Xu
AAAI 2018 Unsupervised Learning of Geometry from Videos with Edge-Aware Depth-Normal Consistency Zhenheng Yang, Peng Wang, Wei Xu, Liang Zhao, Ramakant Nevatia
ICCV 2017 TALL: Temporal Activity Localization via Language Query Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia
ICCV 2017 TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals Jiyang Gao, Zhenheng Yang, Kan Chen, Chen Sun, Ram Nevatia