Lu, Jiasen

25 publications

NeurIPS 2025 CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching Chen Chen, Pengsheng Guo, Liangchen Song, Jiasen Lu, Rui Qian, Tsu-Jui Fu, Xinze Wang, Wei Liu, Yinfei Yang, Alex Schwing
ICLR 2025 MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang
CVPR 2025 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
CVPR 2025 One Diffusion to Generate Them All Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Krishna, Jiasen Lu
ICCV 2025 STIV: Scalable Text and Image Conditioned Video Generation Zongyu Lin, Wei Liu, Chen Chen, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang, Cha Chen, Yiran Fei, Lezhi Li, Yinfei Yang, Yizhou Sun, Kai-Wei Chang
ICLRW 2025 Stiv: Scalable Text and Image Conditioned Video Generation Zongyu Lin, Wei Liu, Chen Chen, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang, Cha Chen, Yiran Fei, Yifan Jiang, Lezhi Li, Yizhou Sun, Kai-Wei Chang, Yinfei Yang
ICLR 2025 The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities Zhaofeng Wu, Xinyan Velocity Yu, Dani Yogatama, Jiasen Lu, Yoon Kim
NeurIPS 2025 UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation Rui Tian, Mingfei Gao, Mingze Xu, Jiaming Hu, Jiasen Lu, Zuxuan Wu, Yinfei Yang, Afshin Dehghan
CVPR 2024 Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
ICLR 2023 UNIFIED-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi
CVPR 2022 MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi
AAAI 2022 Multi-Modal Answer Validation for Knowledge-Based VQA Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
NeurIPS 2021 Container: Context Aggregation Networks Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi
NeurIPS 2020 Dialog Without Dialog Data: Learning Visual Dialog Agents from VQA Data Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra
ICLR 2020 Emergence of Compositional Language with Deep Generational Transmission Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra
ECCV 2020 Spatially Aware Multimodal Transformers for TextVQA Yash Kant, Dhruv Batra, Peter Anderson, Alexander Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
ICLR 2019 Self-Monitoring Navigation Agent via Auxiliary Progress Estimation Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
NeurIPS 2019 ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee
ECCV 2018 Graph R-CNN for Scene Graph Generation Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh
CoRL 2018 Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh
NeurIPS 2017 Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra
CVPR 2017 Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
NeurIPS 2016 Hierarchical Question-Image Co-Attention for Visual Question Answering Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh
CVPR 2015 Human Action Segmentation with Hierarchical Supervoxel Consistency Jiasen Lu, Ran Xu, Jason J. Corso
ICCV 2015 VQA: Visual Question Answering Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh