Grasch, Peter
6 publications
CVPR
2025
FastVLM: Efficient Vision Encoding for Vision Language Models
Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate true, Albert Antony, Gokula Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari ICCV
2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch ICLR
2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-Tuning
Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang ICLR
2025
Revisit Large-Scale Image-Caption Data in Pre-Training Multimodal Foundation Models
Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Wenze Hu, Juan Lao Tebar, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang ECCV
2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-Training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Samuel Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Futang Peng, Anton Belyi, Max A Schwarzer, Hongyu Hè, Xianzhi Du, Haotian Zhang, Karanjeet Singh, Doug Kang, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang