Ryoo, Michael S.

65 publications

TMLR 2026 SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation Shrikant Kendre, Austin Xu, Honglu Zhou, Michael S Ryoo, Shafiq Joty, Juan Carlos Niebles

ICCV 2025 Adaptive Caching for Faster Video Generation with Diffusion Transformers Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie

ICLR 2025 LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan D Burgert, Mu Cai, Yong Jae Lee, Michael S Ryoo

ICLR 2025 Understanding Long Videos with Multimodal Language Models Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S Ryoo

ICLRW 2025 VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making Jake Grigsby, Yuke Zhu, Michael S Ryoo, Juan Carlos Niebles

ECCV 2024 CoPT: Unsupervised Domain Adaptive Segmentation Using Domain-Agnostic Text Embeddings Cristina Mata, Kanchana N Ranasinghe, Michael S Ryoo

WACV 2024 Grafting Vision Transformers Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo

ECCVW 2024 Image Translation with Kernel Prediction Networks for Semantic Segmentation Cristina Mata, Michael S. Ryoo, Henrik Turbell

NeurIPSW 2024 Language Repository for Long Video Understanding Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S Ryoo

CVPR 2024 Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin

WACV 2024 Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, Michael S. Ryoo

CVPR 2024 MAGICK: A Large-Scale Captioned Dataset from Matting Generated Images Using Chroma Keying Ryan D. Burgert, Brian L. Price, Jason Kuen, Yijun Li, Michael S. Ryoo

CVPR 2024 Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities Aj Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova

NeurIPSW 2024 Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryu, Donghyun Kim, Michael S Ryoo

CVPR 2024 VicTR: Video-Conditioned Text Representations for Activity Recognition Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo

ECCVW 2024 xGen-VideoSyn-1: High-Fidelity Text-to-Video Synthesis with Compressed Representations Can Qin, Congying Xia, Krithika Ramakrishnan, Michael S. Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong

NeurIPS 2023 Active Vision Reinforcement Learning Under Limited Visual Observability Jinghuan Shang, Michael S Ryoo

NeurIPS 2023 Language-Based Action Concept Spaces Improve Video Self-Supervised Learning Kanchana Ranasinghe, Michael S Ryoo

CoRL 2023 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, Quan Vuong, Vincent Vanhoucke, Huong Tran, Radu Soricut, Anikait Singh, Jaspiar Singh, Pierre Sermanet, Pannag R. Sanketi, Grecia Salazar, Michael S. Ryoo, Krista Reymann, Kanishka Rao, Karl Pertsch, Igor Mordatch, Henryk Michalewski, Yao Lu, Sergey Levine, Lisa Lee, Tsang-Wei Edward Lee, Isabel Leal, Yuheng Kuang, Dmitry Kalashnikov, Ryan Julian, Nikhil J. Joshi, Alex Irpan, Brian Ichter, Jasmine Hsu, Alexander Herzog, Karol Hausman, Keerthana Gopalakrishnan, Chuyuan Fu, Pete Florence, Chelsea Finn, Kumar Avinava Dubey, Danny Driess, Tianli Ding, Krzysztof Marcin Choromanski, Xi Chen, Yevgen Chebotar, Justice Carbajal, Noah Brown, Anthony Brohan, Montserrat Gonzalez Arenas, Kehang Han

IJCAI 2023 SWAT: Spatial Structure Within and Among Tokens Kumara Kahatapitiya, Michael S. Ryoo

ICLR 2023 Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Marcin Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael S Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence

CVPR 2023 Token Turing Machines Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

WACV 2023 ViewCLR: Learning Self-Supervised Video Representation for Unseen Viewpoints Srijan Das, Michael S. Ryoo

AAAI 2023 Weakly-Guided Self-Supervised Pretraining for Temporal Activity Detection Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua

ICLR 2022 Hybrid Random Features Krzysztof Marcin Choromanski, Han Lin, Haoxian Chen, Arijit Sehanobish, Yuanzhe Ma, Deepali Jain, Jake Varley, Andy Zeng, Michael S Ryoo, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

CVPR 2022 MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, François Brémond

CVPR 2022 Self-Supervised Video Transformer Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael S. Ryoo

ECCV 2022 StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo

CoRL 2022 TRITON: Neural Neural Textures for Better Sim2Real Ryan D. Burgert, Jinghuan Shang, Xiang Li, Michael S. Ryoo

ECCV 2022 Video Question Answering with Iterative Video-Text Co-Tokenization Aj Piergiovanni, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova

ICCV 2021 4D-Net for Learned Multi-Modal Alignment Aj Piergiovanni, Vincent Casser, Michael S. Ryoo, Anelia Angelova

CVPRW 2021 Adaptive Intermediate Representations for Video Understanding Juhana Kangaspunta, A. J. Piergiovanni, Rico Jonschkowski, Michael S. Ryoo, Anelia Angelova

CVPR 2021 Coarse-Fine Networks for Temporal Activity Detection in Videos Kumara Kahatapitiya, Michael S. Ryoo

CVPR 2021 Recognizing Actions in Videos from Unseen Viewpoints Aj Piergiovanni, Michael S. Ryoo

ECCV 2020 Adversarial Generative Grammars for Human Activity Prediction Aj Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

ECCV 2020 AssembleNet++: Assembling Modality Representations via Attention Connections Michael S. Ryoo, Aj Piergiovanni, Juhana Kangaspunta, Anelia Angelova

ICLR 2020 AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures Michael S. Ryoo, Aj Piergiovanni, Mingxing Tan, Anelia Angelova

ECCV 2020 AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification Xiaofang Wang, Xuehan Xiong, Maxim Neumann, Aj Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua

AAAI 2020 Differentiable Grammars for Videos A. J. Piergiovanni, Anelia Angelova, Michael S. Ryoo

ECCV 2020 Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee

CVPRW 2019 Early Detection of Injuries in MLB Pitchers from Video A. J. Piergiovanni, Michael S. Ryoo

CoRL 2019 Model-Based Behavioral Cloning with Future Image Similarity Learning Alan Wu, Aj Piergiovanni, Michael S. Ryoo

CVPRW 2018 Action-Conditioned Convolutional Future Regression Models for Robot Imitation Learning Alan Wu, A. J. Piergiovanni, Michael S. Ryoo

AAAI 2018 Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning Michael S. Ryoo, Kiyoon Kim, Hyun Jong Yang

CVPRW 2018 Fine-Grained Activity Recognition in Baseball Videos A. J. Piergiovanni, Michael S. Ryoo

ECCVW 2018 Forecasting Hands and Objects in Future Frames Chenyou Fan, Jangwon Lee, Michael S. Ryoo

ECCV 2018 Joint Person Segmentation and Identification in Synchronized First- and Third-Person Videos Mingze Xu, Chenyou Fan, Yuchen Wang, Michael S. Ryoo, David J. Crandall

ECCV 2018 Learning to Anonymize Faces for Privacy Preserving Action Detection Zhongzheng Ren, Yong Jae Lee, Michael S. Ryoo

CVPR 2017 Identifying First-Person Camera Wearers in Third-Person Videos Chenyou Fan, Jangwon Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo

CVPRW 2017 Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression Jangwon Lee, Michael S. Ryoo

IJCAI 2017 Multi-Type Activity Recognition from a Robot's Viewpoint Ilaria Gori, J. K. Aggarwal, Larry H. Matthies, Michael S. Ryoo

AAAI 2017 Privacy-Preserving Human Activity Recognition from Extreme Low Resolution Michael S. Ryoo, Brandon Rothrock, Charles Fleming, Hyun Jong Yang

AAAI 2017 Learning Latent Subevents in Activity Videos Using Temporal Attention Filters A. J. Piergiovanni, Chenyou Fan, Michael S. Ryoo

IJCAI 2016 Learning Social Affordance for Human-Robot Interaction Tianmin Shu, Michael S. Ryoo, Song-Chun Zhu

CVPR 2015 Pooled Motion Features for First-Person Videos Michael S. Ryoo, Brandon Rothrock, Larry Matthies

WACV 2015 Robot-Centric Activity Recognition from First-Person RGB-D Videos Lu Xia, Ilaria Gori, Jake K. Aggarwal, Michael S. Ryoo

CVPRW 2014 An Introduction to the 3rd Workshop on Egocentric (First-Person) Vision Steve Mann, Kris M. Kitani, Yong Jae Lee, Michael S. Ryoo, Alireza Fathi

CVPR 2013 First-Person Activity Recognition: What Are They Doing to Me? Michael S. Ryoo, Larry Matthies

ICCV 2011 Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos Michael S. Ryoo

ICCV 2009 Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities Michael S. Ryoo, Jake K. Aggarwal

CVPRW 2009 Stochastic Representation and Recognition of High-Level Group Activities: Describing Structural Uncertainties in Human Activities Michael S. Ryoo, Jake K. Aggarwal

CVPR 2008 Observe-and-Explain: A New Approach for Multiple Hypotheses Tracking of Humans and Objects Michael S. Ryoo, Jake K. Aggarwal

CVPR 2007 Hierarchical Recognition of Human Activities Interacting with Objects Michael S. Ryoo, J. K. Aggarwal

IJCAI 2007 Robust Human-Computer Interaction System Guiding a User by Providing Feedback Michael S. Ryoo, Jake K. Aggarwal

CVPR 2006 Recognition of Composite Human Activities Through Context-Free Grammar Based Representation Michael S. Ryoo, J. K. Aggarwal