Hoiem, Derek
63 publications
CVPR
2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
CVPR
2022
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
ICCV
2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks