Nagrani, Arsha

30 publications

CVPR 2025 Flexible Frame Selection for Efficient Video Reasoning Shyamal Buch, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
ICCV 2025 MINERVA: Evaluating Complex Video Reasoning Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, Cordelia Schmid, Tobias Weyand
ICCV 2025 Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, Weidi Xie, Andrew Zisserman
CVPR 2025 Unbiasing Through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht
CVPR 2024 AutoAD III: The Prequel - Back to the Pixels Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
NeurIPS 2024 Mixture of Nested Experts: Adaptive Processing of Visual Tokens Gagan Jain, Nidhi Hegde, Aditya Kusupati, Arsha Nagrani, Shyamal Buch, Prateek Jain, Anurag Arnab, Sujoy Paul
CVPR 2024 MoReVQA: Exploring Modular Reasoning Models for Video Question Answering Juhong Min, Shyamal Buch, Arsha Nagrani, Minsu Cho, Cordelia Schmid
CVPR 2024 On Scaling up a Multilingual Vision and Language Model Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
CVPR 2024 Streaming Dense Video Captioning Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid
CVPR 2024 VicTR: Video-Conditioned Text Representations for Activity Recognition Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
CVPR 2023 AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
ICCV 2023 AutoAD II: The Sequel - Who, When, and What in Movie Audio Description Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
CVPR 2023 AutoAD: Movie Description in Context Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
ICCV 2023 UnLoc: A Unified Framework for Video Localization Tasks Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
ICCV 2023 Verbs in Action: Improving Verb Understanding in Video-Language Models Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid
CVPR 2023 Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
NeurIPS 2023 VidChapters-7M: Video Chapters at Scale Antoine Yang, Arsha Nagrani, Ivan Laptev, Josef Sivic, Cordelia Schmid
CVPR 2022 End-to-End Generative Pretraining for Multimodal Video Captioning Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
ECCV 2022 Learning Audio-Video Modalities from Image Captions Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
WACV 2022 Masking Modalities for Cross-Modal Video Retrieval Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
ECCV 2022 TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid
NeurIPS 2021 Attention Bottlenecks for Multimodal Fusion Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
ICCV 2021 Composable Augmentation Encoding for Video Representation Learning Chen Sun, Arsha Nagrani, Yonglong Tian, Cordelia Schmid
ICCV 2021 Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
CVPR 2021 Localizing Visual Sounds the Hard Way Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
CVPR 2021 Look Before You Speak: Visually Contextualized Utterances Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
ECCV 2020 Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid
ICCVW 2019 Count, Crop and Recognise: Fine-Grained Recognition in the Wild Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
CVPRW 2019 WiCV 2019: The Sixth Women in Computer Vision Workshop Irene Amerini, Elena Balashova, Sayna Ebrahimi, Kathryn Leonard, Arsha Nagrani, Amaia Salvador
ECCV 2018 Learnable PINs: Cross-Modal Embeddings for Person Identity Arsha Nagrani, Samuel Albanie, Andrew Zisserman