Arnab, Anurag

41 publications

ICLR 2025 Dense Video Object Captioning from Disjoint Supervision Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid
CVPR 2025 Flexible Frame Selection for Efficient Video Reasoning Shyamal Buch, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
ICCV 2025 From Image to Video: An Empirical Study of Diffusion Representations Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishabh Kabra, Anurag Arnab, Mehdi S. M. Sajjadi
ICCV 2025 Principles of Visual Tokens for Efficient Video Understanding Xinyue Hao, Gen Li, Shreyank N Gowda, Robert B. Fisher, Jonathan Huang, Anurag Arnab, Laura Sevilla-Lara
NeurIPS 2025 Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster M S Shriram, Xinyue Hao, Shihao Hou, Yang Lu, Laura Sevilla-Lara, Anurag Arnab, Shreyank N Gowda
NeurIPS 2025 Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers Chaehyun Kim, Heeseong Shin, Eunbeen Hong, Heeji Yoon, Anurag Arnab, Paul Hongsuck Seo, Sunghwan Hong, Seungryong Kim
NeurIPS 2025 Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames Anurag Arnab, Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid
CVPR 2024 CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim
CVPR 2024 End-to-End Spatio-Temporal Action Localisation with Video Transformers Alexey A. Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Schmid, Anurag Arnab
NeurIPS 2024 Mixture of Nested Experts: Adaptive Processing of Visual Tokens Gagan Jain, Nidhi Hegde, Aditya Kusupati, Arsha Nagrani, Shyamal Buch, Prateek Jain, Anurag Arnab, Sujoy Paul
CVPR 2024 On Scaling up a Multilingual Vision and Language Model Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, Aj Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
ECCV 2024 Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition Shreyank N Gowda, Anurag Arnab, Jonathan Huang
CVPR 2024 Pixel-Aligned Language Model Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid
CVPR 2024 Streaming Dense Video Captioning Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid
CVPR 2024 Time- Memory- and Parameter-Efficient Visual Adaptation Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab
NeurIPS 2024 Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim
CVPR 2024 VicTR: Video-Conditioned Text Representations for Activity Recognition Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
ICML 2023 Adaptive Computation with Elastic Input Sequence Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You
ICCV 2023 Audiovisual Masked Autoencoders Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab
NeurIPS 2023 Does Visual Pretraining Help End-to-End Reasoning? Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid
CVPR 2023 How Can Objects Help Action Recognition? Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid
TMLR 2023 PolyViT: Co-Training Vision Transformers on Images, Videos and Audio Valerii Likhosherstov, Anurag Arnab, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay, Mostafa Dehghani
ICML 2023 Scaling Vision Transformers to 22 Billion Parameters Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Peter Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme Ruiz, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd Van Steenkiste, Gamaleldin Fathy Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Collier, Alexey A. Gritsenko, Vighnesh Birodkar, Cristina Nader Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetic, Dustin Tran, Thomas Kipf, Mario Lucic, Xiaohua Zhai, Daniel Keysers, Jeremiah J. Harmsen, Neil Houlsby
CVPR 2023 Token Turing Machines Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
ICCV 2023 UnLoc: A Unified Framework for Video Localization Tasks Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
CVPR 2022 End-to-End Generative Pretraining for Multimodal Video Captioning Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
CVPR 2022 Learning with Neighbor Consistency for Noisy Labels Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid
CVPR 2022 Multiview Transformers for Video Recognition Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid
CVPR 2022 Scenic: A JAX Library for Computer Vision Research and Beyond Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay
ECCV 2022 Simple Open-Vocabulary Object Detection with Vision Transformers Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
ICLR 2022 The Efficiency Misnomer Mostafa Dehghani, Yi Tay, Anurag Arnab, Lucas Beyer, Ashish Vaswani
NeurIPS 2021 Attention Bottlenecks for Multimodal Fusion Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
NeurIPS 2021 Compressive Visual Representations Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer
NeurIPS 2021 TokenLearner: Adaptive Space-Time Tokenization for Videos Michael Ryoo, Aj Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
ICCV 2021 Unified Graph Structured Models for Video Understanding Anurag Arnab, Chen Sun, Cordelia Schmid
ICCV 2021 ViViT: A Video Vision Transformer Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid
ECCV 2020 Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid
ACML 2018 Deep Fully-Connected Part-Based Models for Human Pose Estimation Rodrigo de Bem, Anurag Arnab, Stuart Golodetz, Michael Sapienza, Philip Torr
ECCV 2018 Weakly- and Semi-Supervised Panoptic Segmentation Qizhu Li, Anurag Arnab, Philip H.S. Torr
CVPR 2017 Pixelwise Instance Segmentation with a Dynamically Instantiated Network Anurag Arnab, Philip H. S. Torr
ECCV 2016 Higher Order Conditional Random Fields in Deep Neural Networks Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip H. S. Torr