Tapaswi, Makarand

30 publications

CVPRW 2025 Investigating Mechanisms for In-Context Vision Language Binding Darshana Saravanan, Makarand Tapaswi, Vineet Gandhi
TMLR 2025 No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning Manu Gaur, Darshan Singh S, Makarand Tapaswi
WACV 2025 Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar
CVPR 2025 VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment Darshana Saravanan, Varun Gupta, Darshan Singh, Zeeshan Khan, Vineet Gandhi, Makarand Tapaswi
ICMLW 2024 Localizing Auditory Concepts in CNNs Pratyaksh Gautam, Makarand Tapaswi, Vinoo Alluri
CVPR 2024 MICap: A Unified Model for Identity-Aware Movie Descriptions Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan, Makarand Tapaswi
CVPRW 2024 NurtureNet: A Multi-Task Video-Based Approach for Newborn Anthropometry Yash Khandelwal, Mayur Arvind, Sriram Kumar, Ashish Gupta, Sachin Kumar Danisetty, Piyush Bagad, Anish Madan, Mayank Lunayach, Aditya Annavajjala, Abhishek Maiti, Sansiddh Jain, Aman Dalmia, Namrata Deka, Jerome White, Jigar Doshi, Angjoo Kanazawa, Rahul Panicker, Alpan Raval, Srinivas Rana, Makarand Tapaswi
CVPR 2024 Previously on ... from Recaps to Story Summarization Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi
ICLRW 2023 Do Video-Language Foundation Models Have a Sense of Time? Piyush Nitin Bagad, Makarand Tapaswi, Cees G. M. Snoek
CVPR 2023 How You Feelin'? Learning Emotions and Mental States in Movie Scenes Dhruv Srivastava, Aditya Kumar Singh, Makarand Tapaswi
CVPR 2023 Test of Time: Instilling Video-Language Models with a Sense of Time Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek
WACV 2023 Unsupervised Audio-Visual Lecture Segmentation Darshan Singh S., Anchit Gupta, C. V. Jawahar, Makarand Tapaswi
NeurIPS 2022 Grounded Video Situation Recognition Zeeshan Khan, C. V. Jawahar, Makarand Tapaswi
CoRL 2022 Instruction-Driven History-Aware Policies for Robotic Manipulations Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia Pinel, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid
NeurIPS 2022 Language Conditioned Spatial Relation Reasoning for 3D Object Grounding Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
ECCV 2022 Learning from Unlabeled 3D Environments for Vision-and-Language Navigation Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
CVPR 2022 Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
ICCV 2021 Airbert: In-Domain Pretraining for Vision-and-Language Navigation Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid
CoRL 2020 Learning Object Manipulation Skills via Approximate State Estimation from Real Videos Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic
ICLR 2019 Visual Reasoning by Progressive Module Networks Seung Wook Kim, Makarand Tapaswi, Sanja Fidler
ICCV 2017 Situation Recognition with Graph Neural Networks Ruiyu Li, Makarand Tapaswi, Renjie Liao, Jiaya Jia, Raquel Urtasun, Sanja Fidler
CVPR 2016 MovieQA: Understanding Stories in Movies Through Question-Answering Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler
WACV 2016 Naming TV Characters by Watching and Analyzing Dialogs Monica-Laura Haurilet, Makarand Tapaswi, Ziad Al-Halah, Rainer Stiefelhagen
CVPR 2016 Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
CVPR 2015 Book2Movie: Aligning Video Scenes with Book Chapters Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen
CVPR 2014 StoryGraphs: Visualizing Character Interactions as a Timeline Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen
CVPR 2013 Semi-Supervised Learning with Constraints for Person Identification in Multimedia Data Martin Bäuml, Makarand Tapaswi, Rainer Stiefelhagen
CVPR 2012 "Knock! Knock! Who Is It?" Probabilistic Person Identification in TV-Series Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen
ECCV 2012 Fusion of Speech, Faces and Text for Person Identification in TV Broadcast Hervé Bredin, Johann Poignant, Makarand Tapaswi, Guillaume Fortier, Viet Bac Le, Thibault Napoléon, Hua Gao, Claude Barras, Sophie Rosset, Laurent Besacier, Jakob Verbeek, Georges Quénot, Frédéric Jurie, Hazim Kemal Ekenel
ECCVW 2012 Fusion of Speech, Faces and Text for Person Identification in TV Broadcast Hervé Bredin, Johann Poignant, Makarand Tapaswi, Guillaume Fortier, Viet Bac Le, Thibault Napoléon, Hua Gao, Claude Barras, Sophie Rosset, Laurent Besacier, Jakob Verbeek, Georges Quénot, Frédéric Jurie, Hazim Kemal Ekenel