Piergiovanni, Aj

21 publications

CVPR 2025 VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models Dahun Kim, Aj Piergiovanni, Ganesh Mallya, Anelia Angelova
CVPR 2024 Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities Aj Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova
CVPR 2024 On Scaling up a Multilingual Vision and Language Model Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, Aj Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
ICLRW 2023 Dynamic Pretraining of Vision-Language Models Aj Piergiovanni, Weicheng Kuo, Wei Li, Anelia Angelova
TMLR 2023 MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks Weicheng Kuo, Aj Piergiovanni, Dahun Kim, Xiyang Luo, Benjamin Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew M. Dai, Zhifeng Chen, Claire Cui, Anelia Angelova
ICLR 2023 Open-Vocabulary Object Detection upon Frozen Vision and Language Models Weicheng Kuo, Yin Cui, Xiuye Gu, Aj Piergiovanni, Anelia Angelova
ICLR 2023 PaLI: A Jointly-Scaled Multilingual Language-Image Model Xi Chen, Xiao Wang, Soravit Changpinyo, Aj Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish V Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme Ruiz, Andreas Peter Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
CVPR 2023 Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning Aj Piergiovanni, Weicheng Kuo, Anelia Angelova
ECCV 2022 FindIt: Generalized Localization with Natural Language Queries Weicheng Kuo, Fred Bertsch, Wei Li, Aj Piergiovanni, Mohammad Saffar, Anelia Angelova
ECCV 2022 Video Question Answering with Iterative Video-Text Co-Tokenization Aj Piergiovanni, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova
ICCV 2021 4D-Net for Learned Multi-Modal Alignment Aj Piergiovanni, Vincent Casser, Michael S. Ryoo, Anelia Angelova
CVPR 2021 Recognizing Actions in Videos from Unseen Viewpoints Aj Piergiovanni, Michael S. Ryoo
NeurIPS 2021 TokenLearner: Adaptive Space-Time Tokenization for Videos Michael Ryoo, Aj Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
NeurIPS 2020 AViD Dataset: Anonymized Videos from Diverse Countries Aj Piergiovanni, Michael Ryoo
ECCV 2020 Adversarial Generative Grammars for Human Activity Prediction Aj Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo
ECCV 2020 AssembleNet++: Assembling Modality Representations via Attention Connections Michael S. Ryoo, Aj Piergiovanni, Juhana Kangaspunta, Anelia Angelova
ICLR 2020 AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures Michael S. Ryoo, Aj Piergiovanni, Mingxing Tan, Anelia Angelova
ECCV 2020 AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification Xiaofang Wang, Xuehan Xiong, Maxim Neumann, Aj Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua
WACV 2020 Learning Multimodal Representations for Unseen Activities Aj Piergiovanni, Michael Ryoo
CoRL 2019 Model-Based Behavioral Cloning with Future Image Similarity Learning Alan Wu, Aj Piergiovanni, Michael S. Ryoo
ICML 2019 Temporal Gaussian Mixture Layer for Videos Aj Piergiovanni, Michael Ryoo