Elhoseiny, Mohamed
85 publications
WACV
2025
Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images
NeurIPS
2025
MAGNET: A Multi-Agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
ICLR
2025
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
NeurIPS
2025
Vgent: Graph-Based Retrieval-Reasoning-Augmented Generation for Long Video Understanding
NeurIPS
2024
3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition
ICLR
2024
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation
NeurIPS
2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
ICCV
2023
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only
CVPR
2023
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
WACV
2022
3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language
ECCV
2022
Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification
NeurIPS
2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
CVPR
2022
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
ECCV
2020
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes
NeurIPS
2020
Temporal Positive-Unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation