WACV 2025
929 papers
@BENCH: Benchmarking Vision-Language Models for Human-Centered Assistive Technology
Xin Jiang, Junwei Zheng, Ruiping Liu, Jiahang Li, Jiaming Zhang, Sven Matthiesen, Rainer Stiefelhagen
3D Edge Sketch from Multiview Images
Yilin Zheng, Chiang-Heng Chien, Ricardo Fabbri, Benjamin Kimia
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
Marco Garosi, Riccardo Tedoldi, Davide Boscaini, Massimiliano Mancini, Nicu Sebe, Fabio Poiesi
3D Shape Completion Using Multi-Resolution Spectral Encoding
Pallabjyoti Deka, Saumik Bhattacharya, Debashis Sen, Prabir Kumar Biswas
A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
Xingzhe He, Zhiwen Cao, Nick Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot
A Generic Vehicle-to-Sensor Calibration Framework
Sumin Hu, Youngmin Yoo, Jeeseong Kim, Changsoo Lim, Doohyun Cho, Bongnam Kang
A Novel Perspective for Multi-Modal Multi-Label Skin Lesion Classification
Yuan Zhang, Yutong Xie, Hu Wang, Jodie C Avery, M Louise Hull, Gustavo Carneiro
A Two-Head Loss Function for Deep Average-K Classification
Camille Garcin, Maximilien Servajean, Alexis Joly, Joseph Salmon
ACE: Anatomically Consistent Embeddings in Composition and Decomposition
Ziyu Zhou, Haozhe Luo, Mohammad Reza Hosseinzadeh Taher, Jiaxuan Pang, Xiaowei Ding, Michael Gotway, Jianming Liang
AdaPrefix++: Integrating Adapters Prefixes and Hypernetwork for Continual Learning
Sayanta Adhikari, Dupati Srikar Chandra, P. K. Srijith, Pankaj Wasnik, Naoyuki Oneo
AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements
Neha Choudhary, Poonam Goyal, Devashish Siwatch, Atharva Chandak, Harsh Mahajan, Varun Khurana, Yaman Kumar
Advancing Weight and Channel Sparsification with Enhanced Saliency
Xinglong Sun, Maying Shen, Hongxu Yin, Lei Mao, Pavlo Molchanov, Jose M. Alvarez
Adversarial Learning Based Knowledge Distillation on 3D Point Clouds
S J Sanjay, Akash J, Sreehari Rajan, Dimple A Shajahan, Charu Sharma
Aerial Mirage: Unmasking Hallucinations in Large Vision Language Models
Debolena Basak, Soham Bhatt, Sahith Kanduri, Maunendra Sankar Desarkar
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models
Maciej Chrabaszcz, Hubert Baniecki, Piotr Komorowski, Szymon Plotka, Przemyslaw Biecek
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
Muhammad Awais, Ali Husain Salem Abdulla Alharthi, Amandeep Kumar, Hisham Cholakkal, Rao Muhammad Anwer
All-in-One Image Compression and Restoration
Huimin Zeng, Jiacheng Li, Ziqiang Zheng, Zhiwei Xiong
ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction
Silvan Weder, Francis Engelmann, Johannes L. Schönberger, Akihito Seki, Marc Pollefeys, Martin R. Oswald
Anchored Diffusion for Video Face Reenactment
Idan Kligvasser, Regev Cohen, George Leifman, Ehud Rivlin, Michael Elad
ANTHROPOS-V: Benchmarking the Novel Task of Crowd Volume Estimation
Luca Collorone, Stefano Darrigo, Massimiliano Pappa, Guido M. Damely di Melendugno, Giovanni Ficarra, Fabio Galasso
Attribute Diffusion: Diffusion Driven Diverse Attribute Editing
Rishubh Parihar, Prasanna Balaji, Raghav Magazine, Sarthak Vora, Varun Jampani, Venkatesh Babu Radhakrishnan
Automated Evaluation of Large Vision-Language Models on Self-Driving Corner Cases
Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia
Automated Patient Positioning with Learned 3D Hand Gestures
Zhongpai Gao, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu
AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation
Chengyin Li, Rafi Ibn Sultan, Prashant Khanduri, Yao Qiang, Chetty Indrin, Dongxiao Zhu
Background-Aware Moment Detection for Video Moment Retrieval
Minjoon Jung, Youwon Jang, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang
Bandit Based Attention Mechanism in Vision Transformers
Amartya Roy Chowdhury, Raghuram Bharadwaj Diddigi, K J Prabuchandran, Achyut Mani Tripathi
Benchmarking VLMs' Reasoning About Persuasive Atypical Images
Sina Malakouti, Aysan Aghazadeh, Ashmit Khandelwal, Adriana Kovashka
Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers
Adam Pardyl, Grzegorz Kurzejamski, Jan Olszewski, Tomasz Trzcinski, Bartosz Zielinski
BioNet and NeFF: Crop Biomass Prediction from Point Clouds to Drone Imagery
Xuesong Li, Zeeshan Hayder, Ali Zia, Connor Cassidy, Shiming Liu, Warwick Stiller, Eric Stone, Warren Conaty, Lars Petersson, Vivien Rolland
BioPose: Biomechanically-Accurate 3D Pose Estimation from Monocular Videos
Farnoosh Koleini, Muhammad Usama Saleem, Pu Wang, Hongfei Xue, Ahmed Helmy, Abbey Fenwick
Bit-Flip Induced Latency Attacks in Object Detection
Manojna Sistla, Yu Wen, Aamir Bader Shah, Chenpei Huang, Lening Wang, Xuqing Wu, Jiefu Chen, Miao Pan, Xin Fu
BIV-Priv-Seg: Locating Private Content in Images Taken by People with Visual Impairments
Yu-Yun Tseng, Tanusree Sharma, Lotus Zhang, Abigale Stangl, Leah Findlater, Yang Wang, Danna Gurari
Blind Image Deblurring with FFT-ReLU Sparsity Prior
Abdul Mohaimen Al Radi, Prothito Shovon Majumder, Md. Mosaddek Khan
BroadTrack: Broadcast Camera Tracking for Soccer
Floriane Magera, Thomas Hoyoux, Olivier Barnich, Marc Van Droogenbroeck
CabNIR: A Benchmark for In-Vehicle Infrared Monocular Depth Estimation
Ugo Leone Cavalcanti, Matteo Poggi, Fabio Tosi, Valerio Cambareri, Vladimir Zlokolica, Stefano Mattoccia
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu
CamoFA: A Learnable Fourier-Based Augmentation for Camouflage Segmentation
Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do
CAMS: Convolution and Attention-Free Mamba-Based Cardiac Image Segmentation
Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh
Can Adversarial Examples Be Parsed to Reveal Victim Model Information?
Yuguang Yao, Jiancheng Liu, Yifan Gong, Xiaoming Liu, Yanzhi Wang, Xue Lin, Sijia Liu
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp, Jindong Gu
Cap2Aug: Caption Guided Image Data Augmentation
Aniket Roy, Anshul Shah, Ketul Shah, Anirban Roy, Rama Chellappa
CardioSyntax: End-to-End SYNTAX Score Prediction - Dataset Benchmark and Method
Alexander Ponomarchuk, Ivan Kruzhilov, Gleb Mazanov, Ruslan Utegenov, Artem Shadrin, Galina Zubkova, Ivan Bessonov, Pavel Blinov
ChromaDistill: Colorizing Monochrome Radiance Fields with Knowledge Distillation
Ankit Dhiman, Srinath R, Srinjay Sarkar, Lokesh Boregowda, Venkatesh Babu Radhakrishnan
CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
Yao Zhang, Haokun Chen, Ahmed Frikha, Denis Krompass, Gengyuan Zhang, Jindong Gu, Volker Tresp
Click&Describe: Multimodal Grounding and Tracking for Aerial Objects
Rupanjali Kukal, Jay Patravali, Fuxun Yu, Simranjit Singh, Nikolaos Karianakis, Rishi Madhok
CLIP-Fusion: A Spatio-Temporal Quality Metric for Frame Interpolation
Goksel Mert Çökmez, Yang Zhang, Christopher Schroers, Tunç Ozan Aydin
CLIPArTT: Adaptation of CLIP to New Domains at Test Time
Gustavo A Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers
CLIPping Imbalances: A Novel Evaluation Baseline and PEARL Dataset for Pedestrian Attribute Recognition
Kamalakar Vijay, Lalit Lohani, Kamakshya Prasad Nayak, Debi Prosad Dogra, Heeseung Choi, Hyungjoo Jung, Ig-Jae Kim
ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces
Yusuke Akamatsu, Terumi Umematsu, Hitoshi Imaoka, Shizuko Gomi, Hideo Tsurushima
Comparative Knowledge Distillation
Alex Tianyi Xu, Alex Wilf, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency
Composed Image Retrieval for Training-Free Domain Conversion
Nikos Efthymiadis, Bill Psomas, Zakaria Laskar, Konstantinos Karantzalos, Yannis Avrithis, Ondrej Chum, Giorgos Tolias
Compositional Segmentation of Cardiac Images Leveraging Metadata
Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh
Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement
Vamsi Krishna S Vasa, Peijie Qiu, Wenhui Zhu, Yujian Xiong, Oana Dumitrascu, Yalin Wang
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agrawal, Susmita Ghose
Copy or Not? Reference-Based Face Image Restoration with Fine Details
Min Jin Chong, Dejia Xu, Yi Zhang, Zhangyang Wang, David Forsyth, Gurunandan Krishnan, Yicheng Wu, Jian Wang
Corgi: Cached Memory Guided Video Generation
Xindi Wu, Uriel Singer, Zhaojiang Lin, Andrea Madotto, Xide Xia, Yifan Xu, Paul Crook, Xin Luna Dong, Seungwhan Moon
Covariance-Based Space Regularization for Few-Shot Class Incremental Learning
Yijie Hu, Guanyu Yang, Zhaorui Tan, Xiaowei Wang, Kaizhu Huang, Qiu-Feng Wang
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
Hidehisa Arai, Keita Miwa, Kento Sasaki, Kohei Watanabe, Yu Yamaguchi, Shunsuke Aoki, Issei Yamamoto
CRAFT: Designing Creative and Functional 3D Objects
Michelle Guo, Mia Tang, Hannah Cha, Ruohan Zhang, C. Karen Liu, Jiajun Wu
Cross-Domain and Cross-Dimension Learning for Image-to-Graph Transformers
Alexander H. Berger, Laurin Lux, Suprosanna Shit, Ivan Ezhof, Georgios Kaissis, Martin J. Menten, Daniel Rueckert, Johannes C. Paetzold
CT to PET Translation: A Large-Scale Dataset and Domain-Knowledge-Guided Diffusion Approach
Dac Thai Nguyen, Trung Thanh Nguyen, Huu Tien Nguyen, Thanh Trung Nguyen, Huy Hieu Pham, Thanh Hung Nguyen, Thao Nguyen Truong, Phi Le Nguyen
CTIP: Towards Accurate Tabular-to-Image Generation for Tire Footprint Generation
Daeyoung Roh, Donghee Han, Jihyun Nam, Jungsoo Oh, Youngbin You, Jeongheon Park, Mun Yi
CUNSB-RFIE: Context-Aware Unpaired Neural Schrodinger Bridge in Retinal Fundus Image Enhancement
Xuanzhao Dong, Vamsi Krishna Vasa, Wenhui Zhu, Peijie Qiu, Xiwen Chen, Yi Su, Yujian Xiong, Zhangsihao Yang, Yanxi Chen, Yalin Wang
D-LUT: Photorealistic Style Transfer via Diffusion Process
Mujing Li, Guanjie Wang, Xingguang Zhang, Qifeng Liao, Chenxi Xiao
D2FP: Learning Implicit Prior for Human Parsing
Junyoung Hong, Hyeri Yang, Ye Ju Kim, Haerim Kim, Shinwoong Kim, Euna Shim, Kyungjae Lee
DAM: Dynamic Adapter Merging for Continual Video QA Learning
Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius
DarSwin-UNet: Distortion Aware Architecture
Akshaya Athwale, Ichrak Shili, Émile Bergeron, Ola Ahmad, Jean-Francois Lalonde
DASC-SPT: Towards Self-Supervised Panoramic Semantic Segmentation
Tianlong Tan, Bin Chen, Hongliang Cao, Chenggang Yan, Yike Ma, Feng Dai
Data Augmentation for Image Classification Using Generative AI
Fazle Rahat, M Shifat Hossain, Md Rubel Ahmed, Sumit Kumar Jha, Rickard Ewetz
Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models
Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, Stefanie Speidel
Dataset Augmentation by Mixing Visual Concepts
Md Abdullah Al Rahat Kutubi, Hemanth Venkateswara
DDS: Decoupled Dynamic Scene-Graph Generation Network
A S M Iftekhar, Raphael Ruschel, Satish Kumar, Suya You, B. S. Manjunath
Debiasify: Self-Distillation for Unsupervised Bias Mitigation
Nourhan Bayasi, Jamil Fayyad, Ghassan Hamarneh, Rafeef Garbi, Homayoun Najjaran
Decomposed Distribution Matching in Dataset Condensation
Sahar Rahimi Malakshan, Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Nasser Nasrabadi
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation
Utkarsh Nath, Rajeev Goel, Eun Som Jeon, Changhoon Kim, Kyle Min, Yezhou Yang, Yingzhen Yang, Pavan Turaga
DeepMIM: Deep Supervision for Masked Image Modeling
Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu
Denoising Diffusion Models for High-Resolution Microscopy Image Restoration
Pamela Osuna-Vargas, Maren H. Wehrheim, Lucas Zinz, Johanna Rahm, Ashwin Balakrishnan, Alexandra Kaminer, Mike Heilemann, Matthias Kaschube
Dense Depth from Event Focal Stack
Kenta Horikawa, Mariko Isogawa, Hideo Saito, Shohei Mori
Design-O-Meter: Towards Evaluating and Refining Graphic Designs
Sahil Goyal, Abhinav Mahajan, Swasti Mishra, Prateksha Udhayanan, Tripti Shukla, Kj Joseph, Balaji Vasan Srinivasan
Detecting Wildfires on UAVs with Real-Time Segmentation Trained by Larger Teacher Models
Julius Pesonen, Teemu Hakala, Väinö Karjalainen, Niko Koivumäki, Lauri Markelin, Anna-Maria Raita-Hakola, Juha Suomalainen, Ilkka Pölönen, Eija Honkavaara
DiffPAD: Denoising Diffusion-Based Adversarial Patch Decontamination
Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst
DiffuPT: Class Imbalance Mitigation for Glaucoma Detection via Diffusion Based Generation and Model Pretraining
Youssof Nawar, Nouran Soliman, Moustafa Wassel, Mohamed ElHabebe, Noha Adly, Marwan Torki, Ahmed Elmassry, Islam Ahmed
Diffusion-Based Particle-DETR for BEV Perception
Asen Nachkov, Danda Pani Paudel, Martin Danelljan, Luc Van Gool
Diffusion-Based Visual Anagram as Multi-Task Learning
Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao
DiL: An Explainable and Practical Metric for Abnormal Uncertainty in Object Detection
Amit Giloni, Omer Hofman, Ikuya Morikawa, Toshiya Shimizu, Yuval Elovici, Asaf Shabtai
DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception
Youpeng Wen, Yi Zhu, Zhihao Zhan, Pengzhen Ren, Jianhua Han, Hang Xu, Shen Zhao, Xiaodan Liang
Distillation of Diffusion Features for Semantic Correspondence
Frank Fundel, Johannes Schusterbauer, Vincent Tao Hu, Björn Ommer
DivAvatar: Diverse 3D Avatar Generation with a Single Prompt
Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie
DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID
Nyle Siddiqui, Florinel Alin Croitoru, Gaurav Kumar Nayak, Radu Tudor Ionescu, Mubarak Shah
DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification
Minghui Lin, Shu Wang, Xiang Wang, Jianhua Tang, Longbin Fu, Zhengrong Zuo, Nong Sang
DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala
Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization
Chamuditha Jayanga Galappaththige, Zachary Izzo, Xilin He, Honglu Zhou, Muhammad Haris Khan
DreamBlend: Advancing Personalized Fine-Tuning of Text-to-Image Diffusion Models
Shwetha Ram, Tal Neiman, Qianli Feng, Andrew M Stuart, Son Tran, Trishul A Chilimbi
DreaMo: Articulated 3D Reconstruction from a Single Casual Video
Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun, Ming-Hsuan Yang
Dynamic Adapter Tuning for Long-Tailed Class-Incremental Learning
Yanan Gu, Muli Yang, Xu Yang, Kun Wei, Hongyuan Zhu, Gabriel James Goenawan, Cheng Deng
Dynamic Attention-Guided Diffusion for Image Super-Resolution
Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel
EdgeGaussians - 3D Edge Mapping via Gaussian Splatting
Kunal Chelani, Assia Benbihi, Torsten Sattler, Fredrik Kahl
EDMB: Edge Detector with Mamba
Yachuan Li, Xavier Soria Poma, Yun Bai, Qian Xiao, Chaozhi Yang, Guanlin Li, Zongmin Li
Effective Backdoor Learning on Open-Set Face Recognition Systems
Diana Voth, Leonidas Dane, Jonas Grebe, Sebastian Peitz, Philipp Terhörst
Efficient Progressive Image Compression with Variance-Aware Masking
Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto, Pamela Cosman
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
EfficientCrackNet: A Lightweight Model for Crack Segmentation
Abid Hasan Zim, Aquib Iqbal, Zaid Al-Huda, Asad Malik, Minoru Kuribayashi
EgoCast: Forecasting Egocentric Human Pose in the Wild
Maria Escobar, Juanita Puentes, Cristhian Forigua, Jordi Pont-Tuset, Kevis-Kokitsi Maninis, Pablo Arbelaez
EgoPoints: Advancing Point Tracking for Egocentric Videos
Ahmad Darkhalil, Rhodri Guerrier, Adam W. Harley, Dima Damen
Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models
Rohit Jena, Ali Taghibakhshi, Sahil Jain, Gerald Shen, Nima Tajbakhsh, Arash Vahdat
EmoVOCA: Speech-Driven Emotional 3D Talking Heads
Federico Nocentini, Claudio Ferrari, Stefano Berretti
Endoscopic Scoring and Localization in Unconstrained Clinical Trial Videos
Jinlin Xiang, Hillol Sarker, Bozhao Qi, Ruisu Zhang, Roger Trullo, Salvatore Badalamenti, Maria Wiekowski, Annie Kruger, Etienne Pochet, Qi Tang, Wei Zhao
Enhancing Embodied Object Detection with Spatial Feature Memory
Nicolas Harvey Chapman, Christopher Lehnert, Will Browne, Feras Dayoub
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
Alessio Quercia, Erenus Yildiz, Zhuo Cao, Kai Krajsek, Abigail Morrison, Ira Assent, Hanno Scharr
Enhancing Predictive Imaging Biomarker Discovery Through Treatment Effect Analysis
Shuhan Xiao, Lukas Klein, Jens Petersen, Philipp Vollmuth, Paul F. Jaeger, Klaus H. Maier-Hein
Enhancing Visual Classification Using Comparative Descriptors
Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung
ERM++: An Improved Baseline for Domain Generalization
Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan Plummer
EvoCL: Continual Learning over Evolving Domains
Vishnuprasadh Kumaravelu, P.K. Srijith, Sunil Gupta
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
Exploiting Inter-Sample Information for Long-Tailed Out-of-Distribution Detection
Nimeshika Udayangani, Hadi Mohaghegh Dolatabadi, Sarah Erfani, Christopher Leckie
Face Anonymization Made Simple
Han-Wei Kung, Tuomas Varanka, Sanjay Saha, Terence Sim, Nicu Sebe
Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification
Alexandre Fournier-Montgieux, Michaël Soumm, Adrian Popescu, Bertrand Luvison, Hervé Le Borgne
FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework
Alloy Das, Sanket Biswas, Prasun Roy, Subhankar Ghosh, Umapada Pal, Michael Blumenstein, Josep Lladós, Saumik Bhattacharya
FaVoR: Features via Voxel Rendering for Camera Relocalization
Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly
FDS: Feedback-Guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization
Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Gustavo A Vargas Hakim, David Osowiechi, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers
Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities
Felix Wagner, Wentian Xu, Pramit Saha, Ziyun Liang, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, J. Alison Noble, Konstantinos Kamnitsas
Feature Augmentation Based Test-Time Adaptation
Younggeol Cho, Youngrae Kim, Junho Yoon, Seunghoon Hong, Dongman Lee
Federated Voxel Scene Graph for Intracranial Hemorrhage
Antoine P. Sanner, Jonathan Stieber, Nils F. Grauhan, Suam Kim, Marc A. Brockmann, Ahmed E. Othman, Anirban Mukhopadhyay
Federated-Continual Dynamic Segmentation of Histopathology Guided by Barlow Continuity
Niklas Babendererde, Haozhe Zhu, Moritz Fuchs, Jonathan Stieber, Anirban Mukhopadhyay
Fine-Grained Controllable Video Generation via Object Appearance and Context
Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang
Fine-Tuning Image-Conditional Diffusion Models Is Easier than You Think
Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe
Flowering Time Prediction of Wheat from DIA-MS Data
Yan Yang, Utpal Bose, James Broadbent, Sally Stockwell, Keren A Byrne, Md Zakir Hossain, Eric A Stone, Shannon Dillon
GANESH: Generalizable NeRF for Lensless Imaging
Rakesh Raj Madhavan, Akshat Kaimal, Badhrinarayanan K.V, Vinayak Gupta, Rohit Choudhary, Chandrakala Shanmuganathan, Kaushik Mitra
GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space
Souhaib Attaiki, Paul Guerrero, Duygu Ceylan, Niloy Mitra, Maks Ovsjanikov
GauFRe: Gaussian Deformation Fields for Real-Time Dynamic Novel View Synthesis
Yiqing Liang, Numair Khan, Zhengqin Li, Thu H Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
GazeSearch: Radiology Findings Search Benchmark
Trong Thang Pham, Tien-Phat Nguyen, Yuki Ikebe, Akash Awasthi, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le
Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models
Hung-Shuo Chang, Chien-Yao Wang, Richard Robert Wang, Gene Chou, Hong-Yuan Mark Liao
Generalizable Single-Source Cross-Modality Medical Image Segmentation via Invariant Causal Mechanisms
Boqi Chen, Yuanzhi Zhu, Yunke Ao, Sebastiano Caprara, Reto Sutter, Gunnar Rätsch, Ender Konukoglu, Anna Susmelj
GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
Rahul Sajnani, Jeroen Vanbaar, Jie Min, Kapil D Katyal, Srinath Sridhar
GeoGuide: Geometric Guidance of Diffusion Models
Mateusz Poleski, Jacek Tabor, Przemyslaw Spurek
GET-UP: GEomeTric-Aware Depth Estimation with Radar Points UPsampling
Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille
GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts
Zoltán Á. Milacski, Koichiro Niinuma, Ryosuke Kawamura, Fernando de la Torre, László A. Jeni
Global-Guided Focal Neural Radiance Field for Large-Scale Scene Rendering
Mingqi Shao, Feng Xiong, Hang Zhang, Shuang Yang, Mu Xu, Wei Bian, Xueqian Wang
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification
Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós
GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation
Qianyi Liu, Siqi Zhang, Yanyuan Qiao, Junyou Zhu, Xiang Li, Longteng Guo, Qunbo Wang, Xingjian He, Qi Wu, Jing Liu
GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction
Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy
Guardian of the Ensembles: Introducing Pairwise Adversarially Robust Loss for Resisting Adversarial Attacks in DNN Ensembles
Shubhi Shukla, Subhadeep Dalui, Manaar Alam, Shubhajit Datta, Arijit Mondal, Debdeep Mukhopadhyay, Partha Pratim Chakrabarti
Guess Future Anomalies from Normalcy: Forecasting Abnormal Behavior in Real-World Videos
Snehashis Majhi, Mohammed Guermal, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond
HEX: Hierarchical Emergence Exploitation in Self-Supervised Algorithms
Kiran Kokilepersaud, Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib
HexaGen3D: StableDiffusion Is One Step Away from Fast and Diverse Text-to-3D Generation
Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger
Image Adaptation for Colour Vision Deficient Viewers Using Vision Transformers
Thomas Gillooly, Jean-Baptiste Thomas, Jon Y. Hardeberg, Giuseppe Claudio Guarnera
Image-Caption Encoding for Improving Zero-Shot Generalization
Eric Yu, Christopher Liao, Sathvik Ravi, Theodoros Tsiligkaridis, Brian Kulis
Improving Faithfulness of Text-to-Image Diffusion Models Through Inference Intervention
Danfeng Guo, Sanchit Agarwal, Yu-Hsiang Lin, Jiun-Yu Kao, Tagyoung Chung, Nanyun Peng, Mohit Bansal
Improving Uncertainty Estimation with Confidence-Aware Training Data
Sergey Korchagin, Ekaterina Zaychenkova, Aleksei Khalin, Aleksandr Yugay, Alexey Zaytsev, Egor Ershov
InDistill: Information Flow-Preserving Knowledge Distillation for Model Compression
Ioannis Sarridis, Christos Koutlis, Giorgos Kordopatis-Zilos, Yiannis Kompatsiaris, Symeon Papadopoulos
Infant Action Generative Modeling
Xiaofei Huang, Elaheh Hatamimajoumerd, Amal Mathew, Sarah Ostadabbas
Information Theoretic Pruning of Coupled Channels in Deep Neural Networks
Peyman Rostami, Nilotpal Sinha, Nidhaleddine Chenni, Anis Kacem, Abd El Rahman Shabayek, Carl Shneider, Djamila Aouada
Instructive3D: Editing Large Reconstruction Models with Text Instructions
Kunal Kathare, Ankit Dhiman, K Vikas Gowda, Siddharth Aravindan, Shubham Monga, Basavaraja Shanthappa Vandrotti, Lokesh R Boregowda
InvisMark: Invisible and Robust Watermarking for AI-Generated Image Provenance
Rui Xu, Mengya Hu, Deren Lei, Yaxi Li, David Lowe, Alex Gorevski, Mingyu Wang, Emily Ching, Alex Deng
IRIS-VIS: A New Dataset for Visibility Estimation in an Industrial Environment
Flavien Armangeon, Thibaud Ehret, Enric Meinhardt-Llopis, Rafael Grompone von Gioi, Guillaume Thibault, Marc Petit, Gabriele Facciolo
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
Maheswar Bora, Saurabh Atreya, Aritra Mukherjee, Abhijit Das
Label Augmented Dataset Distillation
Seoungyoon Kang, Youngsun Lim, Hyunjung Shim
Label Calibration in Source Free Domain Adaptation
Shivangi Rai, Rini Smita Thakur, Kunal Jangid, Vinod K Kurmi
Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation
Elham Amin Mansour, Ozan Unal, Suman Saha, Benjamin Bejar, Luc Van Gool
LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
Anh-Quan Cao, Maximilian Jaritz, Matthieu Guillaumin, Raoul de Charette, Loris Bazzani
Learning Anatomy-Disease Entangled Representation
Fatemeh Haghighi, Michael B. Gotway, Jianming Liang
Learning the Power of "No": Foundation Models with Negations
Jaisidh Singh, Ishaan Shrivastava, Mayank Vatsa, Richa Singh, Aparna Bharati
Leveraging Vision Language Models for Specialized Agricultural Tasks
Muhammad Arbab Arshad, Talukder Zaki Jubery, Tirtho Roy, Rim Nassiri, Asheesh K. Singh, Arti Singh, Chinmay Hegde, Baskar Ganapathysubramanian, Aditya Balu, Adarsh Krishnamurthy, Soumik Sarkar
Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images
Jun Chen, Faizan Farooq Khan, Ming Hu, Ammar Sherif, Zongyuan Ge, Boyang Li, Mohamed Elhoseiny
Localized Gaussian Splatting Editing with Contextual Awareness
Hanyuan Xiao, Yingshu Chen, Huajian Huang, Haolin Xiong, Jing Yang, Pratusha Prasad, Yajie Zhao
Long-Term Ad Memorability: Understanding & Generating Memorable Ads
Harini Si, Somesh Singh, Yaman Kumar Singla, Aanisha Bhattacharyya, Veeky Baths, Changyou Chen, Rajiv Ratn Shah, Balaji Krishnamurthy
Looking at Model Debiasing Through the Lens of Anomaly Detection
Vito Paolo Pastore, Massimiliano Ciranni, Davide Marinelli, Francesca Odone, Vittorio Murino
Loose Social-Interaction Recognition in Real-World Therapy Scenarios
Abid Ali, Rui Dai, Ashish Marisetty, Guillaume Astruc, Monique Thonnat, Jean-Marc Odobez, Susanne Thummler, Francois Bremond
LumiGauss: Relightable Gaussian Splatting in the Wild
Joanna Kaleta, Kacper Kania, Tomasz Trzcinski, Marek Kowalski
MagicStick: Controllable Video Editing via Control Handle Transformations
Yue Ma, Xiaodong Cun, Sen Liang, Jinbo Xing, Yingqing He, Chenyang Qi, Siran Chen, Qifeng Chen
MAGMA: Manifold Regularization for MAEs
Alin-Eugen Dondera, Anuj R Singh, Hadi Jamali-Rad
MAISI: Medical AI for Synthetic Imaging
Pengfei Guo, Can Zhao, Dong Yang, Ziyue Xu, Vishwesh Nath, Yucheng Tang, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu
Make-a-Texture: Fast Shape-Aware 3D Texture Generation in 3 Seconds
Liat Sless Gorelik, Yuchen Fan, Omri Armstrong, Forrest N Iandola, Yilei Li, Ita Lifshitz, Rakesh Ranjan
Mamba-ST: State Space Model for Efficient Style Transfer
Filippo Botti, Alex Ergasti, Leonardo Rossi, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati
MaskVD: Region Masking for Efficient Video Object Detection
Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel
Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer
Evelyn A. Stump, Francesco Luzi, Leslie M. Collins, Jordan M. Malof
Mind the Prompt: A Novel Benchmark for Prompt-Based Class-Agnostic Counting
Luca Ciampi, Nicola Messina, Matteo Pierucci, Giuseppe Amato, Marco Avvenuti, Fabrizio Falchi
Mixed Patch Visible-Infrared Modality Agnostic Object Detection
Heitor R. Medeiros, David Latortue, Eric Granger, Marco Pedersoli
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning
Jianyi Zhang, Hao Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li
MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning
Chenyu Wang, Weixin Luo, Sixun Dong, Xiaohua Xuan, Zhengxin Li, Lin Ma, Shenghua Gao
MoRAG - Multi-Fusion Retrieval Augmented Generation for Human Motion
Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
MRI Reconstruction with Regularized 3D Diffusion Model (R3DM)
Arya Bangun, Zhuo Cao, Alessio Quercia, Hanno Scharr, Elisabeth Pfaehler
Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark
Marina Ceccon, Davide Dalle Pezze, Alessandro Fabris, Gian Antonio Susto
Multi-Modal Large Language Model with RAG Strategies in Soccer Commentary Generation
Xiang Li, Yangfan He, Shuaishuai Zu, Zhengyang Li, Tianyu Shi, Yiting Xie, Kevin Zhang
Multi-Modal Large Language Models Are Effective Vision Learners
Li Sun, Chaitanya Ahuja, Peng Chen, Matt D'Zmura, Kayhan Batmanghelich, Philip Bontrager
Multi-Resolution Guided 3D GANs for Medical Image Translation
Juhyung Ha, Jong Sung Park, David Crandall, Eleftherios Garyfallidis, Xuhong Zhang
Multi-Spectral Image Color Reproduction
Jiacheng Li, Chang Chen, Xue Hu, Fenglong Song, Youliang Yan, Zhiwei Xiong
Multimodal Fusion Learning with Dual Attention for Medical Imaging
Joy Dhar, Nayyar Zaidi, Maryam Haghighat, Sudipta Roy, Puneet Goyal, Azadeh Alavi, Vikas Kumar
MVAD: A Multiple Visual Artifact Detector for Video Streaming
Chen Feng, Duolikun Danier, Fan Zhang, Alex Mackin, Andrew Collins, David Bull
My3DGen: A Scalable Personalized 3D Generative Model
Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta
Neural SDF for Shadow-Aware Unsupervised Structured Light
Kazuto Ichimaru, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki
No Annotations for Object Detection in Art Through Stable Diffusion
Patrick Ramos, Nicolas Gonthier, Selina Khan, Yuta Nakashima, Noa Garcia
Noise-Aware Evaluation of Object Detectors
Jeffri Murrugarra Llerena, Claudio R. Jung
Now You See Me: Context-Aware Automatic Audio Description
Seon-Ho Lee, Jue Wang, David Fan, Zhikang Zhang, Linda Liu, Xiang Hao, Vimal Bhat, Xinyu Li
OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics
Yoni Gozlan, Antoine Falisse, Scott Uhlrich, Anthony Gatti, Michael Black, Jennifer Hicks, Scott Delp, Akshay Chaudhari
OpenCity3D: What Do Vision-Language Models Know About Urban Environments?
Valentin Bieri, Marco Zamboni, Nicolas Samuel Blumer, Qingxuan Chen, Francis Engelmann
Optimizing Neural Network Effectiveness via Non-Monotonicity Refinement
Koushik Biswas, Amit Reza, Meghana Karri, Debesh Jha, Hongyi Pan, Nikhil Tomar, Aliza Subedi, Smriti Regmi, Ulas Bagci
ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection
Jui-Che Chiang, Hou-Ning Hu, Bo-Syuan Hou, Chia-Yu Tseng, Yu-Lun Liu, Min-Hung Chen, Yen-Yu Lin
Oriented Cell Dataset: A Dataset and Benchmark for Oriented Cell Detection and Applications
Lucas Kirsten, Angelo Angonezi, Jose Marques, Fernanda Oliveira, Juliano Faccioni, Camila Cassel, Débora de Sousa, Samlai Vedovatto, Guido Lenz, Claudio Jung
PALO: A Polyglot Large Multimodal Model for 5b People
Hanoona Rasheed, Muhammad Maaz, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan
Perceive Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries
Roberto Amoroso, Gengyuan Zhang, Rajat Koner, Lorenzo Baraldi, Rita Cucchiara, Volker Tresp PETALface: Parameter Efficient Transfer Learning for Low-Resolution Face Recognition
Kartik Narayan, Nithin Gopalakrishnan Nair, Jennifer Xu, Rama Chellappa, Vishal M. Patel PGRID: Power Grid Reconstruction in Informal Developments Using High-Resolution Aerial Imagery
Simone Fobi Nsutezo, Amrita Gupta, Duncan Kebut, Seema Iyer, Luana Marotti, Rahul Dodhia, Juan M. Lavista Ferres, Anthony Ortiz Phaseformer: Phase-Based Attention Mechanism for Underwater Image Restoration and Beyond
Raqib Khan, Anshul Negi, Ashutosh Kulkarni, Shruti S. Phutke, Santosh Kumar Vipparthi, Subrahmanyam Murala Physiology-Aware PolySnake for Coronary Vessel Segmentation
Yizhe Ruan, Lin Gu, Yusuke Kurose, Junichi Iho, Youji Tokunaga, Makoto Horie, Yusaku Hayashi, Keisuke Nishizawa, Yasushi Koyama, Tatsuya Harada Planar Gaussian Splatting
Farhad G. Zanjani, Hong Cai, Hanno Ackermann, Leila Mirvakhabova, Fatih Porikli PocoLoco: A Point Cloud Diffusion Model of Human Shape in Loose Clothing
Siddharth Seth, Rishabh Dabral, Diogo C Luvizon, Marc Habermann, Ming-Hsuan Yang, Christian Theobalt, Adam Kortylewski Pre-Capture Privacy via Adaptive Single-Pixel Imaging
Yoko Sogabe, Shiori Sugimoto, Ayumi Matsumoto, Masaki Kitahara Predicting Event Memorability Using Personalized Federated Learning
Sourasekhar Banerjee, Debaditya Roy, Vigneshwaran Subbaraju, Monowar Bhuyan PRoGS: Progressive Rendering of Gaussian Splats
Brent Zoomers, Maarten Wijnants, Ivan Molenaers, Joni Vanherck, Jeroen Put, Lode Jorissen, Nick Michiels PTQ4VM: Post-Training Quantization for Visual Mamba
Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park PULSE: Physiological Understanding with Liquid Signal Extraction
Shahzad Ahmad, Sania Bano, Sachin Verma, Yogesh Singh Rawat, Sukalpa Chanda, Santosh Kumar Vipparthi, Subrahmanyam Murala Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models
Sanoojan Baliah, Qinliang Lin, Shengcai Liao, Xiaodan Liang, Muhammad Haris Khan ReBotNet: Fast Real-Time Video Enhancement
Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini Recognizing Unseen States of Unknown Objects by Leveraging Knowledge Graphs
Filippos Gouidis, Konstantinos Papoutsakis, Theodore Patkos, Antonis Argyros, Dimitris Plexousakis Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach
Wenjun Huang, Yang Ni, Arghavan Rezvani Dehaghani, SungHeon Evan Jeong, Hanning Chen, Yezi Liu, Fei Wen, Mohsen Imani Recurrence-Based Vanishing Point Detection
Skanda Bharadwaj, Robert T. Collins, Yanxi Liu Reducing the Content Bias for AI-Generated Image Detection
Seoyeon Gye, Junwon Ko, Hyounguk Shon, Minchan Kwon, Junmo Kim ReEdit: Multimodal Exemplar-Based Image Editing
Ashutosh Srivastava, Tarun Ram Menta, Abhinav Java, Avadhoot Gorakh Jadhav, Silky Singh, Surgan Jandial, Balaji Krishnamurthy Remote Blood Pressure Estimation from Facial Videos Using Transfer Learning: Leveraging PPG to rPPG Conversion
Chun-Hong Cheng, Jing Wei Chin, Kwan Long Wong, Tsz Tai Chan, Hau Ching Lo, Kwan Lok Pang, Richard So, Bryan Yan RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation
Henrique Piñeiro Monteagudo, Leonardo Taccari, Aurel Pjetri, Francesco Sambo, Samuele Salti Retrieval Augmented Recipe Generation
Guoshan Liu, Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang Revisiting Disparity from Dual-Pixel Images: Physics-Informed Lightweight Depth Estimation
Teppei Kurita, Yuhi Kondo, Legong Sun, Takayuki Sasaki, Sho Nitta, Yasuhiro Hashimoto, Yoshinori Muramatsu, Yusuke Moriuchi RGB-D Video Mirror Detection
Mingchen Xu, Peter Herbert, Yu-Kun Lai, Ze Ji, Jing Wu RiemStega: Covariance-Based Loss for Print-Proof Transmission of Data in Images
Aniana Cruz, Guilherme Schardong, Luiz Schirmer, João Marcos, Farhad Shadmand, Nuno Gonçalves Robot Instance Segmentation with Few Annotations for Grasping
Moshe Kimhi, David Vainshtein, Chaim Baskin, Dotan Di Castro Robust Novelty Detection Through Style-Conscious Feature Ranking
Stefan Smeu, Elena Burceanu, Emanuela Haller, Andrei Liviu Nicolicioiu SALVE: A 3D Reconstruction Benchmark of Wounds from Consumer-Grade Videos
Remi Chierchia, Leo Lebrat, David Ahmedt-Aristizabal, Olivier Salvado, Clinton Fookes, Rodrigo Santa Cruz SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation
Javier Gamazo Tejero, Moritz J Schmid, Pablo Márquez Neila, Martin Zinkernagel, Sebastian Wolf, Raphael Sznitman SAND: Enhancing Open-Set Neuron Descriptions Through Spatial Awareness
Anvita Agarwal Srinivas, Tuomas Oikarinen, Divyansh Srivastava, Wei-Hung Weng, Tsui-Wei Weng SANPO: A Scene Understanding Accessibility and Human Navigation Dataset
Sagar M. Waghmare, Kimberly Wilber, Dave Hawkey, Xuan Yang, Matthew Wilson, Stephanie Debats, Cattalyya Nuengsigkapian, Astuti Sharma, Lars Pandikow, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko SCOT: Self-Supervised Contrastive Pretraining for Zero-Shot Compositional Retrieval
Bhavin Jawade, João V. B. Soares, Kapil Thadani, Deen Dayal Mohan, Amir Erfan Eshratifar, Benjamin Culpepper, Paloma de Juan, Srirangaraj Setlur, Venu Govindaraju SEED4D: A Synthetic Ego-Exo Dynamic 4D Data Generator Driving Dataset and Benchmark
Marius Kästingschäfer, Théo Gieruc, Sebastian Bernhard, Dylan Campbell, Eldar Insafutdinov, Eyvaz Najafli, Thomas Brox SegBuilder: A Semi-Automatic Annotation Tool for Segmentation
Md Alimoor Reza, Eric Manley, Sean Chen, Sameer Chaudhary, Jacob Elafros Segment Anything Meets Point Tracking
Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu Self-Aligning Depth-Regularized Radiance Fields for Asynchronous RGB-D Sequences
Yuxin Huang, Andong Yang, Yuantao Chen, Runyi Yang, Zhenxin Zhu, Chao Hou, Hao Zhao, Guyue Zhou Self-Supervised Anomaly Segmentation via Diffusion Models with Dynamic Transformer UNet
Komal Kumar, Snehashis Chakraborty, Dwarikanath Mahapatra, Behzad Bozorgtabar, Sudipta Roy Self-Supervised Incremental Learning of Object Representations from Arbitrary Image Sets
George Leotescu, Alin-Ionut Popa, Diana-Nicoleta N Grigore, Daniel Voinea, Pietro Perona Semantically Conditioned Prompts for Visual Recognition Under Missing Modality Scenarios
Vittorio Pipoli, Federico Bolelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Costantino Grana, Rita Cucchiara, Elisa Ficarra Semiotic-Based Construction of a Large Emotional Image Dataset with Neutral Samples
Marco Blanchini, Giovanna Dimitri, Lydia Abady, Benedetta Tondi, Tarcisio Lancioni, Mauro Barni SensorFlow: Sensor and Image Fused Video Stabilization
Jiyang Yu, Tianhao Zhang, Fuhao Shi, Lei He, Chia-Kai Liang SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Jiale Cao, Zhong Ji, Mingming Sun Shadow Removal Refinement via Material-Consistent Shadow Edges
Shilin Hu, Hieu Le, ShahRukh Athar, Sagnik Das, Dimitris Samaras Shapley Consensus Deep Learning for Ensemble Pruning
Youcef Djenouri, Ahmed Nabil Belbachir, Asma Belhadi, Nassim Belmecheri, Tomasz Michalak Shift Equivariant Pose Network
Pengxiao Wang, Tzu-Heng Lin, Chunyu Wang, Yizhou Wang Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
Zifu Wan, Pingping Zhang, Yuhao Wang, Silong Yong, Simon Stepputtis, Katia Sycara, Yaqi Xie Sign Language Recognition: A Large-Scale Multi-View Dataset and Comprehensive Evaluation
Nguyen Son Dinh, Tuan Dung Nguyen, Duc Tri Tran, Nguyen Dang Huy Pham, Thuan Hieu Tran, Ngoc Anh Tong, Quang Huy Hoang, Phi Le Nguyen SIGNN - Star Identification Using Graph Neural Networks
Floyd Hepburn-Dickins, Mark W. Jones, Mike Edwards, Jay Paul Morgan, Steve Bell Skyeyes: Ground Roaming Using Aerial View Images
Zhiyuan Gao, Wenbin Teng, Gonglin Chen, Jinsen Wu, Ningli Xu, Rongjun Qin, Andrew Feng, Yajie Zhao SmartKC++: Improving Performance of Smartphone-Based Corneal Topographers
Vaibhav Ganatra, Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Arivunithi Varadharajan, Bellamkonda Mallikarjuna, Nipun Kwatra, Mohit Jain Social EgoMesh Estimation
Luca Scofano, Alessio Sampieri, Edoardo De Matteis, Indro Spinelli, Fabio Galasso SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models
Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Vladimir Pavlovic, Hao Wang, Molei Tao, Dimitris Metaxas Stable Autofocus with Focal Consistency Loss
Sangwon Lee, Myungsub Choi, Nagyeong Lee, Hyong-Euk Lee Street TryOn: Learning In-the-Wild Virtual Try-on from Unpaired Person Images
Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik STRIDE: Single-Video Based Temporally Continuous Occlusion-Robust 3D Pose Estimation
Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Hannah Dela Cruz, Dripta S. Raychaudhuri, M. Salman Asif, Amit Roy-Chowdhury SUM: Saliency Unification Through Mamba for Visual Attention Modeling
Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan, Michael Brudno, Babak Taati SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
Hiroki Nishizawa, Keitaro Tanaka, Asuka Hirata, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Shigeo Morishima SynDRA: Synthetic Dataset for Railway Applications
Gianluca D'Amico, Federico Nesti, Giulio Rossolini, Mauro Marinoni, Salvatore Sabina, Giorgio Buttazzo TaxaBind: A Unified Embedding Space for Ecological Applications
Srikumar Sastry, Subash Khanal, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang, Frederic Z. Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian, Stephen Gould Temporally Streaming Audio-Visual Synchronization for Real-World Videos
Jordan G Voas, Wei-Cheng Tseng, Layne Berry, Xixi Hu, Puyuan Peng, James Stuedemann, David Harwath Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging
Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani Oghani, Milad Cheraghalikhani, David Osowiechi, Farzad Beizaee, Gustavo A. Vargas Hakim, Ismail Ben Ayed, Christian Desrosiers Test-Time Adaptation of 3D Point Clouds via Denoising Diffusion Models
Hamidreza Dastmalchi, Aijun An, Ali Cheraghian, Shafin Rahman, Sameera Ramasinghe Text Change Detection in Multilingual Documents Using Image Comparison
Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, MinKyu Kim, Seongho Joe, Taehee Lee TFM^2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation
Yaoxin Zhuo, Zachary Bessinger, Lichen Wang, Naji Khosravan, Baoxin Li, Sing Bing Kang Through the Curved Cover: Synthesizing Cover Aberrated Scenes with Refractive Field
Liuyue Xie, Jiancong Guo, László A. Jeni, Zhiheng Jia, Mingyang Li, Yunwen Zhou, Chao Guo Token Turing Machines Are Efficient Vision Models
Purvish Jajal, Nick Eliopoulous, Benjamin Shiue-Hal Chou, George K. Thiravathukal, James C. Davis, Yung-Hsiang Lu TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
Bingqing Zhang, Zhuo Cao, Heming Du, Xin Yu, Xue Li, Jiajun Liu, Sen Wang TORE: Token Recycling in Vision Transformers for Efficient Active Visual Exploration
Jan Olszewski, Dawid Damian Rymarczyk, Piotr Wojcik, Mateusz Pach, Bartosz Zielinski Towards a Training Free Approach for 3D Scene Editing
Vivek Madhavaram, Shivangana Rawat, Chaitanya Devaguptapu, Charu Sharma, Manohar Kaul Towards Accurate Unified Anomaly Segmentation
Wenxin Ma, Qingsong Yao, Xiang Zhang, Zhelong Huang, Zihang Jiang, S.Kevin Zhou Towards Real-Time Open-Vocabulary Video Instance Segmentation
Bin Yan, Martin Sundermeyer, David Joseph Tan, Huchuan Lu, Federico Tombari Towards Robust Training via Gradient-Diversified Backpropagation
Xilin He, Cheng Luo, Qinliang Lin, Weicheng Xie, Muhammad Haris Khan, Siyang Song, Linlin Shen Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations
Giacomo Capitani, Lorenzo Bonicelli, Angelo Porrello, Federico Bolelli, Simone Calderara, Elisa Ficarra TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes
Alessandro D'Amelio, Giuseppe Cartella, Vittorio Cuculo, Manuele Lucchi, Marcella Cornia, Rita Cucchiara, Giuseppe Boccignone TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Dit-Yan Yeung, Huchuan Lu, Xu Jia Transferring Foundation Models for Generalizable Robotic Manipulation
Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang TRNeRF: Restoring Blurry Rolling Shutter and Noisy Thermal Images with Neural Radiance Fields
Spencer Carmichael, Manohar Bhat, Mani Ramanagopal, Austin Buchan, Ram Vasudevan, Katherine A. Skinner Tumor Synthesis Conditioned on Radiomics
Jonghun Kim, Inye Na, Eun Sook Ko, Hyunjin Park Tuned Contrastive Learning
Chaitanya Animesh, Manmohan Chandraker UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
Jonathan Lee, Bolivar E Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Fu-En Wang, Yi-Hsuan Tsai, Min Sun Uncertainty Awareness Enables Efficient Labeling for Cancer Subtyping in Digital Pathology
Nirhoshan Sivaroopan, Chamuditha Jayanga Galappaththige, Chalani Ekanayake, Hasindri Watawana, Ranga Rodrigo, Chamira U.S. Edussooriya, Dushan N. Wadduwage Uncertainty-Guided Cross Attention Ensemble Mean Teacher for Semi-Supervised Medical Image Segmentation
Meghana Karri, Amit Soni Arya, Koushik Biswas, Nicolo Gennaro, Vedat Cicek, Gorkem Durak, Yury S. Velichko, Ulas Bagci UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors
Suhas Srinath, Aditya Chandrasekar, Hemang Jamadagni, Rajiv Soundararajan, A P Prathosh Unified Framework for Open-World Compositional Zero-Shot Learning
Hirunima Jayasekara, Khoi Pham, Nirat Saini, Abhinav Shrivastava Unleashing Potentials of Vision-Language Models for Zero-Shot HOI Detection
Moyuru Yamada, Nimish Dharamshi, Ayushi Kohli, Prasad Kasu, Ainulla Khan, Manu Ghulyani Unsupervised Single-Image Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training
Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese, Brian Chen, Hamid Eghbalzadeh, Tushar Nagarajan, Ruta P Desai VaLID: Variable-Length Input Diffusion for Novel View Synthesis
Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki Asano, Juergen Gall, Amirhossein Habibian VerA: Versatile Anonymization Applicable to Clinical Facial Photographs
Majed El Helou, Doruk Cetin, Petar Stamenkovic, Niko Benjamin Huber, Fabio Zünd VILLS : Video-Image Learning to Learn Semantics for Person Re-Identification
Siyuan Huang, Ram Prabhakar Kathirvel, Yuxiang Guo, Rama Chellappa, Cheng Peng Vision-Based Landing Guidance Through Tracking and Orientation Estimation
João P. K. Ferreira, João P. Pinto, Júlia Moura, Yi Li, Cristiano L. Castro, Plamen Angelov Visual Robustness Benchmark for Visual Question Answering (VQA)
Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat, Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Azam Hossain VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
Hanning Chen, Yang Ni, Wenjun Huang, Yezi Liu, SungHeon Jeong, Fei Wen, Nathaniel Bastian, Hugo Latapie, Mohsen Imani WAFFLE: Multimodal Floorplan Understanding in the Wild
Keren Ganon, Morris Alper, Rachel Mikulinsky, Hadar Averbuch-Elor Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, Radu Tudor Ionescu When Visual State Space Model Meets Backdoor Attacks
Sankalp Nagaonkar, Achyut Mani Tripathi, Ashish Mishra WiGNet: Windowed Vision Graph Neural Network
Gabriele Spadaro, Marco Grangetto, Attilio Fiandrotti, Enzo Tartaglione, Jhony H. Giraldo XPose: Towards Extreme Low Light Hand Pose Estimation
Green Rosh, Meghana Shankar, Prateek Kukreja, Anmol Namdev, B H Pawan Prasad XR-MBT: Multi-Modal Full Body Tracking for XR Through Self-Supervision with Learned Depth Point Cloud Registration
Denys Rozumnyi, Nadine Bertsch, Othman Sbai, Filippo Arcadu, Yuhua Chen, Artsiom Sanakoyeu, Manoj Kumar, Catherine Herold, Robin Kips Zero-Shot Detection of Out-of-Context Objects Using Foundation Models
Anirban Roy, Adam Cobb, Ramneet Kaur, Sumit Jha, Nathaniel Bastian, Alexander Berenbeim, Robert Thomson, Iain Cruickshank, Alvaro Velasquez, Susmit Jha ZeroComp: Zero-Shot Object Compositing from Image Intrinsics via Diffusion
Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-François Lalonde