CVPR 2025

2871 papers

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification Jingwei Zhang, Anh Tien Nguyen, Xi Han, Vincent Quoc-Huy Trinh, Hong Qin, Dimitris Samaras, Mahdi S. Hosseini
PDF
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes Jan Held, Renaud Vandeghen, Abdullah Hamdi, Adrien Deliege, Anthony Cioppa, Silvio Giancola, Andrea Vedaldi, Bernard Ghanem, Marc Van Droogenbroeck
PDF
3D Dental Model Segmentation with Geometrical Boundary Preserving Shufan Xi, Zexian Liu, Junlin Chang, Hongyu Wu, Xiaogang Wang, Aimin Hao
PDF
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma
PDF
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang
PDF
3D Occupancy Prediction with Low-Resolution Queries via Prototype-Aware View Transformation Gyeongrok Oh, Sungjune Kim, Heeju Ko, Hyung-gun Chi, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sungjoon Choi, Sujin Jang, Sangpil Kim
PDF
3D Prior Is All You Need: Cross-Task Few-Shot 2D Gaze Estimation Yihua Cheng, Hengfei Wang, Zhongqun Zhang, Yang Yue, Boeun Kim, Feng Lu, Hyung Jin Chang
PDF
3D Student Splatting and Scooping Jialin Zhu, Jiangbei Yue, Feixiang He, He Wang
PDF
3D-AVS: LiDAR-Based 3D Auto-Vocabulary Segmentation Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald
PDF
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai
PDF
3D-GSW: 3D Gaussian Splatting for Robust Watermarking Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim
PDF
3D-HGS: 3D Half-Gaussian Splatting Haolin Li, Jinyang Liu, Mario Sznaier, Octavia Camps
PDF
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer Jiajun Deng, Tianyu He, Li Jiang, Tianyu Wang, Feras Dayoub, Ian Reid
PDF
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning Yuncong Yang, Han Yang, Jiachen Zhou, Peihao Chen, Hongxin Zhang, Yilun Du, Chuang Gan
PDF
3D-MVP: 3D Multiview Pretraining for Manipulation Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal
PDF
3D-SLNR: A Super Lightweight Neural Representation for Large-Scale 3D Mapping Chenhui Shi, Fulin Tang, Ning An, Yihong Wu
PDF
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy
PDF
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moënne-Loccoz, Zan Gojcic
PDF
3DTopia-XL: Scaling High-Quality 3D Asset Generation via Primitive Diffusion Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, Ziwei Liu
PDF
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister
PDF
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yue Qian, Xiaohang Zhan, Yueqi Duan
PDF
4Deform: Neural Surface Deformation for Robust Shape Interpolation Lu Sang, Zehranaz Canfes, Dongliang Cao, Riccardo Marin, Florian Bernard, Daniel Cremers
PDF
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang
PDF
4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison
PDF
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee
PDF
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks Dongshuo Yin, Leiyi Hu, Bin Li, Youqun Zhang, Xue Yang
PDF
A Bias-Free Training Paradigm for More General AI-Generated Image Detection Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, Luisa Verdoliva
PDF
A Closer Look at Time Steps Is Worthy of Triple Speed-up for Diffusion Model Training Kai Wang, Mingjia Shi, Yukun Zhou, Zekai Li, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You
PDF
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, Yogesh Balaji
PDF
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
PDF
A Dataset for Semantic Segmentation in the Presence of Unknowns Zakaria Laskar, Tomas Vojir, Matej Grcic, Iaroslav Melekhov, Shankar Gangisetty, Juho Kannala, Jiri Matas, Giorgos Tolias, C.V. Jawahar
PDF
A Distractor-Aware Memory for Visual Object Tracking with SAM2 Jovana Videnovic, Alan Lukezic, Matej Kristan
PDF
A Flag Decomposition for Hierarchical Datasets Nathan Mankovich, Ignacio Santamaria, Gustau Camps-Valls, Tolga Birdal
PDF
A Focused Human Body Model for Accurate Anthropometric Measurements Extraction Shuhang Chen, Xianliang Huang, Zhizhou Zhong, Juhong Guan, Shuigeng Zhou
PDF
A General Adaptive Dual-Level Weighting Mechanism for Remote Sensing Pansharpening Jie Huang, Haorui Chen, Jiaxuan Ren, Siran Peng, Liangjian Deng
PDF
A Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering Zheming Xu, He Liu, Congyan Lang, Tao Wang, Yidong Li, Michael C. Kampffmeyer
PDF
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He
PDF
A New Statistical Model of Star Speckles for Learning to Detect and Characterize Exoplanets in Direct Imaging Observations Théo Bodrito, Olivier Flasseur, Julien Mairal, Jean Ponce, Maud Langlois, Anne-Marie Lagrange
PDF
A Physics-Informed Blur Learning Framework for Imaging Systems Liqun Chen, Yuxuan Li, Jun Dai, Jinwei Gu, Tianfan Xue
PDF
A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang
PDF
A Regularization-Guided Equivariant Approach for Image Restoration Yulu Bai, Jiahong Fu, Qi Xie, Deyu Meng
PDF
A Selective Re-Learning Mechanism for Hyperspectral Fusion Imaging Yuanye Liu, Jinyang Liu, Renwei Dian, Shutao Li
PDF
A Semantic Knowledge Complementarity Based Decoupling Framework for Semi-Supervised Class-Imbalanced Medical Image Segmentation Zheng Zhang, Guanchun Yin, Bo Zhang, Wu Liu, Xiuzhuang Zhou, Wendong Wang
PDF
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning Yunlu Yan, Huazhu Fu, Yuexiang Li, Jinheng Xie, Jun Ma, Guang Yang, Lei Zhu
PDF
A Simple yet Effective Layout Token in Large Language Models for Document Understanding Zhaoqing Zhu, Chuwei Luo, Zirui Shao, Feiyu Gao, Hangdi Xing, Qi Zheng, Ji Zhang
PDF
A Stitch in Time Saves Nine: Small VLM Is a Precise Guidance for Accelerating Large VLMs Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You
PDF
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets David Mildenberger, Paul Hager, Daniel Rueckert, Martin J. Menten
PDF
A Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains Dexuan Zhang, Thomas Westfechtel, Tatsuya Harada
PDF
A Unified Approach to Interpreting Self-Supervised Pre-Training Methods for 3D Point Clouds via Interactions Qiang Li, Jian Ruan, Fanghao Wu, Yuchi Chen, Zhihua Wei, Wen Shen
PDF
A Unified Framework for Heterogeneous Semi-Supervised Learning Marzi Heidari, Abdullah Alchihabi, Hao Yan, Yuhong Guo
PDF
A Unified Image-Dense Annotation Generation Model for Underwater Scenes Hongkai Lin, Dingkang Liang, Zhenghao Qi, Xiang Bai
PDF
A Unified Latent Schrodinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George, Vineeth N Balasubramanian
PDF
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar, Zihui Wu, Miguel Liu-Schiaffini, Bahareh Tolooshams, Anima Anandkumar
PDF
A Unified, Resilient, and Explainable Adversarial Patch Detector Vishesh Kumar, Akshay Agarwal
PDF
A Universal Scale-Adaptive Deformable Transformer for Image Restoration Across Diverse Artifacts Xuyi He, Yuhui Quan, Ruotao Xu, Hui Ji
PDF
A3: Few-Shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment Xuan Wang, Xitong Gao, Dongping Liao, Tianrui Qin, Yu-liang Lu, Cheng-zhong Xu
PDF
A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models Keyu Tu, Mengqi Huang, Zhuowei Chen, Zhendong Mao
PDF
AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, S. Kevin Zhou
PDF
ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior Based Orientation Prediction for Detecting Aerial Image Objects Woojin Lee, Hyugjae Chang, Jaeho Moon, Jaehyup Lee, Munchurl Kim
PDF
ABC-Former: Auxiliary Bimodal Cross-Domain Transformer with Interactive Channel Attention for White Balance Yu-Cheng Chiu, Guan-Rong Chen, Zihao Chen, Yan-Tsung Peng
PDF
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian, Aliaksandr Siarohin, Willi Menapace, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov
PDF
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling Xinyu Xiang, Qinglong Yan, Hao Zhang, Jiayi Ma
PDF
Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation Kendong Liu, Zhiyu Zhu, Hui Liu, Junhui Hou
PDF
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma
PDF
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu, Xide Xia, Miao Liu, Xiaofang Wang, Mingfu Liang, Ning Zhang, Dimitris N. Metaxas, Licheng Yu
PDF
Accurate Differential Operators for Hybrid Neural Fields Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan
PDF
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation Andrea Maracani, Savas Ozkan, Sijun Cho, Hyowon Kim, Eunchung Noh, Jeongwon Min, Cho Jung Min, Dookun Park, Mete Ozay
PDF
ACE: Anti-Editing Concept Erasure in Text-to-Image Models Zihao Wang, Yuxiang Wei, Fan Li, Renjing Pei, Hang Xu, Wangmeng Zuo
PDF
ACL: Activating Capability of Linear Attention for Image Restoration Yubin Gu, Yuan Meng, Jiayi Ji, Xiaoshuai Sun
PDF
Acquire and Then Adapt: Squeezing Out Text-to-Image Model for Image Restoration Junyuan Deng, Xinyi Wu, Yongxing Yang, Congchao Zhu, Song Wang, Zhenyao Wu
PDF
Action Detail Matters: Refining Video Recognition with Local Action Queries Mengmeng Wang, Zeyi Huang, Xiangjie Kong, Guojiang Shen, Guang Dai, Jingdong Wang, Yong Liu
PDF
Activating Sparse Part Concepts for 3D Class Incremental Learning Zhenya Tian, Jun Xiao, Lupeng Liu, Haiyong Jiang
PDF
Active Data Curation Effectively Distills Large-Scale Multimodal Models Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem, Talfan Evans, Samuel Albanie, Federico Tombari, Yongqin Xian, Alessio Tonioni, Olivier J. Henaff
PDF
Active Event-Based Stereo Vision Jianing Li, Yunjian Zhang, Haiqian Han, Xiangyang Ji
PDF
Active Hyperspectral Imaging Using an Event Camera Bohan Yu, Jinxiu Liang, Zhuofeng Wang, Bin Fan, Art Subpa-asa, Boxin Shi, Imari Sato
PDF
ActiveGAMER: Active GAussian Mapping Through Efficient Rendering Liyan Chen, Huangying Zhan, Kevin Chen, Xiangyu Xu, Qingan Yan, Changjiang Cai, Yi Xu
PDF
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction Yuanbin Man, Ying Huang, Chengming Zhang, Bingzhe Li, Wei Niu, Miao Yin
PDF
AdaDARE-Gamma: Balancing Stability and Plasticity in Multi-Modal LLMs Through Efficient Adaptation Jingyi Xie, Jintao Yang, Zhunchen Luo, Yunbo Cao, Qiang Gao, Mengyuan Zhang, Wenpeng Hu
PDF
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization Yiyang Du, Xiaochen Wang, Chi Chen, Jiabo Ye, Yiru Wang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Zhifang Sui, Maosong Sun, Yang Liu
PDF
AdaptCMVC: Robust Adaption to Incremental Views in Continual Multi-View Clustering Jing Wang, Songhe Feng, Kristoffer Knutsen Wickstrøm, Michael C. Kampffmeyer
PDF
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
PDF
Adapting Dense Matching for Homography Estimation with Grid-Based Acceleration Kaining Zhang, Yuxin Deng, Jiayi Ma, Paolo Favaro
PDF
Adapting Pre-Trained 3D Models for Point Cloud Video Understanding via Cross-Frame Spatio-Temporal Perception Baixuan Lv, Yaohua Zha, Tao Dai, Xue Yuerong, Ke Chen, Shu-Tao Xia
PDF
Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration Chao Wang, Hehe Fan, Huichen Yang, Sarvnaz Karimi, Lina Yao, Yi Yang
PDF
Adapting to Observation Length of Trajectory Prediction via Contrastive Learning Ruiqi Qiu, Jun Gong, Xinyu Zhang, Siqi Luo, Bowen Zhang, Yi Cen
PDF
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds Eitan Shaar, Ariel Shaulov, Gal Chechik, Lior Wolf
PDF
Adaptive Dropout: Unleashing Dropout Across Layers for Generalizable Image Super-Resolution Hang Xu, Jie Huang, Wei Yu, Jiangtong Tan, Zhen Zou, Feng Zhao
PDF
Adaptive Keyframe Sampling for Long Video Understanding Xi Tang, Jihao Qiu, Lingxi Xie, Yunjie Tian, Jianbin Jiao, Qixiang Ye
PDF
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding Han Xiao, Yina Xie, Guanxin Tan, Yinghao Chen, Rui Hu, Ke Wang, Aojun Zhou, Hao Li, Hao Shao, Xudong Lu, Peng Gao, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li
PDF
Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee
PDF
Adaptive Parameter Selection for Tuning Vision-Language Models Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo, Shi-Min Hu
PDF
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement Qiyuan Dai, Hanzhuo Huang, Yu Wu, Sibei Yang
PDF
Adaptive Rectangular Convolution for Remote Sensing Pansharpening Xueyang Wang, Zhixin Zheng, Jiandong Shao, Yule Duan, Liang-Jian Deng
PDF
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition Chengxiang Huang, Yake Wei, Zequn Yang, Di Hu
PDF
ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution Ze-Yu Mi, Yu-Bin Yang
PDF
AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments Xiangyu Chang, Fahim Faisal Niloy, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit Roy-Chowdhury
PDF
ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation Yushan Lai, Guowen Li, Haoyuan Liang, Juepeng Zheng, Zhiyu Ye
PDF
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks Junying Wang, Hongyuan Zhang, Yuan Yuan
PDF
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack Nicole Meng, Caleb Manicke, Ronak Sahu, Caiwen Ding, Yingjie Lao
PDF
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models Yankai Jiang, Peng Zhang, Donglin Yang, Yuan Tian, Hai Lin, Xiaosong Wang
PDF
Advancing Manga Analysis: Comprehensive Segmentation Annotations for the Manga109 Dataset Minshan Xie, Jian Lin, Hanyuan Liu, Chengze Li, Tien-Tsin Wong
PDF
Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging Xianrui Li, Yufei Cui, Jun Li, Antoni B. Chan
PDF
Advancing Myopia to Holism: Fully Contrastive Language-Image Pre-Training Haicheng Wang, Chen Ju, Weixiong Lin, Shuai Xiao, Mengting Chen, Yixuan Huang, Chang Liu, Mingshuai Yao, Jinsong Lan, Ying Chen, Qingwen Liu, Yanfeng Wang
PDF
Advancing Semantic Future Prediction Through Multimodal Visual Sequence Transformers Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
PDF
Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
PDF
Adversarial Diffusion Compression for Real-World Image Super-Resolution Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang
PDF
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization Zhipeng Xu, De Cheng, Xinyang Jiang, Nannan Wang, Dongsheng Li, Xinbo Gao
PDF
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis Khiem Vuong, Anurag Ghosh, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani
PDF
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation Datao Tang, Xiangyong Cao, Xuan Wu, Jialin Li, Jing Yao, Xueru Bai, Dongsheng Jiang, Yin Li, Deyu Meng
PDF
AeSPa: Attention-Guided Self-Supervised Parallel Imaging for MRI Reconstruction Jinho Joo, Hyeseong Kim, Hyeyeon Won, Deukhee Lee, Taejoon Eo, Dosik Hwang
PDF
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-Step Preference Optimization Zhanhao Liang, Yuhui Yuan, Shuyang Gu, Bohan Chen, Tiankai Hang, Mingxi Cheng, Ji Li, Liang Zheng
PDF
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-Modal Large Language Models Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy, Mausoom Sarkar
PDF
AffordDP: Generalizable Diffusion Policy with Transferable Affordance Shijie Wu, Yihang Zhu, Yunao Huang, Kaizhen Zhu, Jiayuan Gu, Jingyi Yu, Ye Shi, Jingya Wang
PDF
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-Trained Models Run He, Kai Tong, Di Fang, Han Sun, Ziqian Zeng, Haoran Li, Tianyi Chen, Huiping Zhuang
PDF
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-Based Person Re-Identification Huy Nguyen, Kien Nguyen, Akila Pemasiri, Feng Liu, Sridha Sridharan, Clinton Fookes
PDF
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark Li Lin, Santosh Santosh, Mingyang Wu, Xin Wang, Shu Hu
PDF
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM Jiarui Wang, Huiyu Duan, Guangtao Zhai, Juntong Wang, Xiongkuo Min
PDF
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Ioannis Patras
PDF
AIpparel: A Multimodal Foundation Model for Digital Garments Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan, Yang Zheng, Maria Korosteleva, Olga Sorkine-Hornung, Leonidas J. Guibas, Guandao Yang, Gordon Wetzstein
PDF
AirRoom: Objects Matter in Room Reidentification Runmao Yao, Yi Du, Zhuoqun Chen, Haoze Zheng, Chen Wang
PDF
AKiRa: Augmentation Kit on Rays for Optical Video Generation Xi Wang, Robin Courant, Marc Christie, Vicky Kalogeiton
PDF
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan
PDF
ALIEN: Implicit Neural Representations for Human Motion Prediction Under Arbitrary Latency Dong Wei, Xiaoning Sun, Xizhan Gao, Shengxiang Hu, Huaijiang Sun
PDF
Align-a-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing Shengzhi Wang, Yingkang Zhong, Jiangchuan Mu, Kai Wu, Mingliang Xiong, Wen Fang, Mingqing Liu, Hao Deng, Bin He, Gang Li, Qingwen Liu
PDF
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen
PDF
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos Jiahao Lu, Tianyu Huang, Peng Li, Zhiyang Dou, Cheng Lin, Zhiming Cui, Zhen Dong, Sai-Kit Yeung, Wenping Wang, Yuan Liu
PDF
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-Modal Alignment Yan Li, Yifei Xing, Xiangyuan Lan, Xin Li, Haifeng Chen, Dongmei Jiang
PDF
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering Yuanhao Zou, Zhaozheng Yin
PDF
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kukreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Minkov Mihaylov, Chao Qin, Abdelrahman M. Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Gian Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Hafizi Amirudin, Muhammad Ridzuan, Daniya Najiha Abdul Kareem, Ketan Pravin More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan Obando-Ceron, Olympiah Otieno, Febian Farestam, Muztoba Rabbani, Sanoojan Ballah, Santosh Sanjeev, Abduragim Shtanchaev, Maheen Fatima, Thao Nguyen, Amrin Kareem, Toluwani Aremu, Nathan Augusto Zacarias Xavier, Amit Bhatkal, Hawau Olamide Toyin, Aman Chadha, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Jorma Laaksonen, Thamar Solorio, Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman Khan, Fahad Shahbaz Khan
PDF
All-Day Multi-Camera Multi-Target Tracking Huijie Fan, Yu Qiao, Yihao Zhen, Tinghui Zhao, Baojie Fan, Qiang Wang
PDF
All-Directional Disparity Estimation for Real-World QPD Images Hongtao Yu, Shaohui Song, Lihu Sun, Wenkai Su, Xiaodong Yang, Chengming Liu
PDF
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising Xiaoling Zhou, Zhemg Lee, Wei Ye, Rui Xie, Wenbo Zhang, Guanju Peng, Zongze Li, Shikun Zhang
PDF
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting Kenghong Lin, Baoquan Zhang, Demin Yu, Wenzhi Feng, Shidong Chen, Feifan Gao, Xutao Li, Yunming Ye
PDF
AMO Sampler: Enhancing Text Rendering with Overshooting Xixi Hu, Keyang Xu, Bo Liu, Qiang Liu, Hongliang Fei
PDF
AMR-Transformer: Enabling Efficient Long-Range Interaction for Complex Neural Fluid Simulation Zeyi Xu, Jinfan Liu, Kuangxu Chen, Ye Chen, Zhangli Hu, Bingbing Ni
PDF
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models Wentao Qu, Jing Wang, YongShun Gong, Xiaoshui Huang, Liang Xiao
PDF
An Image-like Diffusion Method for Human-Object Interaction Detection Xiaofei Hui, Haoxuan Qu, Hossein Rahmani, Jun Liu
PDF
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation Zhuoran Zhao, Linlin Yang, Pengzhan Sun, Pan Hui, Angela Yao
PDF
Anatomical Consistency and Adaptive Prior-Informed Transformation for Multi-Contrast MR Image Synthesis via Diffusion Model Yejee Shin, Yeeun Lee, Hanbyol Jang, Geonhui Son, Hyeongyu Kim, Dosik Hwang
PDF
Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D Jiawei Tan, Hongxing Wang, Junwu Weng, Jiaxin Li, Zhilong Ou, Kang Dang
PDF
AniDoc: Animation Creation Made Easier Yihao Meng, Hao Ouyang, Hanlin Wang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Zhiheng Liu, Yujun Shen, Huamin Qu
PDF
AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction from Monocular Video Noah Stier, Alex Rich, Pradeep Sen, Tobias Höllerer
PDF
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction Lingteng Qiu, Shenhao Zhu, Qi Zuo, Xiaodong Gu, Yuan Dong, Junfei Zhang, Chao Xu, Zhe Li, Weihao Yuan, Liefeng Bo, Guanying Chen, Zilong Dong
PDF
Animate and Sound an Image Xihua Wang, Ruihua Song, Chongxuan Li, Xin Cheng, Boyuan Li, Yihan Wu, Yuyue Wang, Hongteng Xu, Yunfeng Wang
PDF
AnimateAnything: Consistent and Controllable Animation for Video Generation Guojun Lei, Chi Wang, Rong Zhang, Yikai Wang, Hong Li, Weiwei Xu
PDF
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer Jin Lyu, Tianyi Zhu, Yi Gu, Li Lin, Pujin Cheng, Yebin Liu, Xiaoying Tang, Liang An
PDF
AniMo: Species-Aware Model for Text-Driven Animal Motion Generation Xuan Wang, Kai Ruan, Xing Zhang, Gaoang Wang
PDF
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction Yuejiao Su, Yi Wang, Qiongyang Hu, Chuang Yang, Lap-Pui Chau
PDF
Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation Suruchi Kumari, Pravendra Singh
PDF
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios Ziming Huang, Xurui Li, Haotian Liu, Feng Xue, Yuzhe Wang, Yu Zhou
PDF
Anomize: Better Open Vocabulary Video Anomaly Detection Fei Li, Wenxuan Liu, Jingjing Chen, Ruixu Zhang, Yuran Wang, Xian Zhong, Zheng Wang
PDF
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception Yuanchen Wu, Lu Zhang, Hang Yao, Junlong Du, Ke Yan, Shouhong Ding, Yunsheng Wu, Xiaoqiang Li
PDF
Any-Resolution AI-Generated Image Detection by Spectral Learning Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris, Efstratios Gavves
PDF
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking Phuc Nguyen, Minh Luu, Anh Tran, Cuong Pham, Khoi Nguyen
PDF
Any6D: Model-Free 6D Pose Estimation of Novel Objects Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon
PDF
Anyattack: Towards Large-Scale Self-Supervised Adversarial Attacks on Vision-Language Models Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Yunhao Chen, Jitao Sang, Dit-Yan Yeung
PDF
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers
PDF
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models Xinghui Li, Qichao Sun, Pengze Zhang, Fulong Ye, Zhichao Liao, Wanquan Feng, Songtao Zhao, Qian He
PDF
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang
PDF
AnyMap: Learning a General Camera Model for Structure-from-Motion with Unknown Distortion in Dynamic Scenes Andrea Porfiri Dal Cin, Georgi Dikov, Jihong Ju, Mohsen Ghafoorian
PDF
AnyMoLe: Any Character Motion In-Betweening Leveraging Video Diffusion Models Kwan Yun, Seokhyeon Hong, Chaelin Kim, Junyong Noh
PDF
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities Guillaume Astruc, Nicolas Gonthier, Clément Mallet, Loic Landrieu
PDF
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen, Jinyang Guo, Di Huang, Yunhong Wang
PDF
Apollo: An Exploration of Video Understanding in Large Multimodal Models Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia
PDF
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation Yiming Qin, Zhu Xu, Yang Liu
PDF
APT: Adaptive Personalized Training for Diffusion Models with Limited Data JungWoo Chae, Jiyoon Kim, JaeWoong Choi, Kyungyul Kim, Sangheum Hwang
PDF
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, Siyu Zhou, Qian He, Jing Liu
PDF
Arbitrary-Steps Image Super-Resolution via Diffusion Inversion Zongsheng Yue, Kang Liao, Chen Change Loy
PDF
Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias, Alexandros Lattas, Stefanos Zafeiriou
PDF
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points Qirui Huang, Runze Zhang, Kangjun Liu, Minglun Gong, Hao Zhang, Hui Huang
PDF
Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers? Zebin You, Xinyu Zhang, Hanzhong Guo, Jingdong Wang, Chongxuan Li
PDF
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized? Jianyang Xie, Yitian Zhao, Yanda Meng, He Zhao, Anh Nguyen, Yalin Zheng
PDF
Argus: A Compact and Versatile Foundation Model for Vision Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu
PDF
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu
PDF
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum
PDF
ARM: Appearance Reconstruction Model for Relightable 3D Generation Xiang Feng, Chang Yu, Zoubin Bi, Yintong Shang, Feng Gao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
PDF
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation Nicolas Dufour, Vicky Kalogeiton, David Picard, Loic Landrieu
PDF
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Yifan Pu, Yiming Zhao, Zhicong Tang, Ruihong Yin, Haoxing Ye, Yuhui Yuan, Dong Chen, Jianmin Bao, Sirui Zhang, Yanbin Wang, Lin Liang, Lijuan Wang, Ji Li, Xiu Li, Zhouhui Lian, Gao Huang, Baining Guo
PDF
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects Jiayi Su, Youhe Feng, Zheng Li, Jinhua Song, Yangfan He, Botao Ren, Botian Xu
PDF
Articulated Kinematics Distillation from Video Diffusion Models Xuan Li, Qianli Ma, Tsung-Yi Lin, Yongxin Chen, Chenfanfu Jiang, Ming-Yu Liu, Donglai Xiang
PDF
ArticulatedGS: Self-Supervised Digital Twin Modeling of Articulated Objects Using 3D Gaussian Splatting Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, Ruizhen Hu
PDF
ArtiFade: Learning to Generate High-Quality Subject from Blemished Images Shuya Yang, Shaozhe Hao, Yukang Cao, Kwan-Yee K. Wong
PDF
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary Zeqi Gu, Yin Cui, Zhaoshuo Li, Fangyin Wei, Yunhao Ge, Jinwei Gu, Ming-Yu Liu, Abe Davis, Yifan Ding
PDF
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding Zhenxing Zhang, Yaxiong Wang, Lechao Cheng, Zhun Zhong, Dan Guo, Meng Wang
PDF
ASHiTA: Automatic Scene-Grounded HIerarchical Task Analysis Yun Chang, Leonor Fermoselle, Duy Ta, Bernadette Bucher, Luca Carlone, Jiuguang Wang
PDF
ASIGN: An Anatomy-Aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics Junchao Zhu, Ruining Deng, Tianyuan Yao, Juming Xiong, Chongyu Qu, Junlin Guo, Siqi Lu, Mengmeng Yin, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo
PDF
Assessing and Learning Alignment of Unimodal Vision and Language Models Le Zhang, Qian Yang, Aishwarya Agrawal
PDF
Associative Transformer Yuwei Sun, Hideya Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai
PDF
Asynchronous Collaborative Graph Representation for Frames and Events Dianze Li, Jianing Li, Xu Liu, Xiaopeng Fan, Yonghong Tian
PDF
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting Yizhe Tang, Zhimin Sun, Yuzhen Du, Ran Yi, Guangben Lu, Teng Hu, Luying Li, Lizhuang Ma, Fangyuan Zou
PDF
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward Haonan Han, Xiangzuo Wu, Huan Liao, Zunnan Xu, Zhongyuan Hu, Ronghui Li, Yachao Zhang, Xiu Li
PDF
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models Xubing Ye, Yukang Gan, Yixiao Ge, Xiao-Ping Zhang, Yansong Tang
PDF
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks Mohamed Afane, Gabrielle Ebbrecht, Ying Wang, Juntao Chen, Junaid Farooq
PDF
Attend to Not Attended: Structure-Then-Detail Token Merging for Post-Training DiT Acceleration Haipeng Fang, Sheng Tang, Juan Cao, Enshuo Zhang, Fan Tang, Tong-Yee Lee
PDF
Attention Distillation: A Unified Approach to Visual Characteristics Transfer Yang Zhou, Xu Gao, Zichong Chen, Hui Huang
PDF
Attention IoU: Examining Biases in CelebA Using Attention Maps Aaron Serianni, Tyler Zhu, Olga Russakovsky, Vikram V. Ramaswamy
PDF
Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning Li-Jun Zhao, Zhen-Duo Chen, Yongxin Wang, Xin Luo, Xin-Shun Xu
PDF
Attribute-Formed Class-Specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability Jianyang Zhang, Qianli Luo, Guowu Yang, Wenjing Yang, Weide Liu, Guosheng Lin, Fengmao Lv
PDF
Attribute-Missing Multi-View Graph Clustering Bowen Zhao, Qianqian Wang, Zhengming Ding, Quanxue Gao
PDF
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu, Quanwei Yang, Yasheng Sun, Shengyi He, Borong Liang, Yukang Cao, Yingying Li, Haocheng Feng, Errui Ding, Jingdong Wang, Youjian Zhao, Hang Zhou, Ziwei Liu
PDF
Audio-Visual Instance Segmentation Ruohao Guo, Xianghua Ying, Yaru Chen, Dantong Niu, Guangyao Li, Liao Qu, Yanyu Qi, Jinxing Zhou, Bowei Xing, Wenzhen Yue, Ji Shi, Qixun Wang, Peiliang Zhang, Buwen Liang
PDF
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization Liang Liu, Shuaiyong Li, Yongqiang Zhu
PDF
Augmented Deep Contexts for Spatially Embedded Video Coding Yifan Bian, Chuanbo Tang, Li Li, Dong Liu
PDF
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-Based Visual Question Answering Federico Cocchi, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
PDF
Augmenting Perceptual Super-Resolution via Image Quality Predictors Fengjia Zhang, Samrudhdhi B. Rangrej, Tristan Aumentado-Armstrong, Afsaneh Fazly, Alex Levinshtein
PDF
AuraFusion360: Augmented Unseen Region Alignment for Reference-Based 360° Unbounded Scene Inpainting Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen, Jie-Ying Lee, Bo-Hsu Ke, Chun-Wei Tuan Mu, Yi-Chuan Huang, Chin-Yang Lin, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu
PDF
Auto Cherry-Picker: Learning from High-Quality Generative Data Driven by Language Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen
PDF
Auto-Encoded Supervision for Perceptual Image Super-Resolution MinKyu Lee, Sangeek Hyun, Woojin Jun, Jae-Pil Heo
PDF
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning Yuheng Xu, Shijie Yang, Xin Liu, Jie Liu, Jie Tang, Gangshan Wu
PDF
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Yuhui Zhang, Yuchang Su, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy
PDF
Automated Proof of Polynomial Inequalities via Reinforcement Learning Banglong Liu, Niuniu Qi, Xia Zeng, Lydia Dehbi, Zhengfeng Yang
PDF
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression Xiaoyi Qu, David Aponte, Colby Banbury, Daniel P. Robinson, Tianyu Ding, Kazuhito Koishida, Ilya Zharkov, Tianyi Chen
PDF
Automatic Spectral Calibration of Hyperspectral Images: Method, Dataset and Benchmark Zhuoran Du, Shaodi You, Cheng Cheng, Shikui Wei
PDF
AutoPresent: Designing Structured Visuals from Scratch Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell
PDF
Autoregressive Distillation of Diffusion Transformers Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schönfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu
PDF
Autoregressive Sequential Pretraining for Visual Tracking Shiyi Liang, Yifan Bai, Yihong Gong, Xing Wei
PDF
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing Niu Lian, Jun Li, Jinpeng Wang, Ruisheng Luo, Yaowei Wang, Shu-Tao Xia, Bin Chen
PDF
AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration Jiong Lin, Lechen Zhang, Kwansoo Lee, Jialong Ning, Judah Goldfeder, Hod Lipson
PDF
AvatarArtist: Open-Domain 4D Avatarization Hongyu Liu, Xuan Wang, Ziyu Wan, Yue Ma, Jingye Chen, Yanbo Fan, Yujun Shen, Yibing Song, Qifeng Chen
PDF
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning Xuecheng Wu, Heli Sun, Yifan Wang, Jiayu Nie, Jie Zhang, Yabing Wang, Junxiao Xue, Liang He
PDF
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning Kaixuan Wu, Xinde Li, Xinling Li, Chuanfei Hu, Guoliang Wu
PDF
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng
PDF
BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu, Linda Shapiro, Alex Colburn
PDF
BadToken: Token-Level Backdoor Attacks to Multi-Modal Large Language Models Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun
PDF
Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization Xiran Wang, Jian Zhang, Lei Qi, Yinghuan Shi
PDF
Balanced Rate-Distortion Optimization in Learned Image Compression Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu
PDF
Balancing Two Classifiers via a Simplex ETF Structure for Model Calibration Jiani Ni, He Zhao, Jintong Gao, Dandan Guo, Hongyuan Zha
PDF
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang, Yu Yin
PDF
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation Yulu Pan, Ce Zhang, Gedas Bertasius
PDF
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection Zhen Qu, Xian Tao, Xinyi Gong, ShiChen Qu, Qiyu Chen, Zhengtao Zhang, Xingang Wang, Guiguang Ding
PDF
Bayesian Test-Time Adaptation for Vision-Language Models Lihua Zhou, Mao Ye, Shuaifeng Li, Nianxin Li, Xiatian Zhu, Lei Deng, Hongbin Liu, Zhen Lei
PDF
Be More Specific: Evaluating Object-Centric Realism in Synthetic Images Anqi Liang, Ciprian Corneanu, Qianli Feng, Giorgio Giannone, Aleix Martinez
PDF
Believing Is Seeing: Unobserved Object Detection Using Generative Models Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome
PDF
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning Fan Lu, Wei Wu, Kecheng Zheng, Shuailei Ma, Biao Gong, Jiawei Liu, Wei Zhai, Yang Cao, Yujun Shen, Zheng-Jun Zha
PDF
Benchmarking Object Detectors Under Real-World Distribution Shifts in Satellite Imagery Sara A. Al-Emadi, Yin Yang, Ferda Ofli
PDF
Beta-FFT: Nonlinear Interpolation and Differentiated Training Strategies for Semi-Supervised Medical Image Segmentation Ming Hu, Jianfu Yin, Zhuangzhuang Ma, Jianheng Ma, Feiyu Zhu, Bingbing Wu, Ya Wen, Meng Wu, Cong Hu, Bingliang Hu, Quan Wang
PDF
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance Xin Ye, Burhaneddin Yaman, Sheng Cheng, Feng Tao, Abhirup Mallik, Liu Ren
PDF
Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation Hongmei Yin, Tingliang Feng, Fan Lyu, Fanhua Shang, Hongying Liu, Wei Feng, Liang Wan
PDF
Beyond Clean Training Data: A Versatile and Model-Agnostic Framework for Out-of-Distribution Detection with Contaminated Training Data Yuchuan Li, Jae-Mo Kang, Il-Min Kim
PDF
Beyond Generation: A Diffusion-Based Low-Level Feature Extractor for Detecting AI-Generated Images Nan Zhong, Haoyu Chen, Yiran Xu, Zhenxing Qian, Xinpeng Zhang
PDF
Beyond Human Perception: Understanding Multi-Object World from Monocular View Keyu Guo, Yongle Huang, Shijie Sun, Xiangyu Song, Mingtao Feng, Zedong Liu, Huansheng Song, Tiantian Wang, Jianxin Li, Naveed Akhtar, Ajmal Saeed Mian
PDF
Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning Dongyao Jiang, Haodong Jing, Yongqiang Ma, Nanning Zheng
PDF
Beyond Local Sharpness: Communication-Efficient Global Sharpness-Aware Minimization for Federated Learning Debora Caldarola, Pietro Cagnasso, Barbara Caputo, Marco Ciccone
PDF
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge Yaqi Zhao, Yuanyang Yin, Lin Li, Mingan Lin, Victor Shea-Jay Huang, Siwei Chen, Weipeng Chen, Baoqun Yin, Zenan Zhou, Wentao Zhang
PDF
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection Through Visual Prototype and Harmonization Kai Mao, Ping Wei, Yiyang Lian, Yangyang Wang, Nanning Zheng
PDF
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning Hairui Ren, Fan Tang, He Zhao, Zixuan Wang, Dandan Guo, Yi Chang
PDF
BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin, Jaejun Yoo
PDF
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis Weiguang Zhao, Rui Zhang, Qiufeng Wang, Guangliang Cheng, Kaizhu Huang
PDF
BG-Triangle: Bezier Gaussian Triangle for 3D Vectorization and Rendering Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, Jingyi Yu
PDF
BHViT: Binarized Hybrid Vision Transformer Tian Gao, Yu Zhang, Zhiyuan Zhang, Huajun Liu, Kaijie Yin, Chengzhong Xu, Hui Kong
PDF
Bias for Action: Video Implicit Neural Representations with Bias Modulation Alper Kayabasi, Anil Kumar Vadathya, Guha Balakrishnan, Vishwanath Saragadam
PDF
BIGS: Bimanual Category-Agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Junuk Cha, Soohyun Hwang, Hyein Hwang, Seungryul Baek
PDF
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning Hao Zhu, Yifei Zhang, Junhao Dong, Piotr Koniusz
PDF
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-Uniform Motions Wonyong Seo, Jihyong Oh, Munchurl Kim
PDF
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik, Vasileios Choutas, Eduardo Alvarado, Thabo Beeler, Marc Habermann, Christian Theobalt
PDF
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Gedas Bertasius, Lorenzo Torresani
PDF
Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing Shiyang Zhou, Haijin Zeng, Yunfan Lu, Tong Shao, Ke Tang, Yongyong Chen, Jie Liu, Jingyong Su
PDF
Binarized Neural Network for Multi-Spectral Image Fusion Junming Hou, Xiaoyu Chen, Ran Ran, Xiaofeng Cong, Xinyang Liu, Jian Wei You, Liang-Jian Deng
PDF
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao
PDF
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Alejandro Lozano, Min Woo Sun, James Burgess, Liangyu Chen, Jeffrey J. Nirschl, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Anita Rau, Austin Wolfgang Katzer, Yuhui Zhang, Collin Chiu, Xiaohan Wang, Alfred Seunghoon Song, Robert Tibshirani, Serena Yeung-Levy
PDF
BioX-CPath: Biologically-Driven Explainable Diagnostics for Multistain IHC Computational Pathology Amaya Gallagher-Syed, Henry Senior, Omnia Alwazzan, Elena Pontarini, Michele Bombardieri, Costantino Pitzalis, Myles J. Lewis, Michael R. Barnes, Luca Rossi, Gregory Slabaugh
PDF
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence Xuewu Lin, Tianwei Lin, Lichao Huang, Hongyu Xie, Zhizhong Su
PDF
Birth and Death of a Rose Chen Geng, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu
PDF
BizGen: Advancing Article-Level Visual Text Rendering for Infographics Generation Yuyang Peng, Shishi Xiao, Keming Wu, Qisheng Liao, Bohan Chen, Kevin Lin, Danqing Huang, Ji Li, Yuhui Yuan
PDF
Black Hole-Driven Identity Absorbing in Diffusion Models Muhammad Shaheryar, Jong Taek Lee, Soon Ki Jung
PDF
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events Aditya Chinchure, Sahithya Ravi, Raymond Ng, Vered Shwartz, Boyang Li, Leonid Sigal
PDF
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models Andreas Müller, Denis Lukovnikov, Jonas Thietke, Asja Fischer, Erwin Quiring
PDF
BLADE: Single-View Body Mesh Estimation Through Accurate Depth Estimation Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano, Shalini De Mello, Michael Stengel
PDF
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing Yunqi Gu, Ian Huang, Jihyeon Je, Guandao Yang, Leonidas Guibas
PDF
Blind Bitstream-Corrupted Video Recovery via Metadata-Guided Diffusion Model Shuyun Wang, Hu Zhang, Xin Shen, Dadong Wang, Xin Yu
PDF
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations Weixi Feng, Chao Liu, Sifei Liu, William Yang Wang, Arash Vahdat, Weili Nie
PDF
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers Hui Zhang, Tingwei Gao, Jie Shao, Zuxuan Wu
PDF
Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images Wensheng Cheng, Zhenghong Li, Jiaxiang Ren, Hyomin Jeong, Congwu Du, Yingtian Pan, Haibin Ling
PDF
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu, Yan Hu, Yi Zeng, Lei Wu, Liuyang Bian, Zhaoxiong Wang, Long Liu, Yanzhou Yang, Han Xiao, Aojun Zhou, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li
PDF
Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB Nikhil Behari, Aaron Young, Siddharth Somasundaram, Tzofi Klinghoffer, Akshat Dave, Ramesh Raskar
PDF
Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo
PDF
BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment Runmin Jiang, Jackson Daggett, Shriya Pingulkar, Yizhou Zhao, Priyanshu Dhingra, Daniel Brown, Qifeng Wu, Xiangrui Zeng, Xingjian Li, Min Xu
PDF
BOLT: Boost Large Vision-Language Model Without Training for Long-Form Video Understanding Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem
PDF
Boltzmann Attention Sampling for Image Analysis with Small Objects Theodore Zhao, Sid Kiblawi, Naoto Usuyama, Ho Hin Lee, Sam Preston, Hoifung Poon, Mu Wei
PDF
Boost the Inference with Co-Training: A Depth-Guided Mutual Learning Framework for Semi-Supervised Medical Polyp Segmentation Yuxin Li, Zihao Zhu, Yuxiang Zhang, Yifan Chen, Zhibin Yu
PDF
Boost Your Human Image Generation Model via Direct Preference Optimization Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
PDF
Boosting Adversarial Transferability Through Augmentation in Hypothesis Space Yu Guo, Weiquan Liu, Qingshan Xu, Shijun Zheng, Shujun Huang, Yu Zang, Siqi Shen, Chenglu Wen, Cheng Wang
PDF
Boosting Domain Incremental Learning: Selecting the Optimal Parameters Is All You Need Qiang Wang, Xiang Song, Yuhang He, Jizhou Han, Chenhao Ding, Xinyuan Gao, Yihong Gong
PDF
Boosting Point-Supervised Temporal Action Localization Through Integrating Query Reformation and Optimal Transport Mengnan Liu, Le Wang, Sanping Zhou, Kun Xia, Xiaolong Sun, Gang Hua
PDF
Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation Rong Qin, Xingyu Liu, Jinglei Shi, Liang Lin, Jufeng Yang
PDF
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers Hang Zhou, Xinxin Zuo, Rui Ma, Li Cheng
PDF
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-Grained View-Invariant Video Representations Jungin Park, Jiyoung Lee, Kwanghoon Sohn
PDF
BooW-VTON: Boosting In-the-Wild Virtual Try-on via Mask-Free Pseudo Data Training Xuanpu Zhang, Dan Song, Pengxin Zhan, Tianyu Chang, Jianhao Zeng, Qingguo Chen, Weihua Luo, An-An Liu
PDF
Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection Ziqi Li, Tao Gao, Yisheng An, Ting Chen, Jing Zhang, Yuanbo Wen, Mengkun Liu, Qianxi Zhang
PDF
Breaking the Low-Rank Dilemma of Linear Attention Qihang Fan, Huaibo Huang, Ran He
PDF
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy Zesen Cheng, Hang Zhang, Kehan Li, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing
PDF
BrepGiff: Lightweight Generation of Complex B-Rep with 3D GAT Diffusion Hao Guo, Xiaoshui Huang, Hao Jiacheng, Yunpeng Bai, Hongping Gan, Yilei Shi
PDF
Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow Hanyu Zhou, Haonan Wang, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan
PDF
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer Ziyi Liu, Yangcen Liu
PDF
Bridging Gait Recognition and Large Language Models Sequence Modeling Shaopeng Yang, Jilong Wang, Saihui Hou, Xu Liu, Chunshui Cao, Liang Wang, Yongzhen Huang
PDF
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang
PDF
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning Bozhou Zhang, Nan Song, Xin Jin, Li Zhang
PDF
Bridging the Gap Between Gaussian Diffusion Models and Universal Quantization for Image Compression Lucas Relic, Roberto Azevedo, Yang Zhang, Markus Gross, Christopher Schroers
PDF
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior Haitao Wu, Qing Li, Changqing Zhang, Zhen He, Xiaomin Ying
PDF
Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence Qiyang Qian, Hansheng Chen, Masayoshi Tomizuka, Kurt Keutzer, Qianqian Wang, Chenfeng Xu
PDF
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis Hanbin Ko, Chang-Min Park
PDF
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan
PDF
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs Zeyi Huang, Yuyang Ji, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund Vanvalkenburgh, Shengxin Zha, Bolin Lai, Licheng Yu, Ning Zhang, Yong Jae Lee, Miao Liu
PDF
Building Vision Models upon Heat Conduction Zhaozhi Wang, Yue Liu, Yunjie Tian, Yunfan Liu, Yaowei Wang, Qixiang Ye
PDF
BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer Yuzhou Liu, Lingjie Zhu, Hanqiao Ye, Shangfeng Huang, Xiang Gao, Xianwei Zheng, Shuhan Shen
PDF
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-Free Way Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
PDF
CacheQuant: Comprehensively Accelerated Diffusion Models Xuewen Liu, Zhikai Li, Qingyi Gu
PDF
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, Xiangdong Zhou
PDF
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, Fayao Liu
PDF
CADDreamer: CAD Object Generation from Single-View Images Yuan Li, Cheng Lin, Yuan Liu, Xiaoxiao Long, Chenxu Zhang, Ningna Wang, Xin Li, Wenping Wang, Xiaohu Guo
PDF
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging Zhiwei Ling, Yachen Chang, Hailiang Zhao, Xinkui Zhao, Kingsum Chow, Shuiguang Deng
PDF
Calibrated Multi-Preference Optimization for Aligning Diffusion Models Kyungmin Lee, Xiahong Li, Qifei Wang, Junfeng He, Junjie Ke, Ming-Hsuan Yang, Irfan Essa, Jinwoo Shin, Feng Yang, Yinxiao Li
PDF
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu, Muntasir Wahed, Ismini Lourentzou
PDF
Camera Resection from Known Line Pencils and a Radially Distorted Scanline Juan C. Dibene, Enrique Dunn
PDF
CamFreeDiff: Camera-Free Image to Panorama Generation with Diffusion Model Xiaoding Yuan, Shitao Tang, Kejie Li, Peng Wang
PDF
Camouflage Anything: Learning to Hide Using Controlled Out-Painting and Representation Engineering Biplab Das, Viswanath Gopalakrishnan
PDF
CamPoint: Boosting Point Cloud Segmentation with Virtual Camera Jianhui Zhang, Yizhi Luo, Zicheng Zhang, Xuecheng Nie, Bonan Li
PDF
CaMuViD: Calibration-Free Multi-View Detection Amir Etefaghi Daryani, M. Usman Maqbool Bhutta, Byron Hernandez, Henry Medeiros
PDF
Can Generative Video Models Help Pose Estimation? Ruojin Cai, Jason Y. Zhang, Philipp Henzler, Zhengqi Li, Noah Snavely, Ricardo Martin-Brualla
PDF
Can Large Vision-Language Models Correct Semantic Grounding Errors by Themselves? Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna
PDF
Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding Zhaoran Zhao, Peng Lu, Anran Zhang, Peipei Li, Xia Li, Xuannan Liu, Yang Hu, Shiyi Chen, Liwei Wang, Wenhao Guo
PDF
Can Text-to-Video Generation Help Video-Language Alignment? Luca Zanella, Massimiliano Mancini, Willi Menapace, Sergey Tulyakov, Yiming Wang, Elisa Ricci
PDF
Can't Slow Me Down: Learning Robust and Hardware-Adaptive Object Detectors Against Latency Attacks for Edge Devices Tianyi Wang, Zichen Wang, Cong Wang, Yuanchao Shu, Ruilong Deng, Peng Cheng, Jiming Chen
PDF
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image Jingshun Huang, Haitao Lin, Tianyu Wang, Yanwei Fu, Xiangyang Xue, Yi Zhu
PDF
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models Felix Taubner, Ruihang Zhang, Mathieu Tuli, David B. Lindell
PDF
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction Yuan Zhou, Qingshan Xu, Jiequan Cui, Junbao Zhou, Jing Zhang, Richang Hong, Hanwang Zhang
PDF
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth Zhiyu Qu, Yunqi Miao, Zhensong Zhang, Jifei Song, Jiankang Deng, Yi-Zhe Song
PDF
CARL: A Framework for Equivariant Image Registration Hastings Greer, Lin Tian, François-Xavier Vialard, Roland Kwitt, Raul San Jose Estepar, Marc Niethammer
PDF
CarPlanner: Consistent Auto-Regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving Dongkun Zhang, Jiaming Liang, Ke Guo, Sha Lu, Qi Wang, Rong Xiong, Zhenwei Miao, Yue Wang
PDF
CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design Weitao Feng, Hang Zhou, Jing Liao, Li Cheng, Wenbo Zhou
PDF
CASP: Compression of Large Multimodal Models Based on Attention Sparsity Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang
PDF
CASP: Consistency-Aware Audio-Induced Saliency Prediction Model for Omnidirectional Video Zhaolin Wan, Han Qin, Zhiyang Li, Xiaopeng Fan, Wangmeng Zuo, Debin Zhao
PDF
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski
PDF
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution Xin Liu, Jie Liu, Jie Tang, Gangshan Wu
PDF
Category-Agnostic Neural Object Rigging Guangzhao He, Chen Geng, Shangzhe Wu, Jiajun Wu
PDF
Causal Composition Diffusion Model for Closed-Loop Traffic Generation Haohong Lin, Xin Huang, Tung Phan, David Hayden, Huan Zhang, Ding Zhao, Siddhartha Srinivasa, Eric Wolff, Hongge Chen
PDF
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Edson Araujo, Andrew Rouditchenko, Yuan Gong, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Leonid Karlinsky, Rogerio Feris, James R. Glass, Hilde Kuehne
PDF
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval Likai Tian, Jian Zhao, Zechao Hu, Zhengwei Yang, Hao Li, Lei Jin, Zheng Wang, Xuelong Li
PDF
CDI: Copyrighted Data Identification in Diffusion Models Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
PDF
Certified Human Trajectory Prediction Mohammadhossein Bahari, Saeed Saadatnejad, Amirhossein Askari Farsangi, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi
PDF
CGMatch: A Different Perspective of Semi-Supervised Learning Bo Cheng, Jueqing Lu, Yuan Tian, Haifeng Zhao, Yi Chang, Lan Du
PDF
CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching Jiaqi Li, Yiran Wang, Jinghong Zheng, Junrui Zhang, Liao Shen, Tianqi Liu, Zhiguo Cao
PDF
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks Peng Xie, Yequan Bie, Jianda Mao, Yangqiu Song, Yang Wang, Hao Chen, Kani Chen
PDF
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding Jiaxin Shi, Mingyue Xiang, Hao Sun, Yixuan Huang, Zhi Weng
PDF
ChainHOI: Joint-Based Kinematic Chain Modeling for Human-Object Interaction Generation Ling-An Zeng, Guohong Huang, Yi-Lin Wei, Shengbo Gu, Yu-Ming Tang, Jingke Meng, Wei-Shi Zheng
PDF
Change3D: Revisiting Change Detection and Captioning from a Video Modeling Perspective Duowang Zhu, Xiaohu Huang, Haiyan Huang, Hao Zhou, Zhenfeng Shao
PDF
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao, Linbo Qing, Chao Ren
PDF
Channel-Wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes JunYong Choi, Min-cheol Sagong, SeokYeong Lee, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho
PDF
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
PDF
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
PDF
Chat-Based Person Retrieval via Dialogue-Refined Cross-Modal Alignment Yang Bai, Yucheng Ji, Min Cao, Jinqiao Wang, Mang Ye
PDF
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models Ronghuan Wu, Wanchao Su, Jing Liao
PDF
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models Siyuan Bian, Chenghao Xu, Yuliang Xiu, Artur Grigorev, Zhen Liu, Cewu Lu, Michael J. Black, Yao Feng
PDF
ChatGen: Automatic Text-to-Image Generation from FreeStyle Chatting Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, Minnan Luo
PDF
ChatHuman: Chatting About 3D Humans with Tools Jing Lin, Yao Feng, Weiyang Liu, Michael J. Black
PDF
Cheb-GR: Rethinking K-Nearest Neighbor Search in Re-Ranking for Person Re-Identification Jinxi Yang, He Li, Bo Du, Mang Ye
PDF
Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss Ravishankar Evani, Deepu Rajan, Shangbo Mao
PDF
CheckManual: A New Challenge and Benchmark for Manual-Based Appliance Manipulation Yuxing Long, Jiyao Zhang, Mingjie Pan, Tianshu Wu, Taewhan Kim, Hao Dong
PDF
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-Rays Through Mobile Devices Mariamma Antony, Rajiv Porana, Sahil M Lathiya, Siva Teja Kakileti, Chiranjib Bhattacharyya
PDF
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Yang Yue, Yulin Wang, Chenxin Tao, Pan Liu, Shiji Song, Gao Huang
PDF
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools Chinedu Innocent Nwoye, Kareem Elgohary, Anvita Srinivas, Fauzan Zaid, Joël L. Lavanchy, Nicolas Padoy
PDF
Circumventing Shortcuts in Audio-Visual Deepfake Detection Datasets with Unsupervised Learning Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata, Elisabeta Oneata
PDF
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos Xinhao Liu, Jintong Li, Yicheng Jiang, Niranjan Sujay, Zhicheng Yang, Juexiao Zhang, John Abanes, Jing Zhang, Chen Feng
PDF
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning Jiangpeng He, Zhihao Duan, Fengqing Zhu
PDF
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering Tianyu Huai, Jie Zhou, Xingjiao Wu, Qin Chen, Qingchun Bai, Ze Zhou, Liang He
PDF
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable Xin Jin, Simon Niklaus, Zhoutong Zhang, Zhihao Xia, Chunle Guo, Yuting Yang, Jiawen Chen, Chongyi Li
PDF
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji
PDF
Classifier-Guided CLIP Distillation for Unsupervised Multi-Label Classification Dongseob Kim, Hyunjung Shim
PDF
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers Quentin Guimard, Moreno D'Incà, Massimiliano Mancini, Elisa Ricci
PDF
CleanDIFT: Diffusion Features Without Noise Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer
PDF
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models Hao Yin, Guangzong Si, Zilei Wang
PDF
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate Ming Yan, Xincheng Lin, Yuhua Luo, Shuqi Fan, Yudi Dai, Qixin Zhong, Lincai Zhong, Yuexin Ma, Lan Xu, Chenglu Wen, Siqi Shen, Cheng Wang
PDF
CLIP Is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval Without OCR Xugong Qin, Peng Zhang, Jun Jie Ou Yang, Gangyan Zeng, Yubo Li, Yuanyuan Wang, Wanqian Zhang, Pengwen Dai
PDF
CLIP Is Strong Enough to Fight Back: Test-Time Counterattacks Towards Zero-Shot Adversarial Robustness of CLIP Songlong Xing, Zhengyu Zhao, Nicu Sebe
PDF
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation Reza Abbasi, Ali Nazari, Aminreza Sefid, Mohammadali Banayeeanzade, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah
PDF
CLIP-Driven Coarse-to-Fine Semantic Guidance for Fine-Grained Open-Set Semi-Supervised Learning Xiaokun Li, Yaping Huang, Qingji Guan
PDF
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-Pair Loss Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen
PDF
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models Zhejun Zhang, Peter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, Marco Pavone
PDF
Closest Neighbors Are Harmful for Lightweight Masked Auto-Encoders Jian Meng, Ahmed Hasssan, Li Yang, Deliang Fan, Jinwoo Shin, Jae-sun Seo
PDF
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework Yanlong Xu, Haoxuan Qu, Jun Liu, Wenxiao Zhang, Xun Yang
PDF
Co-Op: Correspondence-Based Novel Object Pose Estimation Sungphill Moon, Hyeontae Son, Dongcheol Hur, Sangwook Kim
PDF
Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement Xinjie Li, Ziyi Chen, Xinlu Yu, Iek-Heng Chu, Peng Chang, Jing Xiao
PDF
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, Vikash Sehwag
PDF
CoA: Towards Real Image Dehazing via Compression-and-Adaptation Long Ma, Yuxin Feng, Yan Zhang, Jinyuan Liu, Weimin Wang, Guang-Yong Chen, Chengpei Xu, Zhuo Su
PDF
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Linjie Luo, Bo Yuan
PDF
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model Benlin Liu, Yuhao Dong, Yiqin Wang, Zixian Ma, Yansong Tang, Luming Tang, Yongming Rao, Wei-Chiu Ma, Ranjay Krishna
PDF
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting Jiaxin Zhang, Junjun Jiang, Youyu Chen, Kui Jiang, Xianming Liu
PDF
COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation Arnav M. Das, Gantavya Bhatt, Lilly Kumari, Sahil Verma, Jeff Bilmes
PDF
CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition Xuli Shen, Hua Cai, Weilin Shen, Qing Xu, Dingding Yu, Weifeng Ge, Xiangyang Xue
PDF
CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images Jungho Lee, Suhwan Cho, Taeoh Kim, Ho-Deok Jang, Minhyeok Lee, Geonho Cha, Dongyoon Wee, Dogyoon Lee, Sangyoun Lee
PDF
Code-as-Monitor: Constraint-Aware Visual Programming for Reactive and Proactive Robotic Failure Detection Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wang
PDF
CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification Wenlong Yu, Qilong Wang, Chuang Liu, Dong Li, Qinghua Hu
PDF
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models Zichen Miao, Wei Chen, Qiang Qiu
PDF
Coherent 3D Portrait Video Reconstruction via Triplane Fusion Shengze Wang, Xueting Li, Chao Liu, Matthew Chan, Michael Stengel, Henry Fuchs, Shalini De Mello, Koki Nagano
PDF
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration Johan Edstedt, André Mateus, Alberto Jaenal
PDF
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
PDF
Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration Lizheng Zu, Lin Lin, Song Fu, Na Zhao, Pan Zhou
PDF
CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava
PDF
Color Alignment in Diffusion Ka Chun Shum, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung
PDF
CoMapGS: Covisibility Map-Based Gaussian Splatting for Sparse Novel View Synthesis Youngkyoon Jang, Eduardo Pérez-Pellitero
PDF
CoMatcher: Multi-View Collaborative Feature Matching Jintao Zhang, Zimin Xia, Mingyue Dong, Shuhan Shen, Linwei Yue, Xianwei Zheng
PDF
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation Kai Fang, Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei
PDF
ComfyBench: Benchmarking LLM-Based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems Xiangyuan Xue, Zeyu Lu, Di Huang, Zidong Wang, Wanli Ouyang, Lei Bai
PDF
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen
PDF
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space Leonhard Sommer, Olaf Dünkel, Christian Theobalt, Adam Kortylewski
PDF
Commonsense Video Question Answering Through Video-Grounded Entailment Tree Reasoning Huabin Liu, Filip Ilievski, Cees G. M. Snoek
PDF
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors Jeongsoo Park, Andrew Owens
PDF
Compass Control: Multi Object Orientation Control for Text-to-Image Generation Rishubh Parihar, Vaibhav Agrawal, Sachidanand Vs, Venkatesh Babu Radhakrishnan
PDF
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians Chongjian Ge, Chenfeng Xu, Yuanfeng Ji, Chensheng Peng, Masayoshi Tomizuka, Ping Luo, Mingyu Ding, Varun Jampani, Wei Zhan
PDF
Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising Yuchen Wang, Hongyuan Wang, Lizhi Wang, Xin Wang, Lin Zhu, Wanxuan Lu, Hua Huang
PDF
Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion Zhiqiang Yan, Zhengxue Wang, Kun Wang, Jun Li, Jian Yang
PDF
Complexity Experts Are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, Radu Timofte
PDF
Composing Parts for Expressive Object Generation Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam
PDF
Compositional Caching for Training-Free Open-Vocabulary Attribute Detection Marco Garosi, Alessandro Conti, Gaowen Liu, Elisa Ricci, Massimiliano Mancini
PDF
Compositional Targeted Multi-Label Universal Perturbations Hassan Mahmood, Ehsan Elhamifar
PDF
Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon, Seong-Whan Lee
PDF
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization Junying Wang, Jingyuan Liu, Xin Sun, Krishna Kumar Singh, Zhixin Shu, He Zhang, Jimei Yang, Nanxuan Zhao, Tuanfeng Y. Wang, Simon S. Chen, Ulrich Neumann, Jae Shin Yoon
PDF
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices Hao Yu, Tangyu Jiang, Shuning Jia, Shannan Yan, Shunning Liu, Haolong Qian, Guanghao Li, Shuting Dong, Chun Yuan
PDF
Concept Lancet: Image Editing with Compositional Representation Transplant Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Hancheng Min, Chris Callison-Burch, Rene Vidal
PDF
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization Lingyun Zhang, Yu Xie, Yanwei Fu, Ping Chen
PDF
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation Zirun Guo, Tao Jin
PDF
Condensing Action Segmentation Datasets via Generative Network Inversion Guodong Ding, Rongyu Chen, Angela Yao
PDF
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation Nadav Z. Cohen, Oron Nir, Ariel Shamir
PDF
Conformal Prediction and MLLM Aided Uncertainty Quantification in Scene Graph Generation Sayak Nag, Udita Ghosh, Calvin-Khang Ta, Sarosij Bose, Jiachen Li, Amit K. Roy-Chowdhury
PDF
Conformal Prediction for Zero-Shot Models Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
PDF
Conical Visual Concentration for Efficient Large Vision-Language Models Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang, Feng Wu, Dahua Lin
PDF
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer Jiayi Gao, Zijin Yin, Changcheng Hua, Yuxin Peng, Kongming Liang, Zhanyu Ma, Jun Guo, Yang Liu
PDF
Consistency Posterior Sampling for Diverse Image Synthesis Vishal Purohit, Matthew Repasky, Jianfeng Lu, Qiang Qiu, Yao Xie, Xiuyuan Cheng
PDF
Consistency-Aware Self-Training for Iterative-Based Stereo Matching Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Liu YangChenXu, Wu Cailin, Feng Xu, Tao Chen
PDF
Consistent and Controllable Image Animation with Motion Diffusion Models Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Tien-Tsin Wong, Yuan-Fang Li, Cunjian Chen
PDF
Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph Rao Fu, Jianmin Zheng, Liang Yu
PDF
Context-Aware Multimodal Pretraining Karsten Roth, Zeynep Akata, Dima Damen, Ivana Balazevic, Olivier J. Henaff
PDF
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval Eric Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs
PDF
Context-Enhanced Memory-Refined Transformer for Online Action Detection Zhanzhong Pang, Fadime Sener, Angela Yao
PDF
Contextual AD Narration with Interleaved Multimodal Sequence Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang
PDF
Continual SFT Matches Multimodal RLHF with Negative Supervision Ke Zhu, Yu Wang, Yanpeng Sun, Qiang Chen, Jiangjiang Liu, Gang Zhang, Jingdong Wang
PDF
Continuous 3D Perception Model with Persistent State Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, Angjoo Kanazawa
PDF
Continuous Adverse Weather Removal via Degradation-Aware Distillation Xin Lu, Jie Xiao, Yurui Zhu, Xueyang Fu
PDF
Continuous Locomotive Crowd Behavior Generation Inhwan Bae, Junoh Lee, Hae-Gon Jeon
PDF
Continuous Space-Time Video Resampling with Invertible Motion Steganography Yuantong Zhang, Zhenzhong Chen
PDF
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Melvin Sevi, Vincent Tao Hu, Björn Ommer
PDF
ControlFace: Harnessing Facial Parametric Control for Face Rigging Wooseok Jang, Youngjun Hong, Geonho Cha, Seungryong Kim
PDF
Controllable Human Image Generation with Personalized Multi-Garments Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin
PDF
Convex Combination Star Shape Prior for Data-Driven Image Semantic Segmentation Xinyu Zhao, Jun Xie, Shengzhe Chen, Jun Liu
PDF
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World Bangyan Liao, Zhenjun Zhao, Haoang Li, Yi Zhou, Yingping Zeng, Hao Li, Peidong Liu
PDF
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement Yun Liu, Chengwen Zhang, Ruofan Xing, Bingda Tang, Bowen Yang, Li Yi
PDF
CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-Modal Prototypes Ziteng Xue, Mingzhe Guo, Heng Fan, Shihui Zhang, Zhipeng Zhang
PDF
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers, Jose Dolz
PDF
Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning Lei-Lei Ma, Shuo Xu, Ming-Kun Xie, Lei Wang, Dengdi Sun, Haifeng Zhao
PDF
CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization Junhao Xu, Yanan Zhang, Zhi Cai, Di Huang
PDF
CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation Bonan Li, Zicheng Zhang, Xingyi Yang, Xinchao Wang
PDF
COSMIC: Clique-Oriented Semantic Multi-Space Integration for Robust CLIP Test-Time Adaptation Fanding Huang, Jingyan Jiang, Qinting Jiang, Hebei Li, Faisal Nadeem Khan, Zhi Wang
PDF
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-Training Sanghwan Kim, Rui Xiao, Mariana-Iuliana Georgescu, Stephan Alaniz, Zeynep Akata
PDF
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models Yiqi Zhu, Ziyue Wang, Can Zhang, Peng Li, Yang Liu
PDF
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Tsung-Yi Lin, Gordon Wetzstein, Ming-Yu Liu, Donglai Xiang
PDF
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model Ziyu Yao, Xuxin Cheng, Zhiqi Huang, Lei Li
PDF
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models Under Distribution Shifts Jiansheng Li, Xingxuan Zhang, Hao Zou, Yige Guo, Renzhe Xu, Yilong Liu, Chuzhao Zhu, Yue He, Peng Cui
PDF
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology Yuxuan Sun, Yixuan Si, Chenglu Zhu, Xuan Gong, Kai Zhang, Pingyi Chen, Ye Zhang, Zhongyi Shui, Tao Lin, Lin Yang
PDF
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation Henghui Du, Guangyao Li, Chang Zhou, Chunjie Zhang, Alan Zhao, Di Hu
PDF
CraftsMan3D: High-Fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner Weiyu Li, Jiarui Liu, Hongyu Yan, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, Xiaoxiao Long
PDF
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-Constrained Gaussian Splatting Hanxi Liu, Yifang Men, Zhouhui Lian
PDF
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation Jingnan Shi, Rajat Talak, Harry Zhang, David Jin, Luca Carlone
PDF
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Di Zhang, Jingdi Lei, Junxian Li, Xunzhi Wang, Yujie Liu, Zonglin Yang, Jiatong Li, Weida Wang, Suorong Yang, Jianbo Wu, Peng Ye, Wanli Ouyang, Dongzhan Zhou
PDF
CroCoDL: Cross-Device Collaborative Dataset for Localization Hermann Blum, Alessandro Mercurio, Joshua O'Reilly, Tim Engelbracht, Mihai Dusmanu, Marc Pollefeys, Zuria Bauer
PDF
Cropper: Vision-Language Model for Image Cropping Through In-Context Learning Seung Hyun Lee, Jijun Jiang, Yiran Xu, Zhuofang Li, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang
PDF
Cross-Modal 3D Representation with Multi-View Images and Point Clouds Ziyang Zhou, Pinghui Wang, Zi Liang, Haitao Bai, Ruofei Zhang
PDF
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding Jinlong Li, Cristiano Saltori, Fabio Poiesi, Nicu Sebe
PDF
Cross-Modal Causal Relation Alignment for Video Question Grounding Weixing Chen, Yang Liu, Binglin Chen, Jiandong Su, Yongsen Zheng, Liang Lin
PDF
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion Saad Lahlali, Sandra Kara, Hejer Ammar, Florian Chabot, Nicolas Granger, Hervé Le Borgne, Quoc-Cuong Pham
PDF
Cross-Modal Information Flow in Multimodal Large Language Models Zhi Zhang, Srishti Yadav, Fengze Han, Ekaterina Shutova
PDF
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images Jie Mei, Chenyu Lin, Yu Qiu, Yaonan Wang, Hui Zhang, Ziyang Wang, Dong Dai
PDF
Cross-Rejective Open-Set SAR Image Registration Shasha Mao, Shiming Lu, Zhaolong Du, Licheng Jiao, Shuiping Gou, Luntian Mou, Xuequan Lu, Lin Xiong, Yimeng Zhang
PDF
Cross-View Completion Models Are Zero-Shot Correspondence Estimators Honggyu An, Jin Hyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim
PDF
CrossOver: 3D Scene Cross-Modal Alignment Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni
PDF
CrossSDF: 3D Reconstruction of Thin Structures from Cross-Sections Thomas Walker, Salvatore Esposito, Daniel Rebain, Amir Vaxman, Arno Onken, Changjian Li, Oisin Mac Aodha
PDF
CryptoFace: End-to-End Encrypted Face Recognition Wei Ao, Vishnu Naresh Boddeti
PDF
CSC-PA: Cross-Image Semantic Correlation via Prototype Attentions for Single-Network Semi-Supervised Breast Tumor Segmentation Zhenhui Ding, Guilian Chen, Qin Zhang, Huisi Wu, Jing Qin
PDF
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion Kai He, Chin-Hsuan Wu, Igor Gilitschenski
PDF
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning Aniket Didolkar, Andrii Zadaianchuk, Rabiul Awal, Maximilian Seitzer, Efstratios Gavves, Aishwarya Agrawal
PDF
Cubify Anything: Scaling Indoor 3D Object Detection Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan
PDF
Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation Yanda Chen, Gongwei Chen, Miao Zhang, Weili Guan, Liqiang Nie
PDF
Curriculum Direct Preference Optimization for Diffusion and Consistency Models Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah
PDF
CustAny: Customizing Anything from a Single Example Lingjie Kong, Kai Wu, Chengming Xu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Donghao Luo, Mengtian Li, Jiangning Zhang, Chengjie Wang, Yanwei Fu
PDF
Customized Condition Controllable Generation for Video Soundtrack Fan Qi, Kunsheng Ma, Changsheng Xu
PDF
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation Jungsoo Lee, Debasmit Das, Munawar Hayat, Sungha Choi, Kyuwoong Hwang, Fatih Porikli
PDF
CXPMRG-Bench: Pre-Training and Benchmarking for X-Ray Medical Report Generation on CheXpert Plus Dataset Xiao Wang, Fuling Wang, Yuehang Li, Qingchuan Ma, Shiao Wang, Bo Jiang, Jin Tang
PDF
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation Weinan Jia, Mengqi Huang, Nan Chen, Lei Zhang, Zhendong Mao
PDF
D^3-Human: Dynamic Disentangled Digital Human from Monocular Video Honghu Chen, Bo Peng, Yunfan Tao, Juyong Zhang
PDF
D^3: Scaling up Deepfake Detection by Learning from Discrepancy Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, Yu Wu
PDF
D^3CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation Jichun Zhao, Haiyong Jiang, Haoxuan Song, Jun Xiao, Dong Gong
PDF
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-Based Affective Recognition Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Shaoqi Yan, Ziheng Zhou, Wenqiang Zhang
PDF
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers Li Ren, Chen Chen, Liqiang Wang, Kien Hua
PDF
DaCapo: Score Distillation as Stacked Bridge for Fast and High-Quality 3D Editing Yufei Huang, Bangyan Liao, Yuqi Hu, Haitao Lin, Lirong Wu, Siyuan Li, Cheng Tan, Zicheng Liu, Yunfan Liu, Zelin Zang, Chang Yu, Zhen Lei
PDF
DAGSM: Disentangled Avatar Generation with GS-Enhanced Mesh Jingyu Zhuang, Di Kang, Linchao Bao, Liang Lin, Guanbin Li
PDF
DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction Junjie Zhou, Shouju Wang, Yuxia Tang, Qi Zhu, Daoqiang Zhang, Wei Shao
PDF
DarkIR: Robust Low-Light Image Restoration Daniel Feijoo, Juan C. Benito, Alvaro Garcia, Marcos V. Conde
PDF
DART: Disease-Aware Image-Text Alignment and Self-Correcting Re-Alignment for Trustworthy Radiology Report Generation Sang-Jun Park, Keun-Soo Heo, Dong-Hee Shin, Young-Han Son, Ji-Hye Oh, Tae-Eui Kam
PDF
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds Youyu Chen, Junjun Jiang, Kui Jiang, Xiao Tang, Zhihao Li, Xianming Liu, Yinyu Nie
PDF
Data Distributional Properties as Inductive Bias for Systematic Generalization Felipe del Rio, Alain Raymond-Saez, Daniel Florea, Rodrigo Toro Icarte, Julio Hurtado, Cristian B. Calderon, Alvaro Soto
PDF
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion Yuxi Mi, Zhizhou Zhong, Yuge Huang, Qiuyang Yuan, Xuan Zhao, Jianqing Xu, Shouhong Ding, Shaoming Wang, Rizen Guo, Shuigeng Zhou
PDF
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram, Dibakar Gope
PDF
Data-Free Universal Adversarial Perturbation with Pseudo-Semantic Prior Chanhui Lee, Yeonghwan Song, Jeany Son
PDF
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, Linfeng Zhang
PDF
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion Jinyuan Liu, Bowei Zhang, Qingyun Mei, Xingyuan Li, Yang Zou, Zhiying Jiang, Long Ma, Risheng Liu, Xin Fan
PDF
De^2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation Yunfeng Xiao, Xiaowei Bai, Baojun Chen, Hao Su, Hao He, Liang Xie, Erwei Yin
PDF
DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging Zhu Liu, Zijun Wang, Jinyuan Liu, Fanqi Meng, Long Ma, Risheng Liu
PDF
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization Zefeng Zhang, Hengzhu Tang, Jiawei Sheng, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu
PDF
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos Zijia Lu, A S M Iftekhar, Gaurav Mittal, Tianjian Meng, Xiawei Wang, Cheng Zhao, Rohith Kukkala, Ehsan Elhamifar, Mei Chen
PDF
Decentralized Diffusion Models David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa
PDF
Decision SpikeFormer: Spike-Driven Transformer for Decision Making Wei Huang, Qinying Gu, Nanyang Ye
PDF
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian
PDF
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee
PDF
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal Haonan An, Guang Hua, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang
PDF
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior Junfeng Ni, Yu Liu, Ruijie Lu, Zirui Zhou, Song-Chun Zhu, Yixin Chen, Siyuan Huang
PDF
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-Low Bitrate Perception Image Compression Jinchang Xu, Shaokang Wang, Jintao Chen, Zhe Li, Peidong Jia, Fei Zhao, Guoqing Xiang, Zhijian Hao, Shanghang Zhang, Xiaodong Xie
PDF
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning Qianli Ma, Xuefei Ning, Dongrui Liu, Li Niu, Linfeng Zhang
PDF
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-Centric Tasks Yu Zhou, Dian Zheng, Qijie Mo, Renjie Lu, Kun-Yu Lin, Wei-Shi Zheng
PDF
Decoupled Motion Expression Video Segmentation Hao Fang, Runmin Cong, Xiankai Lu, Xiaofei Zhou, Sam Kwong, Wei Zhang
PDF
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction Miaowei Wang, Yibo Zhang, Weiwei Xu, Rui Ma, Changqing Zou, Daniel Morris
PDF
Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution Huan Zheng, Wencheng Han, Jianbing Shen
PDF
Decoupling Training-Free Guided Diffusion by ADMM Youyuan Zhang, Zehua Liu, Zenan Li, Zhaoyu Li, James J. Clark, Xujie Si
PDF
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders Sizai Hou, Songze Li, Duanyi Yao
PDF
Deep Change Monitoring: A Hyperbolic Representative Learning Framework and a Dataset for Long-Term Fine-Grained Tree Change Detection Yante Li, Hanwen Qi, Haoyu Chen, Xinlian Liang, Guoying Zhao
PDF
Deep Fair Multi-View Clustering with Attention KAN HaiMing Xu, Qianqian Wang, Boyue Wang, Quanxue Gao
PDF
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, Adnan Siraj Rakin
PDF
DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis Ziyin Zeng, Mingyue Dong, Jian Zhou, Huan Qiu, Zhen Dong, Man Luo, Bijun Li
PDF
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection Jaewoo Song, Daemin Park, Kanghyun Baek, Sangyub Lee, Jooyoung Choi, Eunji Kim, Sungroh Yoon
PDF
DefMamba: Deformable Visual State Space Model Leiye Liu, Miao Zhang, Jihao Yin, Tingwei Liu, Wei Ji, Yongri Piao, Huchuan Lu
PDF
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang, Rui Huang
PDF
Deformable Radial Kernel Splatting Yi-Hua Huang, Ming-Xian Lin, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
PDF
DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image Ziwei Zhao, Zhixing Zhang, Yuhang Liu, Zhao Zhang, Haojun Yu, Dong Wang, Liwei Wang
PDF
Degradation-Aware Feature Perturbation for All-in-One Image Restoration Xiangpeng Tian, Xiangyu Liao, Xiao Liu, Meng Li, Chao Ren
PDF
DEIM: DETR with Improved Matching for Fast Convergence Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen
PDF
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification Darryl Ho, Samuel Madden
PDF
DELT: A Simple Diversity-Driven EarlyLate Training for Dataset Distillation Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao
PDF
Denoising Functional Maps: Diffusion Models for Shape Correspondence Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik
PDF
Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes Suhyun Shin, Seungwoo Yoon, Ryota Maeda, Seung-Hwan Baek
PDF
Dense Match Summarization for Faster Two-View Estimation Jonathan Astermark, Anders Heyden, Viktor Larsson
PDF
Dense-SfM: Structure from Motion with Dense Consistent Matching JongMin Lee, Sungjoo Yoo
PDF
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee, Yu-Lun Liu
PDF
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh, Xinyu Huang, Liu Ren
PDF
Depth-Guided Bundle Sampling for Efficient Generalizable Neural Radiance Field Reconstruction Li Fang, Hao Zhu, Longlong Chen, Fei Hu, Long Ye, Zhan Ma
PDF
DepthCrafter: Generating Consistent Long Depth Sequences for Open-World Videos Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan
PDF
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models Duolikun Danier, Mehmet Aygün, Changjian Li, Hakan Bilen, Oisin Mac Aodha
PDF
DepthSplat: Connecting Gaussian Splatting and Depth Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, Marc Pollefeys
PDF
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye
PDF
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models Yongqi Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen
PDF
Descriptor-in-Pixel: Point-Feature Tracking for Pixel Processor Arrays Laurie Bose, Jianing Chen, Piotr Dudek
PDF
Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis Feng Zhou, Ruiyang Liu, Chen Liu, Gaofeng He, Yong-Lu Li, Xiaogang Jin, Huamin Wang
PDF
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, Houqiang Li
PDF
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes Chensheng Peng, Chengwei Zhang, Yixiao Wang, Chenfeng Xu, Yichen Xie, Wenzhao Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan
PDF
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin
PDF
Detail-Preserving Latent Diffusion for Stable Shadow Removal Jiamin Xu, Yuxin Zheng, Zelong Li, Chi Wang, Renshu Gu, Weiwei Xu, Gang Xu
PDF
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine Zhaohu Xing, Lihao Liu, Yijun Yang, Hongqiu Wang, Tian Ye, Sixiang Chen, Wenxue Li, Guang Liu, Lei Zhu
PDF
Detect-and-Guide: Self-Regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization Feifei Li, Mi Zhang, Yiming Sun, Min Yang
PDF
Detecting Adversarial Data Using Perturbation Forgery Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, Shijuan Huang, Ruoxi Jia, Ning Yu
PDF
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection Jiahao Xu, Zikai Zhang, Rui Hu
PDF
Detecting Open World Objects via Partial Attribute Assignment Muli Yang, Gabriel James Goenawan, Huaiyuan Qin, Kai Han, Xi Peng, Yanhua Yang, Hongyuan Zhu
PDF
Detecting Out-of-Distribution Through the Lens of Neural Collapse Litian Liu, Yao Qin
PDF
Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection Houzhang Fang, Xiaolin Wang, Zengyang Li, Lu Wang, Qingshan Li, Yi Chang, Luxin Yan
PDF
Deterministic Certification of Graph Neural Networks Against Graph Poisoning Attacks with Arbitrary Perturbations Jiate Li, Meng Pang, Yun Dong, Binghui Wang
PDF
Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators Bohan Xiao, Peiyong Wang, Qisheng He, Ming Dong
PDF
Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis Yu Hua, Weiming Liu, Gui Xu, Yaqing Hou, Yew-Soon Ong, Qiang Zhang
PDF
Devil Is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-Free Guidance and Stratified Attention Kyungmin Jo, Jooyeol Yun, Jaegul Choo
PDF
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens Zhangqi Jiang, Junkai Chen, Beier Zhu, Tingjin Luo, Yankun Shen, Xu Yang
PDF
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness Yiming Zhong, Qi Jiang, Jingyi Yu, Yuexin Ma
PDF
DexHandDiff: Interaction-Aware Diffusion Planning for Adaptive Dexterous Manipulation Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, Mingyu Ding
PDF
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis Luyuan Xie, Tianyu Luan, Wenyuan Cai, Guochen Yan, Zhaoyu Chen, Nan Xi, Yuejian Fang, Qingni Shen, Zhonghai Wu, Junsong Yuan
PDF
DFM: Differentiable Feature Matching for Anomaly Detection Sheng Wu, Yimi Wang, Xudong Liu, Yuguang Yang, Runqi Wang, Guodong Guo, David Doermann, Baochang Zhang
PDF
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou
PDF
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning Kun Zhang, Jingyu Li, Zhe Li, S. Kevin Zhou
PDF
DI-PCG: Diffusion-Based Efficient Inverse Procedural Content Generation for High-Quality 3D Asset Creation Wang Zhao, Yan-Pei Cao, Jiale Xu, Yuejiang Dong, Ying Shan
PDF
DiC: Rethinking Conv3x3 Designs in Diffusion Models Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen
PDF
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting Seungjun Lee, Gim Hee Lee
PDF
Diff-PaLM: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models Jianlong Jin, Chenglong Zhao, Ruixin Zhang, Sheng Shang, Jianqing Xu, Jingyun Zhang, ShaoMing Wang, Yang Zhao, Shouhong Ding, Wei Jia, Yunsheng Wu
PDF
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment Johannes Schusterbauer, Ming Gui, Frank Fundel, Björn Ommer
PDF
DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences Xingjian Li, Qiming Zhao, Neelesh Bisht, Mostofa Rafid Uddin, Jin Yu Kim, Bryan Zhang, Min Xu
PDF
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID Xin Liang, Yogesh S Rawat
PDF
Difference Inversion: Interpolate and Isolate the Difference with Token Consistency for Image Analogy Generation Hyunsoo Kim, Donghyun Kim, Suhyun Kim
PDF
Differentiable Inverse Rendering with Interpretable Basis BRDFs Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
PDF
DiffFNO: Diffusion Fourier Neural Operator Xiaoyi Liu, Hao Tang
PDF
DiffLO: Semantic-Aware LiDAR Odometry with Diffusion-Based Refinement Yongshu Huang, Chen Liu, Minghang Zhu, Sheng Ao, Chenglu Wen, Cheng Wang
PDF
DiffLocks: Generating 3D Hair from a Single Image Using Diffusion Models Radu Alexandru Rosu, Keyu Wu, Yao Feng, Youyi Zheng, Michael J. Black
PDF
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li
PDF
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
PDF
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning Jeong Ryong Lee, Yejee Shin, Geonhui Son, Dosik Hwang
PDF
Diffusion Model Is Effectively Its Own Teacher Xinyin Ma, Runpeng Yu, Songhua Liu, Gongfan Fang, Xinchao Wang
PDF
Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Chih-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang
PDF
Diffusion Self-Distillation for Zero-Shot Customized Image Generation Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
PDF
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models Jinjin Zhang, Qiuyu Huang, Junjie Liu, Xiefan Guo, Di Huang
PDF
Diffusion-Based Event Generation for High-Quality Image Deblurring Xinan Xie, Qing Zhang, Wei-Shi Zheng
PDF
Diffusion-Based Realistic Listening Head Generation via Hybrid Motion Modeling Yinuo Wang, Yanbo Fan, Xuan Wang, Guo Yu, Fei Wang
PDF
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang
PDF
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion Qitao Zhao, Amy Lin, Jeff Tan, Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani
PDF
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation Mu Chen, Liulei Li, Wenguan Wang, Yi Yang
PDF
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, Jinyuan Liu
PDF
DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling
PDF
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang
PDF
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer Ho-Joong Kim, Yearang Lee, Jung-Ho Hong, Seong-Whan Lee
PDF
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset Zhao Dong, Ka Chen, Zhaoyang Lv, Hong-Xing Yu, Yunzhi Zhang, Cheng Zhang, Yufeng Zhu, Stephen Tian, Zhengqin Li, Geordie Moffatt, Sean Christofferson, James Fort, Xiaqing Pan, Mingfei Yan, Jiajun Wu, Carl Yuheng Ren, Richard Newcombe
PDF
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels Erjian Guo, Zhen Zhao, Zicheng Wang, Tong Chen, Yunyi Liu, Luping Zhou
PDF
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection Jia Guo, Shuai Lu, Weihang Zhang, Fang Chen, Huiqi Li, Hongen Liao
PDF
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski
PDF
DIO: Decomposable Implicit 4D Occupancy-Flow World Model Christopher Diehl, Quinlan Sykora, Ben Agro, Thomas Gilles, Sergio Casas, Raquel Urtasun
PDF
Directional Label Diffusion Model for Learning from Noisy Labels Senyu Hou, Gaoxia Jiang, Jia Zhang, Shangrong Yang, Husheng Guo, Yaqing Guo, Wenjian Wang
PDF
DirectTriGS: Triplane-Based Gaussian Splatting Field Representation for 3D Generation Xiaoliang Ju, Hongsheng Li
PDF
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick
PDF
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu
PDF
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, Hongwei Liu
PDF
Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning Xueyi Ke, Satoshi Tsutsui, Yayun Zhang, Bihan Wen
PDF
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval Leqi Shen, Guoqiang Gong, Tianxiang Hao, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Jungong Han, Guiguang Ding
PDF
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong
PDF
Disentangled Pose and Appearance Guidance for Multi-Pose Generation Tengfei Xiao, Yue Wu, Yuelong Li, Can Qin, Maoguo Gong, Qiguang Miao, Wenping Ma
PDF
Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam, Rene Vidal
PDF
DiskVPS: Vanishing Point Detector via Hough Transform in a Disk Region Jianping Wu
PDF
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Rui Qian, Shuangrui Ding, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
PDF
DiSRT-in-Bed: Diffusion-Based Sim-to-Real Transfer Framework for In-Bed Human Mesh Recovery Jing Gao, Ce Zheng, Laszlo A. Jeni, Zackory Erickson
PDF
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability Yingdong Shi, Changming Li, Yifan Wang, Yongxiang Zhao, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren
PDF
Distilled Prompt Learning for Incomplete Multimodal Survival Prediction Yingxue Xu, Fengtao Zhou, Chenyu Zhao, Yihui Wang, Can Yang, Hao Chen
PDF
Distilling Long-Tailed Datasets Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan
PDF
Distilling Monocular Foundation Model for Fine-Grained Depth Completion Yingping Liang, Yutao Hu, Wenqi Shao, Ying Fu
PDF
Distilling Multi-Modal Large Language Models for Autonomous Driving Deepti Hegde, Rajeev Yasarla, Hong Cai, Shizhong Han, Apratim Bhattacharyya, Shweta Mahajan, Litian Liu, Risheek Garrepalli, Vishal M. Patel, Fatih Porikli
PDF
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment Xudong Li, Wenjie Nie, Yan Zhang, Runze Hu, Ke Li, Xiawu Zheng, Liujuan Cao
PDF
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation Chanyoung Kim, Dayun Ju, Woojung Han, Ming-Hsuan Yang, Seong Jae Hwang
PDF
DistinctAD: Distinctive Audio Description Generation in Contexts Bo Fang, Wenhao Wu, Qiangqiang Wu, Yuxin Song, Antoni B. Chan
PDF
Distinguish Then Exploit: Source-Free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment Weiming Liu, Jun Dan, Fan Wang, Xinting Liao, Junhao Dong, Hua Yu, Shunjie Dong, Lianyong Qi
PDF
Distraction Is All You Need for Multimodal Large Language Model Jailbreaking Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong
PDF
Distribution Prototype Diffusion Learning for Open-Set Supervised Anomaly Detection Fuyun Wang, Tong Zhang, Yuanzhi Wang, Yide Qiu, Xin Liu, Xu Guo, Zhen Cui
PDF
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro, Chaim Baskin, Moshe Eliasof
PDF
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yue
PDF
DIV-FF: Dynamic Image-Video Feature Fields for Environment Understanding in Egocentric Videos Lorenzo Mur-Labadia, Josechu Guerrero, Ruben Martinez-Cantin
PDF
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows Mashrur M. Morshed, Vishnu Boddeti
PDF
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-Based Adversarial Purification Gaozheng Pei, Shaojie Lyu, Gong Chen, Ke Ma, Qianqian Xu, Yingfei Sun, Qingming Huang
PDF
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Yuying Ge, Yizhuo Li, Yixiao Ge, Ying Shan
PDF
DivPrune: Diversity-Based Visual Token Pruning for Large Multimodal Models Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang
PDF
DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-Identification Zhenyu Cui, Jiahuan Zhou, Yuxin Peng
PDF
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie
PDF
DL2G: Degradation-Guided Local-to-Global Restoration for Eyeglass Reflection Removal Zhilv Yi, Xiao Lu, Hong Ding, Jingbo Hu, Zhi Jiang, Chunxia Xiao
PDF
DNF: Unconditional 4D Generation with Dictionary-Based Neural Fields Xinyi Zhang, Naiqi Li, Angela Dai
PDF
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables Sidi Yang, Binxiao Huang, Yulun Zhang, Dahai Yu, Yujiu Yang, Ngai Wong
PDF
Do Computer Vision Foundation Models Learn the Low-Level Characteristics of the Human Visual System? Yancheng Cai, Fei Yin, Dounia Hammou, Rafal Mantiuk
PDF
Do ImageNet-Trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio
PDF
Do Visual Imaginations Improve Vision-and-Language Navigation Agents? Akhil Perincherry, Jacob Krantz, Stefan Lee
PDF
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild Damien Teney, Liangze Jiang, Florin Gogianu, Ehsan Abbasnejad
PDF
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-Modal Large Language Models? Yanbo Wang, Jiyang Guan, Jian Liang, Ran He
PDF
Do Your Best and Get Enough REST for Continual Learning Hankyul Kang, Gregor Seifer, Donghyun Lee, Jongbin Ryu
PDF
DocLayLLM: An Efficient Multi-Modal Extension of Large Language Models for Text-Rich Document Understanding Wenhui Liao, Jiapeng Wang, Hongliang Li, Chengyu Wang, Jun Huang, Lianwen Jin
PDF
Docopilot: Improving Multimodal Models for Document-Level Understanding Yuchen Duan, Zhe Chen, Yusong Hu, Weiyun Wang, Shenglong Ye, Botian Shi, Lewei Lu, Qibin Hou, Tong Lu, Hongsheng Li, Jifeng Dai, Wenhai Wang
PDF
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning Xiao-Hui Li, Fei Yin, Cheng-Lin Liu
PDF
Document Haystacks: Vision-Language Reasoning over Piles of 1000+ Documents Jun Chen, Dannong Xu, Junjie Fei, Chun-Mei Feng, Mohamed Elhoseiny
PDF
DocVLM: Make Your VLM an Efficient Reader Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman
PDF
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting Liao Shen, Tianqi Liu, Huiqiang Sun, Jiaqi Li, Zhiguo Cao, Wei Li, Chen Change Loy
PDF
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
PDF
Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data Wenxin Su, Song Tang, Xiaofeng Liu, Xiaojing Yi, Mao Ye, Chunxiao Zu, Jiahao Li, Xiatian Zhu
PDF
Domain Generalization in CLIP via Learning with Diverse Text Prompts Changsong Wen, Zelin Peng, Yu Huang, Xiaokang Yang, Wei Shen
PDF
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, Yadan Luo
PDF
Doppelgangers and Adversarial Vulnerability George Kamberov
PDF
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, Noah Snavely
PDF
DoRA: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, Ping Tan
PDF
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles Rui Zhao, Weijia Mao, Mike Zheng Shou
PDF
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution Zhengxue Wang, Zhiqiang Yan, Jinshan Pan, Guangwei Gao, Kai Zhang, Jian Yang
PDF
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models Haoyang Li, Liang Wang, Chao Wang, Jing Jiang, Yan Peng, Guodong Long
PDF
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar, Xiangyang Ji, Xu-Cheng Yin
PDF
DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation Ziyu Zhao, Xiaoguang Li, Lingjia Shi, Nasrin Imanpour, Song Wang
PDF
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection Shawn Li, Huixian Gong, Hao Dong, Tiankai Yang, Zhengzhong Tu, Yue Zhao
PDF
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh
PDF
Dragin3D: Image Editing by Dragging in 3D Space Weiran Guang, Xiaoguang Gu, Mengqi Huang, Zhendong Mao
PDF
DRAWER: Digital Reconstruction and Articulation with Environment Realism Hongchi Xia, Entong Su, Marius Memmel, Arhan Jain, Raymond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi, Abhishek Gupta, Shenlong Wang, Wei-Chiu Ma
PDF
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching Emanuele Aiello, Umberto Michieli, Diego Valsesia, Mete Ozay, Enrico Magli
PDF
DreamOmni: Unified Image Generation and Editing Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia
PDF
DreamRelation: Bridging Customization and Relation Generation Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li
PDF
DreamText: High Fidelity Scene Text Synthesis Yibin Wang, Weizhong Zhang, Honghui Xu, Cheng Jin
PDF
DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking Mingzhe Guo, Weiping Tan, Wenyu Ran, Liping Jing, Zhipeng Zhang
PDF
DRiVE: Diffusion-Based Rigging Empowers Generation of Versatile and Expressive Characters Mingze Sun, Junhao Chen, Junting Dong, Yurun Chen, Xinyu Jiang, Shiwei Mao, Puhua Jiang, Jingbo Wang, Bo Dai, Ruqi Huang
PDF
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation Guosheng Zhao, Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Xueyang Zhang, Yida Wang, Guan Huang, Xinze Chen, Boyuan Wang, Youyi Zhang, Wenjun Mei, Xingang Wang
PDF
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation Hongbin Lin, Zilu Guo, Yifan Zhang, Shuaicheng Niu, Yafeng Li, Ruimao Zhang, Shuguang Cui, Zhen Li
PDF
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving Zhenhua Xu, Yan Bai, Yujia Zhang, Zhuoling Li, Fei Xia, Kwan-Yee K. Wong, Jianqiang Wang, Hengshuang Zhao
PDF
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion Wei Wu, Xi Guo, Weixuan Tang, Tingxuan Huang, Chiyu Wang, Chenjing Ding
PDF
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map Xinyuan Chang, Maixuan Xue, Xinran Liu, Zheng Pan, Xing Wei
PDF
DrivingSphere: Building a High-Fidelity 4D World for Closed-Loop Simulation Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen
PDF
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, Yi Yang
PDF
DropGaussian: Structural Regularization for Sparse-View Gaussian Splatting Hyunwoo Park, Gun Ryu, Wonjun Kim
PDF
DropoutGS: Dropping Out Gaussians for Better Sparse-View Rendering Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo
PDF
DrVideo: Document Retrieval Based Long Video Understanding Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai
PDF
DSPNet: Dual-Vision Scene Perception for Robust 3D Question Answering Jingzhou Luo, Yang Liu, Weixing Chen, Zhen Li, Yaowei Wang, Guanbin Li, Liang Lin
PDF
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation Amin Karimi, Charalambos Poullis
PDF
DTGBrepGen: A Novel B-Rep Generative Model Through Decoupling Topology and Geometry Jing Li, Yihang Fu, Falai Chen
PDF
DTOS: Dynamic Time Object Sensing with Large Multimodal Model Jirui Tian, Jinrong Zhang, Shenglan Liu, Luhao Xu, Zhixiong Huang, Gao Huang
PDF
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan
PDF
Dual Diffusion for Unified Image Generation and Understanding Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang
PDF
Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-Distribution Detection Qi Chen, Hu Ding
PDF
Dual Exposure Stereo for Extended Dynamic Range 3D Imaging Juhyung Choi, Jinnyeong Kim, Seokjun Choi, Jinwoo Lee, Samuel Brucker, Mario Bijelic, Felix Heide, Seung-Hwan Baek
PDF
Dual Focus-Attention Transformer for Robust Point Cloud Registration Kexue Fu, Mingzhi Yuan, Changwei Wang, Weiguang Pang, Jing Chi, Manning Wang, Longxiang Gao
PDF
Dual Prompting Image Restoration with Diffusion Transformers Dehong Kong, Fan Li, Zhixin Wang, Jiaqi Xu, Renjing Pei, Wenbo Li, WenQi Ren
PDF
Dual Semantic Guidance for Open Vocabulary Semantic Segmentation Zhengyang Wang, Tingliang Feng, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan
PDF
Dual-Agent Optimization Framework for Cross-Domain Few-Shot Segmentation Zhaoyang Li, Yuan Wang, Wangkai Li, Tianzhu Zhang, Xiang Liu
PDF
Dual-Granularity Semantic Guided Sparse Routing Diffusion Model for General Pansharpening Yinghui Xing, Litao Qu, Shizhou Zhang, Di Xu, Yingkun Yang, Yanning Zhang
PDF
Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Jiafu Wu, Hao Chen, Haoxuan Wang, Wenbing Zhu, Mingmin Chi, Jun Liu, Yabiao Wang
PDF
Dual-View X-Ray Detection: Can AI Detect Prohibited Items from Dual-View X-Ray Images like Humans? Renshuai Tao, Haoyu Wang, Yuzhe Guo, Hairong Chen, Li Zhang, Xianglong Liu, Yunchao Wei, Yao Zhao
PDF
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction Ben Kaye, Tomas Jakab, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
PDF
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan
PDF
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers Mert Bülent Sarıyıldız, Philippe Weinzaepfel, Thomas Lucas, Pau de Jorge, Diane Larlus, Yannis Kalantidis
PDF
DV-Matcher: Deformation-Based Non-Rigid Point Cloud Matching Guided by Pre-Trained Visual Features Zhangquan Chen, Puhua Jiang, Ruqi Huang
PDF
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition Caoshuo Li, Tanzhe Li, Xiaobin Hu, Donghao Luo, Taisong Jin
PDF
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension Xiaofu Chen, Yaxin Luo, Gen Luo, Jiayi Ji, Henghui Ding, Yiyi Zhou
PDF
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang
PDF
DyCON: Dynamic Uncertainty-Aware Consistency and Contrastive Learning for Semi-Supervised Medical Image Segmentation Maregu Assefa, Muzammal Naseer, Iyyakutti Iyappan Ganapathi, Syed Sadaf Ali, Mohamed L Seghier, Naoufel Werghi
PDF
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding Geng Li, Jinglin Xu, Yunzhen Zhao, Yuxin Peng
PDF
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling Xin Xie, Dong Gong
PDF
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal
PDF
Dynamic Camera Poses and Where to Find Them Chris Rockwell, Joseph Tung, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin
PDF
Dynamic Content Prediction with Motion-Aware Priors for Blind Face Video Restoration Lianxin Xie, Bingbing Zheng, Si Wu, Hau San Wong
PDF
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics Chen Liu, Liying Yang, Peike Li, Dadong Wang, Lincheng Li, Xin Yu
PDF
Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics Yair Smadar, Assaf Hoogi
PDF
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning Jiashuo Li, Shaokun Wang, Bo Qian, Yuhang He, Xing Wei, Qiang Wang, Yihong Gong
PDF
Dynamic Motion Blending for Versatile Motion Editing Nan Jiang, Hongjie Li, Ziye Yuan, Zimo He, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang
PDF
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis Awais Nizamani, Hamid Laga, Guanjin Wang, Farid Boussaid, Mohammed Bennamoun, Anuj Srivastava
PDF
Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration Jae Hyeon Park, Joo Hyeon Jeon, Jae Yun Lee, Sangyeon Ahn, Min Hee Cha, Min Geol Kim, Hyeok Nam, Sung In Cho
PDF
Dynamic Stereotype Theory Induced Micro-Expression Recognition with Oriented Deformation Bohao Zhang, Xuejiao Wang, Changbo Wang, Gaoqi He
PDF
Dynamic Updates for Language Adaptation in Visual-Language Tracking Xiaohai Li, Bineng Zhong, Qihua Liang, Zhiyi Mo, Jian Nong, Shuxiang Song
PDF
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang
PDF
DynaMoDe-NeRF: Motion-Aware Deblurring Neural Radiance Field for Dynamic Scenes Ashish Kumar, A. N. Rajagopalan
PDF
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding Yudong Han, Qingpei Guo, Liyuan Pan, Liu Liu, Yu Guan, Ming Yang
PDF
DynPose: Largely Improving the Efficiency of Human Pose Estimation by a Simple Dynamic Framework Yalong Xu, Lin Zhao, Chen Gong, Guangyu Li, Di Wang, Nannan Wang
PDF
DynRefer: Delving into Region-Level Multimodal Tasks via Dynamic Resolution Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan
PDF
DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI Sangmin Lee, Sungyong Park, Heewon Kim
PDF
EAP-GS: Efficient Augmentation of Pointcloud for 3D Gaussian Splatting in Few-Shot Scene Reconstruction Dongrui Dai, Yuxiang Xing
PDF
Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training Lexington Whalen, Zhenbang Du, Haoran You, Chaojian Li, Sixu Li, Yingyan Lin
PDF
EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues Sagar Soni, Akshay Dudhane, Hiyam Debary, Mustansar Fiaz, Muhammad Akhtar Munir, Muhammad Sohail Danish, Paolo Fraccaro, Campbell D Watson, Levente J Klein, Fahad Shahbaz Khan, Salman Khan
PDF
EASEMVC: Efficient Dual Selection Mechanism for Deep Multi-View Clustering Baili Xiao, Zhibin Dong, Ke Liang, Suyuan Liu, Siwei Wang, Tianrui Liu, Xingchen Hu, En Zhu, Xinwang Liu
PDF
Easy-Editable Image Vectorization with Multi-Layer Multi-Scale Distributed Visual Feature Embedding Ye Chen, Zhangli Hu, Zhongyin Zhao, Yupeng Zhu, Yue Shi, Yuxuan Xiong, Bingbing Ni
PDF
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting Suzhen Wang, Weijie Chen, Wei Zhang, Minda Zhao, Lincheng Li, Rongsheng Zhang, Zhipeng Hu, Xin Yu
PDF
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild Yumeng Liu, Xiaoxiao Long, Zemin Yang, Yuan Liu, Marc Habermann, Christian Theobalt, Yuexin Ma, Wenping Wang
PDF
EBS-EKF: Accurate and High Frequency Event-Based Star Tracking Albert W. Reed, Connor Hashemi, Dennis Melamed, Nitesh Menon, Keigo Hirakawa, Scott McCloskey
PDF
ECBench: Can Multi-Modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark Ronghao Dang, Yuqian Yuan, Wenqi Zhang, Yifei Xin, Boqiang Zhang, Long Li, Liuyi Wang, Qinyang Zeng, Xin Li, Lidong Bing
PDF
EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection Yizheng Xie, Viktoria Ehm, Paul Roetzer, Nafie El Amrani, Maolin Gao, Florian Bernard, Daniel Cremers
PDF
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation Rang Meng, Xingyu Zhang, Yuming Li, Chenguang Ma
PDF
EchoONE: Segmenting Multiple Echocardiography Planes in One Model Jiongtong Hu, Wufeng Xue, Jun Cheng, Yingying Liu, Wei Zhuo, Dong Ni
PDF
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights Zhenghao Xing, Hao Chen, Binzhu Xie, Jiaqi Xu, Ziyu Guo, Xuemiao Xu, Jianye Hao, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng
PDF
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance Yang Yue, Yulin Wang, Haojun Jiang, Pan Liu, Shiji Song, Gao Huang
PDF
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression Wei Jiang, Junru Li, Kai Zhang, Li Zhang
PDF
EDCFlow: Exploring Temporally Dense Difference Maps for Event-Based Optical Flow Estimation Daikun Liu, Lei Cheng, Teng Wang, Changyin Sun
PDF
EDEN: Enhanced Diffusion for High-Quality Large-Motion Video Frame Interpolation Zihao Zhang, Haoran Chen, Haoyu Zhao, Guansong Lu, Yanwei Fu, Hang Xu, Zuxuan Wu
PDF
Edge-SD-SR: Low Latency and Parameter Efficient On-Device Super-Resolution with Stable Diffusion via Bidirectional Conditioning Isma Hadji, Mehdi Noroozi, Victor Escorcia, Anestis Zaganidis, Brais Martinez, Georgios Tzimiropoulos
PDF
EdgeDiff: Edge-Aware Diffusion Network for Building Reconstruction from Point Clouds Yujun Liu, Ruisheng Wang, Shangfeng Huang, Guorong Cai
PDF
EdgeMovingNet: Edge-Preserving Point Cloud Reconstruction via Joint Geometry Features Xinran Yang, Donghao Ji, Yuanqi Li, Junyuan Xie, Jie Guo, Yanwen Guo
PDF
EdgeTAM: On-Device Track Anything Model Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran
PDF
Edit Away and My Face Will Not Stay: Personal Biometric Defense Against Malicious Generative Editing Hanhui Wang, Yihua Zhang, Ruizheng Bai, Yue Zhao, Sijia Liu, Zhengzhong Tu
PDF
EditAR: Unified Conditional Generation with Autoregressive Models Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
PDF
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Sangheon Shin, Sangmin Kim, Sangpil Kim
PDF
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching Dongki Jung, Jaehoon Choi, Yonghan Lee, Somi Jeong, Taejae Lee, Dinesh Manocha, Suyong Yeon
PDF
EEE-Bench: A Comprehensive Multimodal Electrical and Electronics Engineering Benchmark Ming Li, Jike Zhong, Tianle Chen, Yuxiang Lai, Konstantinos Psounis
PDF
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space Yi Liu, Wengen Li, Jihong Guan, Shuigeng Zhou, Yichao Zhang
PDF
Effective SAM Combination for Open-Vocabulary Semantic Segmentation Minhyeok Lee, Suhwan Cho, Jungho Lee, Sunghun Yang, Heeseung Choi, Ig-Jae Kim, Sangyoun Lee
PDF
Efficient ANN-Guided Distillation: Aligning Rate-Based Features of Spiking Neural Networks Through Hybrid Block-Wise Replacement Shu Yang, Chengting Yu, Lei Liu, Hanzhi Ma, Aili Wang, Erping Li
PDF
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
PDF
Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression Zhenqi Dai, Ting Liu, Yanning Zhang
PDF
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses Yongfan Liu, Hyoukjun Kwon
PDF
Efficient Diffusion as Low Light Enhancer Guanzhou Lan, Qianli Ma, Yuqi Yang, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao
PDF
Efficient Dynamic Scene Editing via 4D Gaussian-Based Static-Dynamic Separation Joohyun Kwon, Hanbyel Cho, Junmo Kim
PDF
Efficient Event-Based Object Detection: A Hybrid Neural Network with Spatial and Temporal Attention Soikat Hasan Ahmed, Jan Finkbeiner, Emre Neftci
PDF
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models Reza Shirkavand, Peiran Yu, Shangqian Gao, Gowthami Somepalli, Tom Goldstein, Heng Huang
PDF
Efficient Long Video Tokenization via Coordinate-Based Patch Reconstruction Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo
PDF
Efficient Motion-Aware Video MLLM Zijia Zhao, Yuqi Huo, Tongtian Yue, Longteng Guo, Haoyu Lu, Bingning Wang, Weipeng Chen, Jing Liu
PDF
Efficient Personalization of Quantized Diffusion Model Without Backpropagation Hoigi Seo, Wongi Jeong, Kyungryeol Lee, Se Young Chun
PDF
Efficient Test-Time Adaptive Object Detection via Sensitivity-Guided Pruning Kunyu Wang, Xueyang Fu, Xin Lu, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha
PDF
Efficient Transfer Learning for Video-Language Foundation Models Haoxing Chen, Zizheng Huang, Yan Hong, Yanshuo Wang, Zhongcai Lyu, Zhuoer Xu, Jun Lan, Zhangxuan Gu
PDF
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency Yutong Wang, Jiajie Teng, Jiajiong Cao, Yuming Li, Chenguang Ma, Hongteng Xu, Dixin Luo
PDF
Efficient Video Super-Resolution for Real-Time Rendering with Decoupled G-Buffer Guidance Mingjun Zheng, Long Sun, Jiangxin Dong, Jinshan Pan
PDF
Efficient Visual State Space Model for Image Deblurring Lingshun Kong, Jiangxin Dong, Jinhui Tang, Ming-Hsuan Yang, Jinshan Pan
PDF
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-Language Models Yinan Liang, Ziwei Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu
PDF
EfficientViM: Efficient Vision Mamba with Hidden State Mixer Based State Space Duality Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
PDF
EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation Md Mostafijur Rahman, Radu Marculescu
PDF
Effortless Active Labeling for Long-Term Test-Time Adaptation Guowei Wang, Changxing Ding
PDF
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input Jian Wang, Rishabh Dabral, Diogo Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, Christian Theobalt
PDF
EgoLife: Towards Egocentric Life Assistant Jingkang Yang, Shuai Liu, Hongming Guo, Yuhao Dong, Xiamengwei Zhang, Sicheng Zhang, Pengyun Wang, Zitang Zhou, Binzhu Xie, Ziyue Wang, Bei Ouyang, Zhengyu Lin, Marco Cominelli, Zhongang Cai, Bo Li, Yuanhan Zhang, Peiyuan Zhang, Fangzhou Hong, Joerg Widmer, Francesco Gringoli, Lei Yang, Ziwei Liu
PDF
EgoLM: Multi-Modal Language Model of Egocentric Motions Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard Newcombe, Ziwei Liu, Lingni Ma
PDF
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision Yiming Zhao, Taein Kwon, Paul Streli, Marc Pollefeys, Christian Holz
PDF
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao
PDF
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri
PDF
EigenGS Representation: From Eigenspace to Gaussian Image Space Lo-Wei Tai, Ching-En Li, Cheng-Lin Chen, Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu
PDF
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler
PDF
Embodied Scene Understanding for Vision Language Models via MetaVQA Weizhen Wang, Chenda Duan, Zhenghao Peng, Yuxin Liu, Bolei Zhou
PDF
Embracing Collaboration over Competition: Condensing Multiple Prompts for Visual In-Context Learning Jinpeng Wang, Tianci Luo, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia
PDF
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing Gaoxiang Cong, Jiadong Pan, Liang Li, Yuankai Qi, Yuxin Peng, Anton van den Hengel, Jian Yang, Qingming Huang
PDF
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts Yiyang Fang, Wenke Huang, Guancheng Wan, Kehua Su, Mang Ye
PDF
EmoEdit: Evoking Emotions Through Image Manipulation Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang
PDF
EmotiveTalk: Expressive Talking Head Generation Through Audio Information Decoupling and Emotional Video Diffusion Haotian Wang, Yuzhe Weng, Yueyan Li, Zilu Guo, Jun Du, Shutong Niu, Jiefeng Ma, Shan He, Xiaoyan Wu, Qiming Hu, Bing Yin, Cong Liu, Qingfeng Liu
PDF
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Lanqing Hong, Lu Hou, Hang Xu
PDF
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, Yang You
PDF
Empowering Large Language Models with 3D Situation Awareness Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li
PDF
Empowering LLMs to Understand and Generate Complex Vector Graphics Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu
PDF
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-Dependent Visibility Yidi Li, Jun Xiao, Zhengda Lu, Yiqun Wang, Haiyong Jiang
PDF
Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis Tongtong Su, Chengyu Wang, Bingyan Liu, Jun Huang, Dongming Lu
PDF
End-to-End HOI Reconstruction Transformer with Graph-Based Encoding Zhenrong Wang, Qi Zheng, Sihan Ma, Maosheng Ye, Yibing Zhan, Dongjiang Li
PDF
End-to-End Implicit Neural Representations for Classification Alexander Gielisse, Jan van Gemert
PDF
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework Yi Yu, Weizhen Han, Libing Wu, Bingyi Liu, Enshu Wang, Zhuangzhuang Zhang
PDF
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space Jianrong Zhang, Hehe Fan, Yi Yang
PDF
Enhanced Contrastive Learning with Multi-View Longitudinal Data for Chest X-Ray Report Generation Kang Liu, Zhuoqi Ma, Xiaolu Kang, Yunan Li, Kun Xie, Zhicheng Jiao, Qiguang Miao
PDF
Enhanced OoD Detection Through Cross-Modal Alignment of Multi-Modal Representations Jeonghyeon Kim, Sangheum Hwang
PDF
Enhanced Then Progressive Fusion with View Graph for Multi-View Clustering Zhibin Dong, Meng Liu, Siwei Wang, Ke Liang, Yi Zhang, Suyuan Liu, Jiaqi Jin, Xinwang Liu, En Zhu
PDF
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition Junyi Wu, Yan Huang, Min Gao, Yuzhen Niu, Yuzhong Chen, Qiang Wu
PDF
Enhancing 3D Gaze Estimation in the Wild Using Weak Supervision with Gaze Following Labels Pierre Vuillecard, Jean-Marc Odobez
PDF
Enhancing Adversarial Transferability with Checkpoints of a Single Model's Training Shixin Li, Chaoxiang He, Xiaojing Ma, Bin Benjamin Zhu, Shuo Wang, Hongsheng Hu, Dongmei Zhang, Linchen Yu
PDF
Enhancing Creative Generation on Stable Diffusion-Based Models Jiyeon Han, Dahee Kwon, Gayoung Lee, Junho Kim, Jaesik Choi
PDF
Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model Changchang Sun, Gaowen Liu, Charles Fleming, Yan Yan
PDF
Enhancing Dataset Distillation via Non-Critical Region Refinement Minh-Tuan Tran, Trung Le, Xuan-May Le, Thanh-Toan Do, Dinh Phung
PDF
Enhancing Diversity for Data-Free Quantization Kai Zhao, Zhihao Zhuang, Miao Zhang, Chenjuan Guo, Yang Shu, Bin Yang
PDF
Enhancing Facial Privacy Protection via Weakening Diffusion Purification Ali Salar, Qing Liu, Yingli Tian, Guoying Zhao
PDF
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration Yiyang Chen, Tianyu Ding, Lei Wang, Jing Huo, Yang Gao, Wenbin Li
PDF
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization Sihao Liu, Yibo Yang, Xiaojie Li, David A. Clifton, Bernard Ghanem
PDF
Enhancing Privacy-Utility Trade-Offs to Mitigate Memorization in Diffusion Models Chen Chen, Daochang Liu, Mubarak Shah, Chang Xu
PDF
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-Supervised Medical Image Segmentation Aishik Konwer, Zhijian Yang, Erhan Bas, Cao Xiao, Prateek Prasanna, Parminder Bhatia, Taha Kass-Hout
PDF
Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild Wei Liu, Yufei Chen, Xiaodong Yue
PDF
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation Yudi Shi, Shangzhe Di, Qirui Chen, Weidi Xie
PDF
Enhancing Virtual Try-on with Synthetic Pairs and Error-Aware Noise Scheduling Nannan Li, Kevin J. Shih, Bryan A. Plummer
PDF
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data Haoxin Li, Boyang Li
PDF
EnliveningGS: Active Locomotion of 3DGS Siyuan Shen, Tianjia Shao, Kun Zhou, Chenfanfu Jiang, Yin Yang
PDF
EntityErasure: Erasing Entity Cleanly via Amodal Entity Segmentation and Completion Yixing Zhu, Qing Zhang, Yitong Wang, Yongwei Nie, Wei-Shi Zheng
PDF
EntitySAM: Segment Everything in Video Mingqiao Ye, Seoung Wug Oh, Lei Ke, Joon-Young Lee
PDF
EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-Based Constraint for Open-Source Dataset Copyright Protection Ming Sun, Rui Wang, Zixuan Zhu, Lihua Jing, Yuanfang Guo
PDF
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng, Hujun Bao, Xiaowei Zhou
PDF
EnvPoser: Environment-Aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling Songpengcheng Xia, Yu Zhang, Zhuo Su, Xiaozheng Zheng, Zheng Lv, Guidong Wang, Yongjie Zhang, Qi Wu, Lei Chu, Ling Pei
PDF
EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation Yuzhen Liu, Qiulei Dong
PDF
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways Yi Liu, Hao Zhou, Benlei Cui, Wenxiang Shang, Ran Lin
PDF
Erasing Undesirable Influence in Diffusion Models Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi
PDF
ERUPT: Efficient Rendering with Unposed Patch Transformer Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. Cuntoor
PDF
ESC: Erasing Space Concept for Knowledge Deletion Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, Gyeong-Moon Park
PDF
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding Burak Bekci, Nassir Navab, Federico Tombari, Mahdi Saleh
PDF
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces Souhail Hadgi, Luca Moschella, Andrea Santilli, Diego Gomez, Qixing Huang, Emanuele Rodolà, Simone Melzi, Maks Ovsjanikov
PDF
Estimating Body and Hand Motion in an Ego-Sensed World Brent Yi, Vickie Ye, Maya Zheng, Yunqi Li, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa
PDF
ETAP: Event-Based Tracking of Any Point Friedhelm Hamann, Daniel Gehrig, Filbert Febryanto, Kostas Daniilidis, Guillermo Gallego
PDF
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras Hoonhee Cho, Jae-Young Kang, Youngho Kim, Kuk-Jin Yoon
PDF
Eval3D: Interpretable and Fine-Grained Evaluation for 3D Generation Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma
PDF
Evaluating Model Perception of Color Illusions in Photorealistic Scenes Lingjun Mao, Zineng Tang, Alane Suhr
PDF
Evaluating Vision-Language Models as Evaluators in Path Planning Mohamed Aghzal, Xiang Yue, Erion Plaku, Ziyu Yao
PDF
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events Shuoyan Wei, Feng Li, Shengeng Tang, Yao Zhao, Huihui Bai
PDF
Event Ellipsometer: Event-Based Mueller-Matrix Video Imaging Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
PDF
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range Ziyuan Qu, Zihao Zou, Vivek Boominathan, Praneeth Chakravarthula, Adithya Pediredla
PDF
Event-Based Video Super-Resolution via State Space Models Zeyu Xiao, Xinchao Wang
PDF
Event-Equalized Dense Video Captioning Kangyi Wu, Pengna Li, Jingwen Fu, Yizhe Li, Yang Wu, Yuhan Liu, Jinjun Wang, Sanping Zhou
PDF
EventFly: Event Camera Perception from Ground to the Sky Lingdong Kong, Dongyue Lu, Xiang Xu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau
PDF
EventGPT: Event Stream Understanding with Multimodal Large Language Models Shaoyu Liu, Jianing Li, Guanghui Zhao, Yunjian Zhang, Xin Meng, Fei Richard Yu, Xiangyang Ji, Ming Li
PDF
EventPSR: Surface Normal and Reflectance Estimation from Photometric Stereo Using an Event Camera Bohan Yu, Jin Han, Boxin Shi, Imari Sato
PDF
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-Time Rendering Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski
PDF
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond Guanyao Wu, Haoyu Liu, Hongming Fu, Yichuan Peng, Jinyuan Liu, Xin Fan, Risheng Liu
PDF
Everything to the Synthetic: Diffusion-Driven Test-Time Adaptation via Synthetic-Domain Alignment Jiayi Guo, Junhao Zhao, Chaoqun Du, Yulin Wang, Chunjiang Ge, Zanlin Ni, Shiji Song, Humphrey Shi, Gao Huang
PDF
EvOcc: Accurate Semantic Occupancy for Automated Driving Using Evidence Theory Jonas Kälble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg
PDF
EVolSplat: Efficient Volume-Based Gaussian Splatting for Urban View Synthesis Sheng Miao, Jiaxin Huang, Dongfeng Bai, Xu Yan, Hongyu Zhou, Yue Wang, Bingbing Liu, Andreas Geiger, Yiyi Liao
PDF
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization You Shen, Zhipeng Zhang, Xinyang Li, Yansong Qu, Yu Lin, Shengchuan Zhang, Liujuan Cao
PDF
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector Weixiang Zhang, Shuzhao Xie, Chengwei Ren, Siyi Xie, Chen Tang, Shijia Ge, Mingzi Wang, Zhi Wang
PDF
EVPGS: Enhanced View Prior Guidance for Splatting-Based Extrapolated View Synthesis Jiahe Li, Feiyu Wang, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Ting Liu
PDF
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation Hao Zhu, Yan Zhu, Jiayu Xiao, Tianxiang Xiao, Yike Ma, Yucheng Zhang, Feng Dai
PDF
ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman
PDF
Explainable Saliency: Articulating Reasoning with Contextual Prioritization Nuo Chen, Ming Jiang, Qi Zhao
PDF
Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification Zequn Zeng, Yudi Su, Jianqiao Sun, Tiansheng Wen, Hao Zhang, Zhengjue Wang, Bo Chen, Hongwei Liu, Jiawei Ma
PDF
Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics Tahira Kazimi, Ritika Allada, Pinar Yanardag
PDF
Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves Zaoming Yan, Pengcheng Lei, Tingting Wang, Faming Fang, Junkang Zhang, Yaomin Huang, Haichuan Song
PDF
Exploiting Deblurring Networks for Radiance Fields Haeyun Choi, Heemin Yang, Janghyeok Han, Sunghyun Cho
PDF
Exploiting Temporal State Space Sharing for Video Semantic Segmentation Syed Ariff Syed Hesham, Yun Liu, Guolei Sun, Henghui Ding, Jing Yang, Ender Konukoglu, Xue Geng, Xudong Jiang
PDF
Exploration-Driven Generative Interactive Environments Nedko Savov, Naser Kazemi, Mohammad Mahdi, Danda Pani Paudel, Xi Wang, Luc Van Gool
PDF
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation Zhiwei Yang, Yucong Meng, Kexue Fu, Feilong Tang, Shuo Wang, Zhijian Song
PDF
Exploring Contextual Attribute Density in Referring Expression Counting Zhicheng Wang, Zhiyu Pan, Zhan Peng, Jian Cheng, Liwen Xiao, Wei Jiang, Zhiguo Cao
PDF
Exploring Historical Information for RGBE Visual Tracking with Mamba Chuanyu Sun, Jiqing Zhang, Yang Wang, Huilin Ge, Qianchen Xia, Baocai Yin, Xin Yang
PDF
Exploring Intrinsic Normal Prototypes Within a Single Image for Universal Anomaly Detection Wei Luo, Yunkang Cao, Haiming Yao, Xiaotian Zhang, Jianan Lou, Yuqi Cheng, Weiming Shen, Wenyong Yu
PDF
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation Chuandong Liu, Xingxing Weng, Shuguo Jiang, Pengcheng Li, Lei Yu, Gui-Song Xia
PDF
Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment Guanglu Dong, Xiangyu Liao, Mingyang Li, Guihuan Guo, Chao Ren
PDF
Exploring Simple Open-Vocabulary Semantic Segmentation Zihang Lai
PDF
Exploring Sparse MoE in GANs for Text-Conditioned Image Synthesis Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yifei Zhang, Qifeng Chen, Yujun Shen
PDF
Exploring Temporally-Aware Features for Point Tracking Inès Hyeonsu Kim, Seokju Cho, Jiahui Huang, Jung Yi, Joon-Young Lee, Seungryong Kim
PDF
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Bingda Tang, Boyang Zheng, Sayak Paul, Saining Xie
PDF
Exploring Timeline Control for Facial Motion Generation Yifeng Ma, Jinwei Qi, Chaonan Ji, Peng Zhang, Bang Zhang, Zhidong Deng, Liefeng Bo
PDF
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models Shuyang Hao, Bryan Hooi, Jun Liu, Kai-Wei Chang, Zi Huang, Yujun Cai
PDF
Exposure-Slot: Exposure-Centric Representations Learning with Slot-in-Slot Attention for Region-Aware Exposure Correction Donggoo Jung, Daehyun Kim, Guanghui Wang, Tae Hyun Kim
PDF
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling Is Easier than You Think Jie Tian, Xiaoye Qu, Zhenyi Lu, Wei Wei, Sichen Liu, Yu Cheng
PDF
Extreme Rotation Estimation in the Wild Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor
PDF
EZSR: Event-Based Zero-Shot Recognition Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu
PDF
F-LMM: Grounding Frozen Large Multimodal Models Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy
PDF
F^3OCUS - Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-Objective Meta-Heuristics Pramit Saha, Felix Wagner, Divyanshu Mishra, Can Peng, Anshul Thakur, David A. Clifton, Konstantinos Kamnitsas, J. Alison Noble
PDF
Face Forgery Video Detection via Temporal Forgery Cue Unraveling Zonghui Guo, Yingjie Liu, Jie Zhang, Haiyong Zheng, Shiguang Shan
PDF
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs Xiaoqin Wang, Xusen Ma, Xianxu Hou, Meidan Ding, Yudong Li, Junliang Chen, Wenting Chen, Xiaoyang Peng, Linlin Shen
PDF
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-Ray Report Generation Models Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, Pranav Rajpurkar
PDF
Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects Yue Fan, Ningjing Fan, Ivan Skorokhodov, Oleg Voynov, Savva Ignatyev, Evgeny Burnaev, Peter Wonka, Yiqun Wang
PDF
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation Tianyun Zhong, Chao Liang, Jianwen Jiang, Gaojie Lin, Jiaqi Yang, Zhou Zhao
PDF
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing Yixuan Zhu, Haolin Wang, Shilin Ma, Wenliang Zhao, Yansong Tang, Lei Chen, Jie Zhou
PDF
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-Resolution Junyang Chen, Jinshan Pan, Jiangxin Dong
PDF
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj, Jackson Cothren, Khoa Luu
PDF
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez
PDF
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation Qiao Yu, Xianzhi Li, Yuan Tang, Xu Han, Long Hu, Yixue Hao, Min Chen
PDF
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning Jiuyang Dong, Junjun Jiang, Kui Jiang, Jiahan Li, Yongbing Zhang
PDF
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
PDF
Faster Parameter-Efficient Tuning with Token Redundancy Reduction Kwonyoung Kim, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn
PDF
FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-Term 3D Object Detection Chenxu Dang, ZaiPeng Duan, Pei An, Xinmin Zhang, Xuzhong Hu, Jie Ma
PDF
FastVLM: Efficient Vision Encoding for Vision Language Models Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokula Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari
PDF
FATE: Full-Head Gaussian Avatar with Textural Editing from Monocular Video Jiawei Zhang, Zijian Wu, Zhiyang Liang, Yicheng Gong, Dongfang Hu, Yao Yao, Xun Cao, Hao Zhu
PDF
FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing Yufan Ren, Zicong Jiang, Tong Zhang, Søren Forchhammer, Sabine Süsstrunk
PDF
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting Yue Chen, Xingyu Chen, Anpei Chen, Gerard Pons-Moll, Yuliang Xiu
PDF
Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection Jinghao Bian, Mingtao Feng, Weisheng Dong, Fangfang Wu, Jianqiao Luo, Yaonan Wang, Guangming Shi
PDF
Feature Selection for Latent Factor Models Rittwika Kansabanik, Adrian Barbu
PDF
Feature Spectrum Learning for Remote Sensing Change Detection Qi Zang, Dong Zhao, Shuang Wang, Dou Quan, Zhun Zhong
PDF
Feature-Preserving Mesh Decimation for Normal Integration Moritz Heep, Sven Behnke, Eduard Zell
PDF
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi
PDF
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors Changlong Shi, He Zhao, Bingjie Zhang, Mingyuan Zhou, Dandan Guo, Yi Chang
PDF
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models Haokun Chen, Hang Li, Yao Zhang, Jinhe Bi, Gengyuan Zhang, Yueqi Zhang, Philip Torr, Jindong Gu, Denis Krompass, Volker Tresp
PDF
FedCALM: Conflict-Aware Layer-Wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng, Aikun Xu, Boyu Wang
PDF
FedCS: Coreset Selection for Federated Learning Chenhe Hao, Weiying Xie, Daixun Li, Haonan Qin, Hangyu Ye, Leyuan Fang, Yunsong Li
PDF
Federated Learning with Domain Shift Eraser Zheng Wang, Zihui Wang, Zheng Wang, Xiaoliang Fan, Cheng Wang
PDF
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning Gongxi Zhu, Donghao Li, Hanlin Gu, Yuan Yao, Lixin Fan, Yuxing Han
PDF
FedSPA: Generalizable Federated Graph Learning Under Homophily Heterogeneity Zihan Tan, Guancheng Wan, Wenke Huang, He Li, Guibin Zhang, Carl Yang, Mang Ye
PDF
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation Fengyi Fu, Lei Zhang, Mengqi Huang, Zhendong Mao
PDF
Ferret: An Efficient Online Continual Learning Framework Under Varying Memory Constraints Yuhao Zhou, Yuxin Tian, Jindi Lv, Mingjia Shi, Yuanxi Li, Qing Ye, Shuhao Zhang, Jiancheng Lv
PDF
Few-Shot Implicit Function Generation via Equivariance Suizhi Huang, Xingyi Yang, Hongtao Lu, Xinchao Wang
PDF
Few-Shot Personalized Scanpath Prediction Ruoyu Xue, Jingyi Xu, Sounak Mondal, Hieu Le, Greg Zelinsky, Minh Hoai, Dimitris Samaras
PDF
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning Tian Liu, Huixin Zhang, Shubham Parashar, Shu Kong
PDF
FFaceNeRF: Few-Shot Face Editing in Neural Radiance Fields Kwan Yun, Chaelin Kim, Hangyeul Shin, Junyong Noh
PDF
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation Ziqian Yang, Xinqiao Zhao, Xiaolei Wang, Quan Zhang, Jimin Xiao
PDF
FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching Zimin Xia, Alexandre Alahi
PDF
FIction: 4D Future Interaction Prediction from Video Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman
PDF
FIFA: Fine-Grained Inter-Frame Attention for Driver's Video Gaze Estimation Daosong Hu, Mingyue Cui, Kai Huang
PDF
FilmComposer: LLM-Driven Music Production for Silent Film Clips Zhifeng Xie, Qile He, Youjia Zhu, Qiwei He, Mengtian Li
PDF
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning Bardia Safaei, Faizan Siddiqui, Jiacong Xu, Vishal M. Patel, Shao-Yuan Lo
PDF
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation Zhuguanyu Wu, Shihe Wang, Jiayi Zhang, Jiaxin Chen, Yunhong Wang
PDF
Finding Local Diffusion Schrodinger Bridge Using Kolmogorov-Arnold Network Xingyu Qiu, Mengying Yang, Xinghua Ma, Fanding Li, Dong Liang, Gongning Luo, Wei Wang, Kuanquan Wang, Shuo Li
PDF
Fine-Grained Erasure in Text-to-Image Diffusion-Based Foundation Models Kartik Thakral, Tamar Glaser, Tal Hassner, Mayank Vatsa, Richa Singh
PDF
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation Jiho Choi, Seonho Lee, Minhyun Lee, Seungho Lee, Hyunjung Shim
PDF
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Soo Ye Kim, Zhifei Zhang, Yilin Wang, Jianming Zhang, Zhe Lin, Jiebo Luo
PDF
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs Mothilal Asokan, Kebin Wu, Fatima Albreiki
PDF
FinePhys: Fine-Grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Dian Shao, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang
PDF
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation Ziheng Zhang, Jianyang Gu, Arpita Chowdhury, Zheda Mai, David Carlyn, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
PDF
FineVQ: Fine-Grained User Generated Content Video Quality Assessment Huiyu Duan, Qiang Hu, Jiarui Wang, Liu Yang, Zitong Xu, Lu Liu, Xiongkuo Min, Chunlei Cai, Tianxiao Ye, Xiaoyun Zhang, Guangtao Zhai
PDF
Fingerprinting Denoising Diffusion Probabilistic Models Huan Teng, Yuhui Quan, Chengyu Wang, Jun Huang, Hui Ji
PDF
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding Thomas Dagès, Simon Weber, Ya-Wei Eileen Lin, Ronen Talmon, Daniel Cremers, Michael Lindenbaum, Alfred M. Bruckstein, Ron Kimmel
PDF
FiRe: Fixed-Points of Restoration Priors for Solving Inverse Problems Matthieu Terris, Ulugbek S. Kamilov, Thomas Moreau
PDF
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error Beilin Chu, Xuan Xu, Xin Wang, Yufei Zhang, Weike You, Linna Zhou
PDF
FireEdit: Fine-Grained Instruction-Based Image Editing via Region-Aware Vision Language Model Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang
PDF
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, Alireza Fathi
PDF
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images Kazi Sajeed Mehrab, M. Maruf, Arka Daw, Abhilash Neog, Harish Babu Manogaran, Mridul Khurana, Zhenyang Feng, Bahadir Altintas, Yasin Bakis, Elizabeth G Campolongo, Matthew J Thompson, Xiaojun Wang, Hilmar Lapp, Tanya Berger-Wolf, Paula Mabee, Henry Bart, Wei-Lun Chao, Wasila M Dahdul, Anuj Karpatne
PDF
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation Dong Zhao, Jinlong Li, Shuang Wang, Mengyao Wu, Qi Zang, Nicu Sebe, Zhun Zhong
PDF
Fitted Neural Lossless Image Compression Zhe Zhang, Zhenzhong Chen, Shan Liu
PDF
FLAIR: VLM with Fine-Grained Language-Informed Image Representations Rui Xiao, Sanghwan Kim, Mariana-Iuliana Georgescu, Zeynep Akata, Stephan Alaniz
PDF
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-Training Anjia Cao, Xing Wei, Zhiheng Ma
PDF
FLARE: Feed-Forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein
PDF
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation Tianfu Wang, Mingyang Xie, Haoming Cai, Sachin Shah, Christopher A. Metzler
PDF
Flash3D: Super-Scaling Point Transformers Through Joint Hardware-Geometry Locality Liyan Chen, Gregory P. Meyer, Zaiwei Zhang, Eric M. Wolff, Paul Vernaza
PDF
FlashGS: Efficient 3D Gaussian Splatting for Large-Scale and High-Resolution Rendering Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Boni Hu, Linning Xu, Zhilin Pei, Hengjie Li, Xiuhong Li, Ninghui Sun, Xingcheng Zhang, Bo Dai
PDF
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Bo Tong, Bokai Lai, Yiyi Zhou, Gen Luo, Yunhang Shen, Ke Li, Xiaoshuai Sun, Rongrong Ji
PDF
FLAVC: Learned Video Compression with Feature Level Attention Chun Zhang, Heming Sun, Jiro Katto
PDF
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering Jingqiu Zhou, Lue Fan, Linjiang Huang, Xiaoyu Shi, Si Liu, Zhaoxiang Zhang, Hongsheng Li
PDF
FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting Hengyu Liu, Yuehao Wang, Chenxin Li, Ruisi Cai, Kevin Wang, Wuyang Li, Pavlo Molchanov, Peihao Wang, Zhangyang Wang
PDF
Flexible Frame Selection for Efficient Video Reasoning Shyamal Buch, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
PDF
Flexible Group Count Enables Hassle-Free Structured Pruning Jiamu Zhang, Shaochen Zhong, Andrew Ye, Zirui Liu, Sebastian Zhao, Kaixiong Zhou, Li Li, Soo-Hyun Choi, Rui Chen, Xia Hu, Shuai Xu, Vipin Chaudhary
PDF
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, Edgar Schönfeld
PDF
FlexUOD: The Answer to Real-World Unsupervised Image Outlier Detection Zhonghang Liu, Kun Zhou, Changshuo Wang, Wen-Yan Lin, Jiangbo Lu
PDF
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Hmrishav Bandyopadhyay, Yi-Zhe Song
PDF
Floating No More: Object-Ground Reconstruction from a Single Image Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang
PDF
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen, Jianwei Yang, Haiping Wu, Dianqi Li, Jianfeng Gao, Tianyi Zhou, Bin Xiao
PDF
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis Wonjoon Jin, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho
PDF
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow Within Unified Neural Representations Xunzhi Zheng, Dan Xu
PDF
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, Mannat Singh
PDF
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation Sen Wang, Le Wang, Sanping Zhou, Jingyi Tian, Jiayi Li, Haowen Sun, Wei Tang
PDF
Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation David T. Hoffmann, Syed Haseeb Raza, Hanqiu Jiang, Denis Tananaev, Steffen Klingenhoefer, Martin Meinke
PDF
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video Yue Gao, Hong-Xing Yu, Bo Zhu, Jiajun Wu
PDF
FluxSpace: Disentangled Semantic Editing in Rectified Flow Models Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
PDF
Focal Split: Untethered Snapshot Depth from Differential Defocus Junjie Luo, John Mamish, Alan Fu, Thomas Concannon, Josiah Hester, Emma Alexander, Qi Guo
PDF
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation Xiaoying Xing, Avinab Saha, Junfeng He, Susan Hao, Paul Vicol, Moonkyung Ryu, Gang Li, Sahil Singla, Sarah Young, Yinxiao Li, Feng Yang, Deepak Ramachandran
PDF
FOCUS: Knowledge-Enhanced Adaptive Visual Compression for Few-Shot Whole Slide Image Classification Zhengrui Guo, Conghao Xiong, Jiabo Ma, Qichen Sun, Lishuang Feng, Jinzhuo Wang, Hao Chen
PDF
Focusing on Tracks for Online Multi-Object Tracking Kyujin Shim, Kangwook Ko, Yujin Yang, Changick Kim
PDF
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows Shentong Mo, Yibing Song
PDF
Font-Agent: Enhancing Font Understanding with Large Language Models Yingxin Lai, Cuijie Xu, Haitian Shi, Guoqing Yang, Xiaoning Li, Zhiming Luo, Shaozi Li
PDF
Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-Generated Images Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm
PDF
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong
PDF
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo
PDF
ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images Yanqing Shen, Turcan Tuna, Marco Hutter, Cesar Cadena, Nanning Zheng
PDF
Forming Auxiliary High-Confident Instance-Level Loss to Promote Learning from Label Proportions Tianhao Ma, Han Chen, Juncheng Hu, Yungang Zhu, Ximing Li
PDF
Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution K Naveen Kumar, Ranjeet Ranjan Jha, C Krishna Mohan, Ravindra Babu Tallamraju
PDF
Foundations of the Theory of Performance-Based Ranking Sébastien Piérard, Anaïs Halin, Anthony Cioppa, Adrien Deliege, Marc Van Droogenbroeck
PDF
FoundationStereo: Zero-Shot Stereo Matching Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield
PDF
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation Kefan Chen, Chaerin Min, Linguang Zhang, Shreyas Hampali, Cem Keskin, Srinath Sridhar
PDF
Foveated Instance Segmentation Hongyi Zeng, Wenxuan Liu, Tianhua Xia, Jinhui Chen, Ziyun Li, Sai Qian Zhang
PDF
Fractal Calibration for Long-Tailed Object Detection Konstantinos Panagiotis Alexandridis, Ismail Elezi, Jiankang Deng, Anh Nguyen, Shan Luo
PDF
FRAME: Floor-Aligned Representation for Avatar Motion from Egocentric Video Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado, Rishabh Dabral, Thabo Beeler, Marc Habermann, Christian Theobalt
PDF
FRAMES-VQA: Benchmarking Fine-Tuning Robustness Across Multi-Modal Shifts in Visual Question Answering Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira
PDF
Free Lunch Enhancements for Multi-Modal Crowd Counting Haoliang Meng, Xiaopeng Hong, Zhengqin Lai, Miao Shang
PDF
Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM Qiyuan Dai, Sibei Yang
PDF
Free-Viewpoint Human Animation with Pose-Correlated Reference Selection Fa-Ting Hong, Zhan Xu, Haiyang Liu, Qinjie Lin, Luchuan Song, Zhixin Shu, Yang Zhou, Duygu Ceylan, Dan Xu
PDF
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui
PDF
FreeCloth: Free-Form Generation Enhances Challenging Clothed Human Modeling Hang Ye, Xiaoxuan Ma, Hai Ci, Wentao Zhu, Yizhou Wang
PDF
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity Jinxi Li, Ziyang Song, Siyuan Zhou, Bo Yang
PDF
FreePCA: Integrating Consistency Information Across Long-Short Frames in Training-Free Long Video Generation via Principal Component Analysis Jiangtong Tan, Hu Yu, Jie Huang, Jie Xiao, Feng Zhao
PDF
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts Tongyuan Bai, Wangyuanfan Bai, Dong Chen, Tieru Wu, Manyi Li, Rui Ma
PDF
FreeSim: Toward Free-Viewpoint Camera Simulation in Driving Scenes Lue Fan, Hao Zhang, Qitai Wang, Hongsheng Li, Zhaoxiang Zhang
PDF
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, Xiaowei Zhou
PDF
FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori
PDF
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah
PDF
Frequency Dynamic Convolution for Dense Image Prediction Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu
PDF
Frequency-Biased Synergistic Design for Image Compression and Compensation Jiaming Liu, Qi Zheng, Zihao Liu, Yilian Zhong, Peiye Liu, Tao Liu, Shusong Xu, Yanheng Lu, Sicheng Li, Dimin Niu, Yibo Fan
PDF
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
PDF
From AlexNet to Transformers: Measuring the Non-Linearity of Deep Neural Networks with Affine Optimal Transport Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Oliver Struckmeier, Karol Arndt, Markus Heinonen, Ville Kyrki, Samuel Kaski
PDF
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition Jiawei Lin, Shizhao Sun, Danqing Huang, Ting Liu, Ji Li, Jiang Bian
PDF
From Faces to Voices: Learning Hierarchical Representations for High-Quality Video-to-Speech Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung
PDF
From Head to Tail: Efficient Black-Box Model Inversion Attack via Long-Tailed Learning Ziang Li, Hongguang Zhang, Juan Wang, Meihui Chen, Hongxin Hu, Wenzhe Yi, Xiaoyang Xu, Mengda Yang, Chenjun Ma
PDF
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models Through Adaptive Data Calibration Mingyang Song, Xiaoye Qu, Jiawei Zhou, Yu Cheng
PDF
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification Yan Jiang, Hao Yu, Xu Cheng, Haoyu Chen, Zhaodong Sun, Guoying Zhao
PDF
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons Andrew Szot, Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, Devon Hjelm, Zhe Gan, Zsolt Kira, Alexander Toshev
PDF
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization Chao Yuan, Guiwei Zhang, Changxiao Ma, Tianyi Zhang, Guanglin Niu
PDF
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling Jinhong Lin, Cheng-En Wu, Huanran Li, Jifan Zhang, Yu Hen Hu, Pedro Morgado
PDF
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang
PDF
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models German Barquero, Nadine Bertsch, Manojkumar Marramreddy, Carlos Chacón, Filippo Arcadu, Ferran Rigual, Nicky Sijia He, Cristina Palmero, Sergio Escalera, Yuting Ye, Robin Kips
PDF
From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang
PDF
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing Jingxuan Wei, Cheng Tan, Qi Chen, Gaowei Wu, Siyuan Li, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo
PDF
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective Chen Zhao, Zhizhou Chen, Yunzhe Xu, Enxuan Gu, Jian Li, Zili Yi, Qian Wang, Jian Yang, Ying Tai
PDF
FrugalNeRF: Fast Convergence for Extreme Few-Shot Novel View Synthesis Without Learned Priors Chin-Yang Lin, Chung-Ho Wu, Chang-Han Yeh, Shih-Han Yen, Cheng Sun, Yu-Lun Liu
PDF
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting Fangyu Wu, Yuhao Chen
PDF
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding Rong Gao, Xin Liu, Zhuozhao Hu, Bohao Xing, Baiqiang Xia, Zitong Yu, Heikki Kälviäinen
PDF
FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones Manfred Georg, Garrett Tanzer, Esha Uboweja, Saad Hassan, Maximus Shengelia, Sam Sepah, Sean Forbes, Thad Starner
PDF
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning Gaojian Wang, Feng Lin, Tong Wu, Zhenguang Liu, Zhongjie Ba, Kui Ren
PDF
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection Shuai Liu, Mingyue Cui, Boyang Li, Quanmin Liang, Tinghe Hong, Kai Huang, Yunxiao Shan, Kai Huang
PDF
Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers Ji Zhao, Banglei Guan, Zibin Liu, Laurent Kneip
PDF
Functionality Understanding and Segmentation in 3D Scenes Jaime Corsetti, Francesco Giuliari, Alice Fasoli, Davide Boscaini, Fabio Poiesi
PDF
Fuzzy Multimodal Learning for Trusted Cross-Modal Retrieval Siyuan Duan, Yuan Sun, Dezhong Peng, Zheng Liu, Xiaomin Song, Peng Hu
PDF
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks Zihan Wang, Gim Hee Lee
PDF
G3Flow: Generative 3D Semantic Flow for Pose-Aware and Generalizable Object Manipulation Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, Ping Luo
PDF
GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding Yuki Kawana, Shintaro Shiba, Quan Kong, Norimasa Kobori
PDF
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-View Diffusion Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias Nießner
PDF
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes Zhou Yang, Mingtao Feng, Tao Huang, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi
PDF
Galaxy Walker: Geometry-Aware VLMs for Galaxy-Scale Understanding Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li
PDF
GaPT-DAR: Category-Level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction Li Zhang, Mingliang Xu, Jianan Wang, Qiaojun Yu, Lixin Yang, Yonglu Li, Cewu Lu, Rujing Wang, Liu Liu
PDF
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation Ruihai Wu, Ziyu Zhu, Yuran Wang, Yue Chen, Jiarui Wang, Hao Dong
PDF
GASP: Gaussian Avatars with Synthetic Priors Jack Saunders, Charlie Hewitt, Yanan Jian, Marek Kowalski, Tadas Baltrusaitis, Yiye Chen, Darren Cosker, Virginia Estellers, Nicholas Gydé, Vinay P. Namboodiri, Benjamin E. Lundell
PDF
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection José Henrique Lima Marques, Jeffri Murrugarra-Llerena, Claudio R. Jung
PDF
GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping Jinfeng Liu, Lingtong Kong, Bo Li, Dan Xu
PDF
Gaussian Eigen Models for Human Heads Wojciech Zielonka, Timo Bolkart, Thabo Beeler, Justus Thies
PDF
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
PDF
Gaussian Splatting Feature Fields for (Privacy-Preserving) Visual Localization Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler
PDF
Gaussian Splatting for Efficient Satellite Image Photogrammetry Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret
PDF
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jiwen Lu
PDF
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior Zichen Tang, Yuan Yao, Miaomiao Cui, Liefeng Bo, Hongyu Yang
PDF
GaussianSpa: An "Optimizing-Sparsifying" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting Yangming Zhang, Wenqi Jia, Wei Niu, Miao Yin
PDF
GaussianUDF: Inferring Unsigned Distance Functions Through 3D Gaussian Splatting Shujuan Li, Yu-Shen Liu, Zhizhong Han
PDF
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu
PDF
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong Su, Wenyu Liu, Xinggang Wang
PDF
GauSTAR: Gaussian Surface Tracking and Reconstruction Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song
PDF
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders Fiona Ryan, Ajay Bati, Sangmin Lee, Daniel Bolya, Judy Hoffman, James M. Rehg
PDF
GazeGene: Large-Scale Synthetic Gaze Dataset with 3D Eyeball Annotations Yiwei Bao, Zhiming Wang, Feng Lu
PDF
Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging Bo Wang, Dingwei Tan, Yen-Ling Kuo, Zhaowei Sun, Jeremy M. Wolfe, Tat-Jen Cham, Mengmi Zhang
PDF
Gazing into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella
PDF
GBC-Splat: Generalizable Gaussian-Based Clothed Human Digitalization Under Sparse RGB Cameras Hanzhang Tu, Zhanfeng Liao, Boyao Zhou, Shunyuan Zheng, Xilong Zhou, Liuxin Zhang, QianYing Wang, Yebin Liu
PDF
GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-Based 3D Object Detection Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger
PDF
GCC: Generative Color Constancy via Diffusing a Color Checker Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang, Yi-Chen Lo, Yu-Chee Tseng, Jiun-Long Huang, Yu-Lun Liu
PDF
GCE-Pose: Global Context Enhancement for Category-Level Object Pose Estimation Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter KT Yu, Nassir Navab, Benjamin Busam
PDF
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee
PDF
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alexandre Alahi
PDF
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao
PDF
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
PDF
GenAssets: Generating In-the-Wild 3D Assets in Latent Space Ze Yang, Jingkang Wang, Haowei Zhang, Sivabalan Manivasagam, Yun Chen, Raquel Urtasun
PDF
GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay N. Paranjape, Vishal M. Patel
PDF
Generalizable Object Keypoint Localization from Generative Priors Dongkai Wang, Jiang Duan, Liangjian Wen, Shiyu Xuan, Hao Chen, Shiliang Zhang
PDF
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection Boyong He, Yuxiang Ji, Qianwen Ye, Zhuoyue Tan, Liaoni Wu
PDF
Generalized Few-Shot 3D Point Cloud Segmentation with Vision-Language Model Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, Serge Belongie
PDF
Generalized Gaussian Entropy Model for Point Cloud Attribute Compression with Dynamic Likelihood Intervals Changhao Peng
PDF
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise Brayan Monroy, Jorge Bacca, Julián Tachella
PDF
Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation Libiao Chen, Dong Nie, Junjun Pan, Jing Yan, Zhenyu Tang
PDF
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Yunsheng Wu, Li Yuan
PDF
Generating 3D-Consistent Videos from Unposed Internet Photos Gene Chou, Kai Zhang, Sai Bi, Hao Tan, Zexiang Xu, Fujun Luan, Bharath Hariharan, Noah Snavely
PDF
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori
PDF
Generating Multimodal Driving Scenes via Next-Scene Prediction Yanhao Wu, Haoyang Zhang, Tianwei Lin, Lichao Huang, Shujie Luo, Rui Wu, Congpei Qiu, Wei Ke, Tong Zhang
PDF
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction Seungtae Nam, Xiangyu Sun, Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park
PDF
Generative Gaussian Splatting for Unbounded 3D City Generation Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu
PDF
Generative Hard Example Augmentation for Semantic Point Cloud Segmentation Qi Zhang, Jibin Peng, Zhao Huang, Wei Feng, Di Lin
PDF
Generative Image Layer Decomposition with Visual Effects Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou
PDF
Generative Inbetweening Through Frame-Wise Conditions-Driven Video Generation Tianyi Zhu, Dongwei Ren, Qilong Wang, Xiaohe Wu, Wangmeng Zuo
PDF
Generative Map Priors for Collaborative BEV Semantic Segmentation Jiahui Fu, Yue Gong, Luting Wang, Shifeng Zhang, Xu Zhou, Si Liu
PDF
Generative Modeling of Class Probability for Multi-Modal Representation Learning JungKyoo Shin, Bumsoo Kim, Eunwoo Kim
PDF
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens Kaihang Pan, Wang Lin, Zhongqi Yue, Tenglong Ao, Liyu Jia, Wei Zhao, Juncheng Li, Siliang Tang, Hanwang Zhang
PDF
Generative Multiview Relighting for 3D Reconstruction Under Extreme Illumination Variation Hadi Alzayer, Philipp Henzler, Jonathan T. Barron, Jia-Bin Huang, Pratul P. Srinivasan, Dor Verbin
PDF
Generative Omnimatte: Learning to Decompose Video into Layers Yao-Chih Lee, Erika Lu, Sarah Rumbley, Michal Geyer, Jia-Bin Huang, Tali Dekel, Forrester Cole
PDF
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis Yu Yuan, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan
PDF
Generative Photomontage Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu
PDF
Generative Sparse-View Gaussian Splatting Hanyang Kong, Xingyi Yang, Xinchao Wang
PDF
Generative Video Propagation Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia
PDF
Generative Zero-Shot Composed Image Retrieval Lan Wang, Wei Ao, Vishnu Naresh Boddeti, Ser-Nam Lim
PDF
GenFusion: Closing the Loop Between Reconstruction and Generation via Videos Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, Anpei Chen
PDF
GENIUS: A Generative Framework for Universal Multimodal Search Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak
PDF
GENMANIP: LLM-Driven Simulation for Generalizable Instruction-Following Manipulation Ning Gao, Yilun Chen, Shuai Yang, Xinyi Chen, Yang Tian, Hao Li, Haifeng Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang
PDF
GenPC: Zero-Shot Point Cloud Completion via 3D Generative Priors An Li, Zhe Zhu, Mingqiang Wei
PDF
GenVDM: Generating Vector Displacement Maps from a Single Image Yuezhi Yang, Qimin Chen, Vladimir G. Kim, Siddhartha Chaudhuri, Qixing Huang, Zhiqin Chen
PDF
GeoAvatar: Geometrically-Consistent Multi-Person Avatar Reconstruction from Sparse Multi-View Videos Soohyun Lee, Seoyeon Kim, HeeKyung Lee, Won-Sik Jeong, Joo Ho Lee
PDF
GeoDepth: From Point-to-Depth to Plane-to-Depth Modeling for Self-Supervised Monocular Depth Estimation Haifeng Wu, Shuhang Gu, Lixin Duan, Wen Li
PDF
Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning Yanbiao Ma, Wei Dai, Wenke Huang, Jiayi Chen
PDF
Geometry Field Splatting with Gaussian Surfels Kaiwen Jiang, Venkataram Sivaram, Cheng Peng, Ravi Ramamoorthi
PDF
Geometry in Style: 3D Stylization via Surface Normal Deformation Nam Anh Dinh, Itai Lang, Hyunwoo Kim, Oded Stein, Rana Hanocka
PDF
Geometry-Guided Online 3D Video Synthesis with Multi-View Temporal Consistency Hyunho Ha, Lei Xiao, Christian Richardt, Thu Nguyen-Phuoc, Changil Kim, Min H. Kim, Douglas Lanman, Numair Khan
PDF
GeoMM: On Geodesic Perspective for Multi-Modal Learning Shibin Mei, Hang Wang, Bingbing Ni
PDF
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju, Sougata Sen, Sanjay E. Sarma, Archan Misra
PDF
GET: Unlocking the Multi-Modal Potential of CLIP for Generalized Category Discovery Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, Ming-Ming Cheng
PDF
GFlowVLM: Enhancing Multi-Step Reasoning in Vision-Language Models with Generative Flow Networks Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, Kwonjoon Lee
PDF
GG-SSMs: Graph-Generating State Space Models Nikola Zubic, Davide Scaramuzza
PDF
GIF: Generative Inspiration for Face Recognition at Scale Saeed Ebrahimi, Sahar Rahimi, Ali Dabouei, Srinjoy Das, Jeremy M. Dawson, Nasser M. Nasrabadi
PDF
GIFStream: 4D Gaussian-Based Immersive Video with Feature Stream Hao Li, Sicheng Li, Xiang Gao, Abudouaihati Batuer, Lu Yu, Yiyi Liao
PDF
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities Rao Fu, Dingxi Zhang, Alex Jiang, Wanjia Fu, Austin Funk, Daniel Ritchie, Srinath Sridhar
PDF
GIVEPose: Gradual Intra-Class Variation Elimination for RGB-Based Category-Level Object Pose Estimation Ziqin Huang, Gu Wang, Chenyangguang Zhang, Ruida Zhang, Xiu Li, Xiangyang Ji
PDF
GLane3D: Detecting Lanes with Graph of 3D Keypoints Halil İbrahim Öztürk, Muhammet Esat Kalfaoğlu, Ozsel Kilinc
PDF
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
PDF
GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven Mengqiao Han, Liyuan Pan, Xiabi Liu
PDF
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation Wei Deng, Mengshi Qi, Huadong Ma
PDF
Glossy Object Reconstruction with Cost-Effective Polarized Acquisition Bojian Wu, Yifan Peng, Ruizhen Hu, Xiaowei Zhou
PDF
GLUS: Global-Local Reasoning Unified into a Single Large Language Model for Video Segmentation Lang Lin, Xueyang Yu, Ziqi Pang, Yu-Xiong Wang
PDF
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing Tong Wang, Ting Liu, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Xiaolin Hu
PDF
GO-N3RDet: Geometry Optimized NeRF-Enhanced 3D Object Detector Zechuan Li, Hongshan Yu, Yihao Ding, Jinhao Qiao, Basim Azam, Naveed Akhtar
PDF
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu
PDF
GOAL: Global-Local Object Alignment Learning Hyungyu Choi, Young Kyun Jang, Chanho Eom
PDF
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, Wei Yin
PDF
Goku: Flow Based Video Generative Foundation Models Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu
PDF
Golden Cudgel Network for Real-Time Semantic Segmentation Guoyu Yang, Yuan Wang, Daming Shi, Yanzhong Wang
PDF
GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis You Wang, Li Fang, Hao Zhu, Fei Hu, Long Ye, Zhan Ma
PDF
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion Jona Ballé, Luca Versari, Emilien Dupont, Hyunjik Kim, Matthias Bauer
PDF
GPAvatar: High-Fidelity Head Avatars by Learning Efficient Gaussian Projections Wei-Qi Feng, Dong Han, Ze-Kang Zhou, Shunkai Li, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Miao Wang
PDF
GPS as a Control Signal for Image Generation Chao Feng, Ziyang Chen, Aleksander Holynski, Alexei A. Efros, Andrew Owens
PDF
GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization Under Large Viewpoint Changes Yunxuan Li, Lei Fan, Xiaoying Xing, Jianxiong Zhou, Ying Wu
PDF
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler
PDF
Gradient-Guided Annealing for Domain Generalization Aristotelis Ballas, Christos Diou
PDF
GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking Hyunseop Kim, Hyo-Jun Lee, Yonguk Lee, Jinu Lee, Hanul Kim, Yeong Jun Koh
PDF
Graph Neural Network Combining Event Stream and Periodic Aggregation for Low-Latency Event-Based Vision Manon Dampfhoffer, Thomas Mesquida, Damien Joubert, Thomas Dalgaty, Pascal Vivet, Christoph Posch
PDF
Graph-Embedded Structure-Aware Perceptual Hashing for Neural Network Protection and Piracy Detection Ruiheng Liu, Haozhe Chen, Boyao Zhao, Kejiang Chen, Weiming Zhang
PDF
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs Yi Fang, Bowen Jin, Jiacheng Shen, Sirui Ding, Qiaoyu Tan, Jiawei Han
PDF
GraphI2P: Image-to-Point Cloud Registration with Exploring Pattern of Correspondence via Graph Learning Lin Bie, Shouan Pan, Siqi Li, Yining Zhao, Yue Gao
PDF
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning Guangyan Chen, Te Cui, Meiling Wang, Chengcai Yang, Mengxiao Hu, Haoyang Lu, Yao Mu, Zicai Peng, Tianxing Zhou, Xinran Jiang, Yi Yang, Yufeng Yue
PDF
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding Yawen Shao, Wei Zhai, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha
PDF
Gromov-Wasserstein Problem with Cyclic Symmetry Shoichiro Takeda, Yasunori Akagi
PDF
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling Yang Zheng, Menglei Chai, Delio Vicini, Yuxiao Zhou, Yinghao Xu, Leonidas Guibas, Gordon Wetzstein, Thabo Beeler
PDF
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels Yongshuo Zong, Qin Zhang, Dongsheng An, Zhihua Li, Xiang Xu, Linghan Xu, Zhuowen Tu, Yifan Xing, Onkar Dabeer
PDF
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions He Zhu, Quyu Kong, Kechun Xu, Xunlong Xia, Bing Deng, Jieping Ye, Rong Xiong, Yue Wang
PDF
GroundingFace: Fine-Grained Face Understanding via Pixel Grounding Multimodal Large Language Model Yue Han, Jiangning Zhang, Junwei Zhu, Runze Hou, Xiaozhong Ji, Chuming Lin, Xiaobin Hu, Zhucun Xue, Yong Liu
PDF
GroupMamba: Efficient Group-Based Visual State Space Model Abdelrahman Shaker, Syed Talal Wasim, Salman Khan, Juergen Gall, Fahad Shahbaz Khan
PDF
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill Jieming Cui, Tengyu Liu, Ziyu Meng, Jiale Yu, Ran Song, Wei Zhang, Yixin Zhu, Siyuan Huang
PDF
GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction Jinguang Tong, Xuesong Li, Fahira Afzal Maken, Sundaram Muthu, Lars Petersson, Chuong Nguyen, Hongdong Li
PDF
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields Through Efficient Dense 3D Point Tracking Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng Li
PDF
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting Zixuan Chen, Guangcong Wang, Jiahao Zhu, Jianhuang Lai, Xiaohua Xie
PDF
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration Yuchen Sun, Shanhui Zhao, Tao Yu, Hao Wen, Samith Va, Mengwei Xu, Yuanchun Li, Chongyang Zhang
PDF
Guiding Human-Object Interactions with Rich Geometry and Relations Mengqing Xue, Yifei Liu, Ling Guo, Shaoli Huang, Changxing Ding
PDF
Gyro-Based Neural Single Image Deblurring Heemin Yang, Jaesung Rim, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho
PDF
H-Edit: Effective and Flexible Diffusion-Based Editing via Doob's H-Transform Toan Nguyen, Kien Do, Duc Kieu, Thin Nguyen
PDF
H-MoRe: Learning Human-Centric Motion Representation for Action Analysis Zhanbo Huang, Xiaoming Liu, Yu Kong
PDF
H2ST: Hierarchical Two-Sample Tests for Continual Out-of-Distribution Detection Yuhang Liu, Wenjie Zhao, Yunhui Guo
PDF
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer Jiahao Cui, Hui Li, Yun Zhan, Hanlin Shang, Kaihui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu
PDF
HalLoc: Token-Level Localization of Hallucinations for Vision Language Models Eunkyu Park, Minyeong Kim, Gunhee Kim
PDF
Hand-Held Object Reconstruction from RGB Video with Dynamic Interaction Shijian Jiang, Qi Ye, Rengan Xie, Yuchi Huo, Jiming Chen
PDF
Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor Hao Yu, Xin Yang, Le Zhang, Hanlin Gu, Tianrui Li, Lixin Fan, Qiang Yang
PDF
HandOS: 3D Hand Reconstruction in One Stage Xingyu Chen, Zhuheng Song, Xiaoke Jiang, Yaoqing Hu, Junzhi Yu, Lei Zhang
PDF
Hardware-Rasterized Ray-Based Gaussian Splatting Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder
PDF
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao
PDF
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models Zhenguang Liu, Chao Shuai, Shaojing Fan, Ziping Dong, Jinwu Hu, Zhongjie Ba, Kui Ren
PDF
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Noel E. O'Connor
PDF
Harnessing Global-Local Collaborative Adversarial Perturbation for Anti-Customization Long Xu, Jiakai Wang, Haojie Hao, Haotong Qin, Jiejie Zhao, Xianglong Liu
PDF
Hash3D: Training-Free Acceleration for 3D Generation Xingyi Yang, Songhua Liu, Xinchao Wang
PDF
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos Jinglei Zhang, Jiankang Deng, Chao Ma, Rolandos Alexandros Potamias
PDF
Hazy Low-Quality Satellite Video Restoration via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction Ning Ni, Libao Zhang
PDF
HD-EPIC: A Highly-Detailed Egocentric Video Dataset Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen
PDF
Hearing Anywhere in Any Environment Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastia V. Amengual, Calvin Murdock, Ishwarya Ananthabhotla, Philip Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao
PDF
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes Yiming Dou, Wonseok Oh, Yuqing Luo, Antonio Loquercio, Andrew Owens
PDF
HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery Yuto Matsubara, Ko Nishino
PDF
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator Fan Yang, Ru Zhen, Jianing Wang, Yanhao Zhang, Haoxiang Chen, Haonan Lu, Sicheng Zhao, Guiguang Ding
PDF
HELVIPAD: A Real-World Dataset for Omnidirectional Stereo Depth Estimation Mehdi Zayene, Jannik Endres, Albias Havolli, Charles Corbière, Salim Cherkaoui, Alexandre Kontouli, Alexandre Alahi
PDF
HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration Shaocheng Yan, Yiming Wang, Kaiyan Zhao, Pengcheng Shi, Zhenjun Zhao, Yongjun Zhang, Jiayuan Li
PDF
HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars Hongrui Cai, Yuting Xiao, Xuan Wang, Jiafei Li, Yudong Guo, Yanbo Fan, Shenghua Gao, Juyong Zhang
PDF
Heterogeneous Skeleton-Based Action Representation Learning Hongsong Wang, Xiaoyan Ma, Jidong Kuang, Jie Gui
PDF
Hiding Images in Diffusion Models by Editing Learned Score Functions Haoyu Chen, Yunqiao Yang, Nan Zhong, Kede Ma
PDF
Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal Zhi Jiang, Jingbo Hu, Ling Zhang, Gang Fu, Chunxia Xiao
PDF
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning Can Kucuksozen, Yucel Yemez
PDF
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, Shu-Tao Xia
PDF
Hierarchical Flow Diffusion for Efficient Frame Interpolation Yang Hai, Guo Wang, Tan Su, Wenjie Jiang, Yinlin Hu
PDF
Hierarchical Gaussian Mixture Model Splatting for Efficient and Part Controllable 3D Generation Qitong Yang, Mingtao Feng, Zijie Wu, Weisheng Dong, Fangfang Wu, Yaonan Wang, Ajmal Mian
PDF
Hierarchical Knowledge Prompt Tuning for Multi-Task Test-Time Adaptation Qiang Zhang, Mengsheng Zhao, Jiawei Liu, Fanrui Zhang, Yongchao Xu, Zheng-Jun Zha
PDF
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat
PDF
HiFi-Portrait: Zero-Shot Identity-Preserved Portrait Generation with High-Fidelity Multi-Face Fusion Yifang Xu, Benxiang Zhai, Yunzhuo Sun, Ming Li, Yang Li, Sidan Du
PDF
High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and a Learned Bit-Depth Scalable Compression Algorithm Zhaoyi Tian, Feifeng Wang, Shiwei Wang, Zihao Zhou, Yao Zhu, Liquan Shen
PDF
High Temporal Consistency Through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight Cédric Vincent, Taehyoung Kim, Henri Meeß
PDF
High-Fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model Yiyang Shen, Kun Zhou, He Wang, Yin Yang, Tianjia Shao
PDF
High-Fidelity Lightweight Mesh Reconstruction from Point Clouds Chen Zhang, Wentao Wang, Ximeng Li, Xinyao Liao, Wanjuan Su, Wenbing Tao
PDF
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model Mingtao Guo, Guanyu Xing, Yanli Liu
PDF
High-Quality Point Cloud Oriented Normal Estimation via Hybrid Angular and Euclidean Distance Encoding Yuanqi Li, Jingcheng Huang, Hongshen Wang, Peiyuan Lv, Yansong Liu, Jiuming Zheng, Jie Guo, Yanwen Guo
PDF
Higher-Order Ratio Cycles for Fast and Globally Optimal Shape Matching Paul Roetzer, Viktoria Ehm, Daniel Cremers, Zorah Lähner, Florian Bernard
PDF
HIIF: Hierarchical Encoding Based Implicit Image Function for Continuous Super-Resolution Yuxuan Jiang, Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull
PDF
HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving R.D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang
PDF
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation Yiming Liang, Tianhan Xu, Yuta Kikuchi
PDF
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation Hongwei Zheng, Han Li, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong
PDF
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang
PDF
HistoFS: Non-IID Histopathologic Whole Slide Image Classification via Federated Style Transfer with RoI-Preserving Farchan Hakim Raswa, Chun-Shien Lu, Jia-Ching Wang
PDF
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Ming-Yu Liu, Xihui Liu, Ziwei Liu, Daniel Y. Fu, Christopher Re, David W. Romero
PDF
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting Xinpeng Liu, Zeyi Huang, Fumio Okura, Yasuyuki Matsushita
PDF
HOIGen-1M: A Large-Scale Dataset for Human-Object Interaction Video Generation Kun Liu, Qi Liu, Xinchen Liu, Jie Li, Yongdong Zhang, Jiebo Luo, Xiaodong He, Wu Liu
PDF
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models Mingzhen Huang, Fu-Jen Chu, Bugra Tekin, Kevin J. Liang, Haoyu Ma, Weiyao Wang, Xingyu Chen, Pierre Gleize, Hongfei Xue, Siwei Lyu, Kris Kitani, Matt Feiszli, Hao Tang
PDF
Holmes-VAU: Towards Long-Term Video Anomaly Understanding at Any Granularity Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Shanjun Zhang, Li Yu, Nong Sang
PDF
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion Ding Ding, Yueming Pan, Ruoyu Feng, Qi Dai, Kai Qiu, Jianmin Bao, Chong Luo, Zhenzhong Chen
PDF
Homogeneous Dynamics Space for Heterogeneous Humans Xinpeng Liu, Junxuan Liang, Chenshuo Zhang, Zixuan Cai, Cewu Lu, Yong-Lu Li
PDF
HOP: Heterogeneous Topology-Based Multimodal Entanglement for Co-Speech Gesture Generation Hongye Cheng, Tianyu Wang, Guangsi Shi, Zexing Zhao, Yanwei Fu
PDF
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes Lihan Jiang, Kerui Ren, Mulin Yu, Linning Xu, Junting Dong, Tao Lu, Feng Zhao, Dahua Lin, Bo Dai
PDF
HORP: Human-Object Relation Priors Guided HOI Detection Pei Geng, Jian Yang, Shanshan Zhang
PDF
HOT: Hadamard-Based Optimized Training Seonggon Kim, Juncheol Shin, Seung-taek Woo, Eunhyeok Park
PDF
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan
PDF
HOTFormerLoc: Hierarchical Octree Transformer for Versatile LiDAR Place Recognition Across Ground and Aerial Views Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani
PDF
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition Zimo Wang, Cheng Wang, Taiki Yoshino, Sirui Tao, Ziyang Fu, Tzu-Mao Li
PDF
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai
PDF
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions Aditya Prakash, Benjamin Lundell, Dmitry Andreychuk, David Forsyth, Saurabh Gupta, Harpreet Sawhney
PDF
How to Merge Your Multimodal Models over Time? Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth, Ameya Prabhu, Zeynep Akata, Samuel Albanie, Matthias Bethge
PDF
HRAvatar: High-Quality and Relightable Gaussian Head Avatar Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang
PDF
HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction Yuan Wang, Yali Li, Xiang Li, Shengjin Wang
PDF
HSI: A Holistic Style Injector for Arbitrary Style Transfer Shuhao Zhang, Hui Kang, Yang Liu, Fang Mei, Hongjuan Li
PDF
Human Motion Instruction Tuning Lei Li, Sen Jia, Jianhao Wang, Zhongyu Jiang, Feng Zhou, Ju Dai, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang
PDF
Human-Centered Interactive Learning via MLLMs for Text-to-Image Person Re-Identification Yang Qin, Chao Chen, Zhihang Fu, Dezhong Peng, Xi Peng, Peng Hu
PDF
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation Boyuan Wang, Xiaofeng Wang, Chaojun Ni, Guosheng Zhao, Zhiqin Yang, Zheng Zhu, Muyang Zhang, Yukun Zhou, Xinze Chen, Guan Huang, Lihong Liu, Xingang Wang
PDF
HumanMM: Global Human Motion Recovery from Multi-Shot Videos Yuhong Zhang, Guanlin Wu, Ling-Hao Chen, Zhuokai Zhao, Jing Lin, Xiaoke Jiang, Jiamin Wu, Zhuoheng Li, Hao Frank Yang, Haoqian Wang, Lei Zhang
PDF
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, Mu Xu
PDF
HuMoCon: Concept Discovery for Human Motion Understanding Qihang Fang, Chengcheng Tang, Bugra Tekin, Shugao Ma, Yanchao Yang
PDF
HUNet: Homotopy Unfolding Network for Image Compressive Sensing Feiyang Shen, Hongping Gan
PDF
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation Zunnan Xu, Zhentao Yu, Zixiang Zhou, Jun Zhou, Xiaoyu Jin, Fa-ting Hong, Xiaozhong Ji, Junwei Zhu, Chengfei Cai, Shiyu Tang, Qin Lin, Xiu Li, Qinglin Lu
PDF
HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison Yung-Hao Yang, Zitang Sun, Taiki Fukiage, Shin'ya Nishida
PDF
HUSH: Holistic Panoramic 3D Scene Understanding Using Spherical Harmonics Jongsung Lee, Harin Park, Byeong-Uk Lee, Kyungdon Joo
PDF
HVI: A New Color Space for Low-Light Image Enhancement Qingsen Yan, Yixu Feng, Cheng Zhang, Guansong Pang, Kangbiao Shi, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang
PDF
Hybrid Concept Bottleneck Models Yang Liu, Tianwei Zhang, Shi Gu
PDF
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation Ting Liu, Siyuan Li
PDF
Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation Jiawei Fu, Tiantian Zhang, Kai Chen, Qi Dou
PDF
Hybrid-Level Instruction Injection for Video Token Compression in Multi-Modal Large Language Models Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie
PDF
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting Jingyu Lin, Jiaqi Gu, Lubin Fan, Bojian Wu, Yujing Lou, Renjie Chen, Ligang Liu, Jieping Ye
PDF
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment Armin Shafiee Sarvestani, Sheyang Tang, Zhou Wang
PDF
Hyperbolic Category Discovery Yuanpei Liu, Zhenqi He, Kai Han
PDF
Hyperbolic Safety-Aware Vision-Language Models Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara
PDF
Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation Tanuj Sur, Samrat Mukherjee, Kaizer Rahaman, Subhasis Chaudhuri, Muhammad Haris Khan, Biplab Banerjee
PDF
Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception Luke Chen, Junyao Wang, Trier Mortlock, Pramod Khargonekar, Mohammad Abdullah Al Faruque
PDF
HyperFree: A Channel-Adaptive and Tuning-Free Foundation Model for Hyperspectral Remote Sensing Imagery Jingtao Li, Yingyi Liu, Xinyu Wang, Yunning Peng, Chen Sun, Shaoyu Wang, Zhendong Sun, Tian Ke, Xiao Jiang, Tangwei Lu, Anran Zhao, Yanfei Zhong
PDF
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren, Alper Yilmaz, Khoa Luu
PDF
Hypergraph Vision Transformers: Images Are More than Nodes, More than Edges Joshua Fixelle
PDF
HyperGS: Hyperspectral 3D Gaussian Splatting Christopher Thirgood, Oscar Mendez, Erin Ling, Jon Storey, Simon Hadfield
PDF
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis Mengtian Li, Jinshu Chen, Wanquan Feng, Bingchuan Li, Fei Dai, Songtao Zhao, Qian He
PDF
HyperNet Fields: Efficiently Training Hypernetworks Without Ground Truth by Learning Weight Trajectories Eric Hedlin, Munawar Hayat, Fatih Porikli, Kwang Moo Yi, Shweta Mahajan
PDF
HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks Maria Pilligua, Danna Xue, Javier Vazquez-Corral
PDF
HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset Ron Ferens, Yosi Keller
PDF
HyperSeg: Hybrid Segmentation Assistant with Fine-Grained Visual Perceiver Cong Wei, Yujie Zhong, Haoxian Tan, Yong Liu, Jie Hu, Dengjie Li, Zheng Zhao, Yujiu Yang
PDF
Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng, Guang Lin, Zihan Cao, Chao Li, Qibin Zhao
PDF
I2VGuard: Safeguarding Images Against Misuse in Diffusion-Based Image-to-Video Models Dongnan Gui, Xun Guo, Wengang Zhou, Yan Lu
PDF
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments Can Zhang, Gim Hee Lee
PDF
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models Fernando Julio Cendra, Kai Han
PDF
IceDiff: High Resolution and High-Quality Arctic Sea Ice Forecasting with Generative Diffusion Prior Jingyi Xu, Siwei Tu, Weidong Yang, Ben Fei, Shuhao Li, Keyi Liu, Yeqi Luo, Lipeng Ma, Lei Bai
PDF
ICP: Immediate Compensation Pruning for Mid-to-High Sparsity Xin Luo, Xueming Fu, Zihang Jiang, S. Kevin Zhou
PDF
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models Junzhe Chen, Tianshu Zhang, Shiyu Huang, Yuwei Niu, Linfeng Zhang, Lijie Wen, Xuming Hu
PDF
ID-Patch: Robust ID Association for Group Photo Personalization Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo
PDF
IDEA-Bench: How Far Are Generative Models from Professional Designing? Chen Liang, Lianghua Huang, Jingwu Fang, Huanzhang Dou, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Junge Zhang, Xin Zhao, Yu Liu
PDF
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-Modal Object Re-Identification Yuhao Wang, Yongfeng Lv, Pingping Zhang, Huchuan Lu
PDF
Identifying and Mitigating Position Bias of Multi-Image Vision-Language Models Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang
PDF
Identifying and Mitigating Spurious Correlation in Multi-Task Learning Junyi Chai, Shenyu Lu, Xiaoqian Wang
PDF
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification Zhiqi Pang, Junjie Wang, Lingling Zhao, Chunyu Wang
PDF
Identity-Preserving Distillation Sampling by Fixed-Point Iterator SeonHwa Kim, Jiwon Kim, Soobin Park, Donghoon Ahn, Jiwon Kang, Seungryong Kim, Kyong Hwan Jin, Eunju Cha
PDF
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Shenghai Yuan, Jinfa Huang, Xianyi He, Yunyang Ge, Yujun Shi, Liuhan Chen, Jiebo Luo, Li Yuan
PDF
IDOL: Instant Photorealistic 3D Human Creation from a Single Image Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu
PDF
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation Yiren Song, Pei Yang, Hai Ci, Mike Zheng Shou
PDF
iG-6DoF: Model-Free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting Tuo Cao, Fei Luo, Jiongming Qin, Yu Jiang, Yusen Wang, Chunxia Xiao
PDF
ILIAS: Instance-Level Image Retrieval at Scale Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Suma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiri Matas, Ondrej Chum, Giorgos Tolias
PDF
Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation Hyejin Oh, Woo-Shik Kim, Sangyoon Lee, YungKyung Park, Je-Won Kang
PDF
IM-Portrait: Learning 3D-Aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos Yuan Li, Ziqian Bai, Feitong Tan, Zhaopeng Cui, Sean Fanello, Yinda Zhang
PDF
IM-Zero: Instance-Level Motion Controllable Video Generation in a Zero-Shot Manner Yuyang Huang, Yabo Chen, Li Ding, Xiaopeng Zhang, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian
PDF
Image Generation Diversity Issues and How to Tame Them Mischa Dombrowski, Weitong Zhang, Sarah Cechnicka, Hadrien Reynaud, Bernhard Kainz
PDF
Image Is All You Need to Empower Large-Scale Diffusion Models for In-Domain Generation Pu Cao, Feng Zhou, Lu Yang, Tianrui Huang, Qing Song
PDF
Image over Text: Transforming Formula Recognition Evaluation with Character Detection Matching Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Botian Shi, Bo Zhang, Conghui He
PDF
Image Quality Assessment: From Human to Machine Preference Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, Haoning Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, Weisi Lin, Guangtao Zhai
PDF
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference Wenhao Shen, Mingliang Zhou, Yu Chen, Xuekai Wei, Yong Feng, Huayan Pu, Weijia Jia
PDF
Image Reconstruction from Readout-Multiplexed Single-Photon Detector Arrays Shashwath Bharadwaj, Ruangrawee Kitichotkul, Akshay Agarwal, Vivek K Goyal
PDF
Image Referenced Sketch Colorization Based on Animation Creation Workflow Dingkun Yan, Xinrui Wang, Zhuoru Li, Suguru Saito, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo
PDF
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy You Li, Fan Ma, Yi Yang
PDF
ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-Based Few-Shot Learning Haoyuan Yang, Xiaoou Li, Jiaming Lv, Xianjun Cheng, Qilong Wang, Peihua Li
PDF
IMFine: 3D Inpainting via Geometry-Guided Multi-View Refinement Zhihao Shi, Dong Huo, Yuhongze Zhou, Yan Min, Juwei Lu, Xinxin Zuo
PDF
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Qirui Jiao, Daoyuan Chen, Yilun Huang, Bolin Ding, Yaliang Li, Ying Shen
PDF
Immune: Improving Safety Against Jailbreaks in Multi-Modal LLMs via Inference-Time Alignment Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi
PDF
Implicit Bias Injection Attacks Against Text-to-Image Diffusion Models Huayang Huang, Xiangye Jin, Jiaxu Miao, Yu Wu
PDF
Implicit Correspondence Learning for Image-to-Point Cloud Registration Xinjun Li, Wenfei Yang, Jiacheng Deng, Zhixin Cheng, Xu Zhou, Tianzhu Zhang
PDF
Improve Representation for Imbalanced Regression Through Geometric Constraints Zijian Dong, Yilei Wu, Chongyao Chen, Yingtian Zou, Yichi Zhang, Juan Helen Zhou
PDF
Improved Monocular Depth Prediction Using Distance Transform over Pre-Semantic Contours with Self-Supervised Neural Networks Marwane Hariat, Antoine Manzanera, David Filliat
PDF
Improved Video VAE for Latent Video Diffusion Model Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha
PDF
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning Han Liu, Peng Cui, Bingning Wang, Weipeng Chen, Yupeng Zhang, Jun Zhu, Xiaolin Hu
PDF
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen
PDF
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction Teng Hu, Jiangning Zhang, Ran Yi, Jieyu Weng, Yabiao Wang, Xianfang Zeng, Zhucun Xue, Lizhuang Ma
PDF
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, Yang Song
PDF
Improving Editability in Image Generation with Layer-Wise Memory Daneul Kim, Jaeah Lee, Jaesik Park
PDF
Improving Gaussian Splatting with Localized Points Management Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu
PDF
Improving Personalized Search with Regularized Low-Rank Parameter Updates Fiona Ryan, Josef Sivic, Fabian Caba Heilbron, Judy Hoffman, James M. Rehg, Bryan Russell
PDF
Improving Semi-Supervised Semantic Segmentation with Sliced-Wasserstein Feature Alignment and Uniformity Chen-Yi Lu, Kasra Derakhshandeh, Somali Chaterji
PDF
Improving Sound Source Localization with Joint Slot Attention on Image and Audio Inho Kim, Youngkil Song, Jicheol Park, Won Hwa Kim, Suha Kwak
PDF
Improving the Training of Data-Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling Zhaoyu Zhang, Yang Hua, Guanxiong Sun, Hui Wang, Seán McLoone
PDF
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang
PDF
Improving Transferable Targeted Attacks with Feature Tuning Mixup Kaisheng Liang, Xuelong Dai, Yanjie Li, Dong Wang, Bin Xiao
PDF
Improving Visual and Downstream Performance of Low-Light Enhancer with Vision Foundation Models Collaboration Yuxuan Gu, Haoxuan Wang, Pengyang Ling, Zhixiang Wei, Huaian Chen, Yi Jin, Enhong Chen
PDF
Imputation-Free and Alignment-Free: Incomplete Multi-View Clustering Driven by Consensus Semantic Learning Yuzhuo Dai, Jiaqi Jin, Zhibin Dong, Siwei Wang, Xinwang Liu, En Zhu, Xihong Yang, Xinbiao Gan, Yu Feng
PDF
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu
PDF
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera Jian Huang, Chengrui Dong, Xuanhua Chen, Peidong Liu
PDF
Incomplete Multi-Modal Brain Tumor Segmentation via Learnable Sorting State Space Model Zheyu Zhang, Yayuan Lu, Feipeng Ma, Yueyi Zhang, Huanjing Yue, Xiaoyan Sun
PDF
Incomplete Multi-View Multi-Label Learning via Disentangled Representation and Label Semantic Embedding Xu Yan, Jun Yin, Jie Wen
PDF
Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models Yuhao Cui, Xinxing Zu, Wenhua Zhang, Zhongzhou Zhao, Jinyang Gao
PDF
Incremental Object Keypoint Learning Mingfu Liang, Jiahuan Zhou, Xu Zou, Ying Wu
PDF
IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction Cong Ruan, Yuesong Wang, Tao Guan, Bin Zhang, Lili Ju
PDF
Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications Tong Bu, Maohua Li, Zhaofei Yu
PDF
Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning Ye Li, Yanchao Zhao, Chengcheng Zhu, Jiale Zhang
PDF
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Jian Han, Jinlai Liu, Yi Jiang, Bin Yan, Yuqi Zhang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu
PDF
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations Yongming Zhu, Longhao Zhang, Zhengkun Rong, Tianshu Hu, Shuang Liang, Zhipeng Ge
PDF
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment Yunhong Lu, Qichao Wang, Hengyuan Cao, Xierui Wang, Xiaoyin Xu, Min Zhang
PDF
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong, Zuyan Liu, Hai-Long Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu
PDF
InsightEdit: Towards Better Instruction Following for Image Editing Yingjing Xu, Jie Kong, Jiazhi Wang, Xiao Pan, Bo Lin, Qiang Liu
PDF
Insightful Instance Features for 3D Instance Segmentation Wonseok Roh, Hwanhee Jung, Giljoo Nam, Dong In Lee, Hyeongcheol Park, Sang Ho Yoon, Jungseock Joo, Sangpil Kim
PDF
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-Modal Instruction Tuning Hanxun Yu, Wentong Li, Song Wang, Junbo Chen, Jianke Zhu
PDF
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Jun Zhou, Lin Gu
PDF
Instance-Wise Supervision-Level Optimization in Active Learning Shinnosuke Matsuo, Riku Togashi, Ryoma Bise, Seiichi Uchida, Masahiro Nomura
PDF
InstanceCap: Improving Text-to-Video Generation via Instance-Aware Structured Caption Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai
PDF
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception Haijie Li, Yanmin Wu, Jiarui Meng, Qiankun Gao, Zhiyao Zhang, Ronggang Wang, Jian Zhang
PDF
Instant Adversarial Purification with Adversarial Consistency Distillation Chun Tong Lei, Hon Ming Yam, Zhongliang Guo, Yifei Qian, Chun Pong Lau
PDF
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang
PDF
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects Amir Barda, Matheus Gadelha, Vladimir G. Kim, Noam Aigerman, Amit H. Bermano, Thibault Groueix
PDF
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning Sherry X. Chen, Misha Sra, Pradeep Sen
PDF
Instruction-Based Image Manipulation by Watching How Things Move Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng, Zhihao Xia
PDF
Integral Fast Fourier Color Constancy Wenjun Wei, Yanlin Qian, Huaian Chen, Junkang Dai, Yi Jin
PDF
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation Sirui Xu, Dongting Li, Yucheng Zhang, Xiyan Xu, Qi Long, Ziyin Wang, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang, Liang-Yan Gui
PDF
InteractAnything: Zero-Shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing Jinlu Zhang, Yixin Chen, Zan Wang, Jie Yang, Yizhou Wang, Siyuan Huang
PDF
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction Kuang Wu, Chuan Yang, Zhanbin Li
PDF
Interactive Medical Image Analysis with Concept-Based Similarity Reasoning Ta Duc Huy, Sen Kim Tran, Phan Nguyen, Nguyen Hoang Tran, Tran Bao Sam, Anton van den Hengel, Zhibin Liao, Johan W. Verjans, Minh-Son To, Vu Minh Hieu Phan
PDF
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline Junlong Cheng, Bin Fu, Jin Ye, Guoan Wang, Tianbin Li, Haoyu Wang, Ruoyu Li, He Yao, Junren Cheng, Jingwen Li, Yanzhou Su, Min Zhu, Junjun He
PDF
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas
PDF
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models Rick Akkerman, Haiwen Feng, Michael J. Black, Dimitrios Tzionas, Victoria Fernández Abrevaya
PDF
Interleaved-Modal Chain-of-Thought Jun Gao, Yongqi Li, Ziqiang Cao, Wenjie Li
PDF
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions Sirui Xu, Hung Yu Ling, Yu-Xiong Wang, Liang-Yan Gui
PDF
Interpretable Generative Models Through Post-Hoc Concept Bottlenecks Akshay Kulkarni, Ge Yan, Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng
PDF
Interpretable Image Classification via Non-Parametric Part Prototype Learning Zhijie Zhu, Lei Fan, Maurice Pagnucco, Yang Song
PDF
Interpreting Object-Level Foundation Models via Visual Precision Search Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Maosen Li, Zhen Huang, Hua Zhang, Xiaochun Cao
PDF
Inversion Circle Interpolation: Diffusion-Based Image Augmentation for Data-Scarce Classification Yanghao Wang, Long Chen
PDF
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD Tao Sun, Yuhao Huang, Li Shen, Kele Xu, Bao Wang
PDF
Invisible Backdoor Attack Against Self-Supervised Learning Hanrong Zhang, Zhenting Wang, Boheng Li, Fulin Lin, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma
PDF
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing Chun Gu, Xiaofei Wei, Zixuan Zeng, Yuxuan Yao, Li Zhang
PDF
IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li, Zhao Dong, Christian Richardt, Tuotuo Li, Michael Zollhöfer, Johannes Kopf, Shenlong Wang, Changil Kim
PDF
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models Through Egocentric Instruction Tuning Ji Hyeok Jung, Eun Tae Kim, Seoyeon Kim, Joo Ho Lee, Bumsoo Kim, Buru Chang
PDF
Is This Generated Person Existed in Real-World? Fine-Grained Detecting and Calibrating Abnormal Human-Body Zeqing Wang, Qingyang Ma, Wentao Wan, Haojie Li, Keze Wang, Yonghong Tian
PDF
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation Yiping Wang, Xuehai He, Kuan Wang, Luyao Ma, Jianwei Yang, Shuohang Wang, Simon Shaolei Du, Yelong Shen
PDF
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians Yian Zhao, Wanshi Xu, Ruochong Zheng, Pengchong Qiao, Chang Liu, Jie Chen
PDF
It's a (Blind) Match! Towards Vision-Language Correspondence Without Parallel Data Dominik Schnaus, Nikita Araslanov, Daniel Cremers
PDF
ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-on Ji Woo Hong, Tri Ton, Trung X. Pham, Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo
PDF
Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing Jiayi Fu, Siyu Liu, Zikun Liu, Chun-Le Guo, Hyunhee Park, Ruiqi Wu, Guoqing Wang, Chongyi Li
PDF
IterIS: Iterative Inference-Solving Alignment for LoRA Merging Hongxu Chen, Zhen Wang, Runshi Li, Bowei Zhu, Long Chen
PDF
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising Yongli Xiang, Ziming Hong, Lina Yao, Dadong Wang, Tongliang Liu
PDF
JamMa: Ultra-Lightweight Local Feature Matching with Joint Mamba Xiaoyong Lu, Songlin Du
PDF
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo
PDF
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Yiyang Ma, Xingchao Liu, Xiaokang Chen, Wen Liu, Chengyue Wu, Zhiyu Wu, Zizheng Pan, Zhenda Xie, Haowei Zhang, Xingkai Yu, Liang Zhao, Yisong Wang, Jiaying Liu, Chong Ruan
PDF
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding
PDF
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo
PDF
Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video Hoang Chuong Nguyen, Wei Mao, Jose M. Alvarez, Miaomiao Liu
PDF
Joint Out-of-Distribution Filtering and Data Discovery Active Learning Sebastian Schmidt, Leonard Schenk, Leo Schwinn, Stephan Günnemann
PDF
Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning Chaoyang Li, Jianyang Qin, Jinhao Cui, Zeyu Liu, Ning Hu, Qing Liao
PDF
Joint Vision-Language Social Bias Removal for CLIP Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
PDF
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems Yifan Wang, Jian Zhao, Zhaoxin Fan, Xin Zhang, Xuecheng Wu, Yudian Zhang, Lei Jin, Xinyue Li, Gang Wang, Mengxi Jia, Ping Hu, Zheng Zhu, Xuelong Li
PDF
Just Dance with Pi! A Poly-Modal Inductor for Weakly-Supervised Video Anomaly Detection Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Egor Bondarev, Francois Bremond
PDF
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs Ziheng Ouyang, Zhen Li, Qibin Hou
PDF
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-Wise Human Preferences Zhikai Li, Xuewen Liu, Dongrong Joe Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong
PDF
KAC: Kolmogorov-Arnold Classifier for Continual Learning Yusong Hu, Zichen Liang, Fei Yang, Qibin Hou, Xialei Liu, Ming-Ming Cheng
PDF
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation Jiaxin Cai, Jingze Su, Qi Li, Wenjie Yang, Shu Wang, Tiesong Zhao, Shengfeng He, Wenxi Liu
PDF
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation Antoni Bigata, Michał Stypułkowski, Rodrigo Mira, Stella Bounareli, Konstantinos Vougioukas, Zoe Landgraf, Nikita Drobyshev, Maciej Zieba, Stavros Petridis, Maja Pantic
PDF
Keyframe-Guided Creative Video Inpainting Yuwei Guo, Ceyuan Yang, Anyi Rao, Chenlin Meng, Omer Bar-Tal, Shuangrui Ding, Maneesh Agrawala, Dahua Lin, Bo Dai
PDF
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation Jiantao Lin, Xin Yang, Meixi Chen, Yingjie Xu, Dongyu Yan, Leyi Wu, Xinli Xu, Lie Xu, Shunsi Zhang, Ying-Cong Chen
PDF
KMD: Koopman Multi-Modality Decomposition for Generalized Brain Tumor Segmentation Under Incomplete Modalities Tianyi Liu, Haochuan Jiang, Kaizhu Huang
PDF
Knowledge Bridger: Towards Training-Free Missing Modality Completion Guanzhou Ke, Shengfeng He, Xiaoli Wang, Bo Wang, Guoqing Chao, Yuanyang Zhang, Yi Xie, Hexing Su
PDF
Knowledge Memorization and Rumination for Pre-Trained Model-Based Class-Incremental Learning Zijian Gao, Wangwang Jia, Xingxing Zhang, Dulan Zhou, Kele Xu, Feng Dawei, Yong Dou, Xinjun Mao, Huaimin Wang
PDF
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition Wen Yin, Yong Wang, Guiduo Duan, Dongyang Zhang, Xin Hu, Yuan-Fang Li, Tao He
PDF
Koala-36M: A Large-Scale Video Dataset Improving Consistency Between Fine-Grained Conditions and Video Content Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang
PDF
KVQ: Boosting Video Quality Assessment via Saliency-Guided Local Perception Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang
PDF
L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers Sofia Casarin, Sergio Escalera, Oswald Lanz
PDF
Label Shift Meets Online Learning: Ensuring Consistent Adaptation with Universal Dynamic Regret Yucong Dai, Shilin Gu, Ruidong Fan, Chao Xu, Chenping Hou
PDF
LAL: Enhancing 3D Human Motion Prediction with Latency-Aware Auxiliary Learning Xiaoning Sun, Dong Wei, Huaijiang Sun, Shengxiang Hu
PDF
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant Yikun Liu, Yajie Zhang, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiangchao Yao, Yanfeng Wang, Weidi Xie
PDF
Language Guided Concept Bottleneck Models for Interpretable Continual Learning Lu Yu, Haoyu Han, Zhe Tao, Hantao Yao, Changsheng Xu
PDF
Language-Assisted Debiasing and Smoothing for Foundation Model-Based Semi-Supervised Learning Na Zheng, Xuemeng Song, Xue Dong, Aashish Nikhil Ghosh, Liqiang Nie, Roger Zimmermann
PDF
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment Huangbiao Xu, Xiao Ke, Huanqi Wu, Rui Xu, Yuezhou Li, Wenzhong Guo
PDF
Language-Guided Image Tokenization for Generation Kaiwen Zha, Lijun Yu, Alireza Fathi, David A. Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu
PDF
Language-Guided Salient Object Ranking Fang Liu, Yuhao Liu, Ke Xu, Shuquan Ye, Gerhard Petrus Hancke, Rynson W. H. Lau
PDF
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander
PDF
Large-Scale Multi-View Tensor Clustering with Implicit Linear Kernels Jiyuan Liu, Xinwang Liu, Chuankun Li, Xinhang Wan, Hao Tan, Yi Zhang, Weixuan Liang, Qian Qu, Yu Feng, Renxiang Guan, Ke Liang
PDF
Large-Scale Text-to-Image Model with Inpainting Is a Zero-Shot Subject-Driven Image Generator Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon
PDF
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis Yousef Yeganeh, Azade Farshad, Ioannis Charisiadis, Marta Hasny, Martin Hartenberger, Björn Ommer, Nassir Navab, Ehsan Adeli
PDF
Latent Space Imaging Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich
PDF
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim
PDF
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion Muchen Li, Sammy Christen, Chengde Wan, Yujun Cai, Renjie Liao, Leonid Sigal, Shugao Ma
PDF
LaTexBlend: Scaling Multi-Concept Customized Generation with Latent Textual Blending Jian Jin, Zhenbo Yu, Yang Shen, Zhenyong Fu, Jian Yang
PDF
LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos Daniel Etaat, Dvij Kalaria, Nima Rahmanian, S. Shankar Sastry
PDF
LaVin-DiT: Large Vision Diffusion Transformer Zhaoqing Wang, Xiaobo Xia, Runnan Chen, Dongdong Yu, Changhu Wang, Mingming Gong, Tongliang Liu
PDF
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin
PDF
Layered Image Vectorization via Semantic Simplification Zhenyu Wang, Jianxi Huang, Zhida Sun, Yuanhao Gong, Daniel Cohen-Or, Min Lu
PDF
Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos Vadim Tschernezki, Diane Larlus, Iro Laina, Andrea Vedaldi
PDF
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu
PDF
LC-Mamba: Local and Continuous Mamba with Shifted Windows for Frame Interpolation Min Wu Jeong, Chae Eun Rhee
PDF
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Yuan Yao, Lei Zhang
PDF
Learnable Infinite Taylor Gaussian for Dynamic View Rendering Bingbing Hu, Yanyan Li, Rui Xie, Bo Xu, Haoye Dong, Junfeng Yao, Gim Hee Lee
PDF
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues Yuhui Liu, Liangxun Ou, Qiang Fu, Hadi Amata, Wolfgang Heidrich, Yifan Peng
PDF
Learned Image Compression with Dictionary-Based Entropy Model Jingbo Lu, Leheng Zhang, Xingyu Zhou, Mu Li, Wen Li, Shuhang Gu
PDF
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene Shengqiong Wu, Hao Fei, Jingkang Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Tat-seng Chua
PDF
Learning Affine Correspondences by Integrating Geometric Constraints Pengju Sun, Banglei Guan, Zhenbao Yu, Yang Shang, Qifeng Yu, Daniel Barath
PDF
Learning Audio-Guided Video Representation with Gated Attention for Video-Text Retrieval Boseung Jeong, Jicheol Park, Sungyeon Kim, Suha Kwak
PDF
Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, Zhizhong Han
PDF
Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection Yun Zhu, Le Hui, Hang Yang, Jianjun Qian, Jin Xie, Jian Yang
PDF
Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval Yushuai Sun, Zikun Zhou, Dongmei Jiang, Yaowei Wang, Jun Yu, Guangming Lu, Wenjie Pei
PDF
Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning Xiaohan Zou, Wenchao Ma, Shu Zhao
PDF
Learning Dynamic Collaborative Network for Semi-Supervised 3D Vessel Segmentation Jiao Xu, Xin Chen, Lihe Zhang
PDF
Learning Endogenous Attention for Incremental Object Detection Xiang Song, Yuhang He, Jingyuan Li, Qiang Wang, Yihong Gong
PDF
Learning Extremely High Density Crowds as Active Matters Feixiang He, Jiangbei Yue, Jialin Zhu, Armin Seyfried, Dan Casas, Julien Pettré, He Wang
PDF
Learning Flow Fields in Attention for Controllable Person Image Generation Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manuel Perez-Rua, Aditya Patel, Tao Xiang, Miaojing Shi, Sen He
PDF
Learning from Neighbors: Category Extrapolation for Long-Tail Learning Shizhen Zhao, Xin Wen, Jiahui Liu, Chuofan Ma, Chunfeng Yuan, Xiaojuan Qi
PDF
Learning from Streaming Video with Orthogonal Gradients Tengda Han, Dilara Gokay, Joseph Heyward, Chuhan Zhang, Daniel Zoran, Viorica Patraucean, Joao Carreira, Dima Damen, Andrew Zisserman
PDF
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes Keqi Chen, Vinkle Srivastav, Didier Mutter, Nicolas Padoy
PDF
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing Ruiyi Wang, Yushuo Zheng, Zicheng Zhang, Chunyi Li, Shuaicheng Liu, Guangtao Zhai, Xiaohong Liu
PDF
Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images Junxian Wu, Minheng Chen, Xinyi Ke, Tianwang Xun, Xiaoming Jiang, Hongyu Zhou, Lizhi Shao, Youyong Kong
PDF
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking You Wu, Xucheng Wang, Xiangyang Yang, Mengyuan Liu, Dan Zeng, Hengzhou Ye, Shuiwang Li
PDF
Learning on Model Weights Using Tree Experts Eliahu Horwitz, Bar Cavia, Jonathan Kahana, Yedid Hoshen
PDF
Learning Partonomic 3D Reconstruction from Image Collections Xiaoqian Ruan, Pei Yu, Dian Jia, Hyeonjeong Park, Peixi Xiong, Wei Tang
PDF
Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model Yuxiang Mao, Zhenfeng Fan, ZhiJie Zhang, Zhiheng Zhang, Shihong Xia
PDF
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation Xingguang Zhang, Nicholas Chimitt, Xijun Wang, Yu Yuan, Stanley H. Chan
PDF
Learning Physics from Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert, Daan Brinks, Nergis Tomen
PDF
Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References Yitang Li, Mingxian Lin, Zhuo Lin, Yipeng Deng, Yue Cao, Li Yi
PDF
Learning Temporally Consistent Video Depth from Video Diffusion Priors Jiahao Shao, Yuanbo Yang, Hongyu Zhou, Youmin Zhang, Yujun Shen, Vitor Guizilini, Yue Wang, Matteo Poggi, Yiyi Liao
PDF
Learning Textual Prompts for Open-World Semi-Supervised Learning Yuxin Fan, Junbiao Cui, Jiye Liang
PDF
Learning to Detect Objects from Multi-Agent LiDAR Scans Without Manual Labels Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, Chenglu Wen
PDF
Learning to Filter Outlier Edges in Global SfM Nicole Damblon, Marc Pollefeys, Daniel Barath
PDF
Learning to Highlight Audio by Watching Movies Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, Sanjeel Parekh
PDF
Learning to Normalize on the SPD Manifold Under Bures-Wasserstein Geometry Rui Wang, Shaocheng Jin, Ziheng Chen, Xiaoqing Luo, Xiao-Jun Wu
PDF
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park, Ling Pan
PDF
Learning Visual Composition Through Improved Semantic Guidance Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, Kaifeng Chen, Abhijit Ogale, Jonathon Shlens
PDF
Learning Visual Generative Priors Without Text Shuailei Ma, Kecheng Zheng, Ying Wei, Wei Wu, Fan Lu, Yifei Zhang, Chen-Wei Xie, Biao Gong, Jiapeng Zhu, Yujun Shen
PDF
Learning with Noisy Triplet Correspondence for Composed Image Retrieval Shuxian Li, Changhao He, Xiting Liu, Joey Tianyi Zhou, Xi Peng, Peng Hu
PDF
Learning-Enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework Hanrui Zhao, Niuniu Qi, Mengxin Ren, Banglong Liu, Shuming Shi, Zhengfeng Yang
PDF
LEDiff: Latent Exposure Diffusion for HDR Generation Chao Wang, Zhihao Xia, Thomas Leimkuhler, Karol Myszkowski, Xuaner Zhang
PDF
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal, Balint Kovacs, Saikat Roy, Constantin Ulrich, Tassilo Wald, Lukas T. Rotkopf, Heinz-Peter Schlemmer, Klaus Maier-Hein
PDF
Less Attention Is More: Prompt Transformer for Generalized Category Discovery Wei Zhang, Baopeng Zhang, Zhu Teng, Wenxin Luo, Junnan Zou, Jianping Fan
PDF
Less Is More: Efficient Image Vectorization with Adaptive Parameterization Kaibo Zhao, Liang Bao, Yufei Li, Xu Su, Ke Zhang, Xiaotian Qiao
PDF
Less Is More: Efficient Model Merging with Binary Task Switch Biqing Qi, Fangyuan Li, Zhen Wang, Junqi Gao, Dong Li, Peng Ye, Bowen Zhou
PDF
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Quang-Huy Nguyen, Li Zhang, Wei-Lun Chao
PDF
Let Humanoids Hike! Integrative Skill Development on Complex Trails Kwan-Yee Lin, Stella X. Yu
PDF
Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples Weiwei Li, Junzhuo Liu, Yuanyuan Ren, Yuchen Zheng, Yahao Liu, Wen Li
PDF
Let's Chorus: Partner-Aware Hybrid Song-Driven 3D Head Animation Xiumei Xie, Zikai Huang, Wenhao Xu, Peng Xiao, Xuemiao Xu, Huaidong Zhang
PDF
Let's Verify and Reinforce Image Generation Step by Step Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Ziyu Guo, Haoquan Zhang, Manyuan Zhang, Jiaming Liu, Peng Gao, Hongsheng Li
PDF
Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection Ahyun Seo, Minsu Cho
PDF
Leveraging Global Stereo Consistency for Category-Level Shape and 6D Pose Estimation from Stereo Images Junning Qiu, Minglei Lu, Fei Wang, Yu Guo, Yonggen Ling
PDF
Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, Yan Gu
PDF
Leveraging SD Map to Augment HD Map-Based Trajectory Prediction Zhiwei Dong, Ran Ding, Wei Li, Peng Zhang, Guobin Tang, Jia Guo
PDF
Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection Jinhyung Park, Navyata Sanghvi, Hiroki Adachi, Yoshihisa Shibata, Shawn Hunt, Shinya Tanaka, Hironobu Fujiyoshi, Kris Kitani
PDF
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, Limin Wang
PDF
Libra-Merging: Importance-Redundancy and Pruning-Merging Trade-Off for Acceleration Plug-in in Large Vision-Language Model Longrong Yang, Dong Shen, Chaoxiang Cai, Kaibing Chen, Fan Yang, Tingting Gao, Di Zhang, Xi Li
PDF
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions Faridoun Mehri, Mahdieh Soleymani Baghshah, Mohammad Taher Pilehvar
PDF
LiDAR-RT: Gaussian-Based Ray Tracing for Dynamic LiDAR Re-Simulation Chenxu Zhou, Lvchang Fu, Sida Peng, Yunzhi Yan, Zhanhua Zhang, Yong Chen, Jiazhi Xia, Xiaowei Zhou
PDF
LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition Chuanfu Shen, Rui Wang, Lixin Duan, Shiqi Yu
PDF
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts Qizhou Chen, Chengyu Wang, Dakan Wang, Taolin Zhang, Wangyue Li, Xiaofeng He
PDF
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation Yueru Jia, Jiaming Liu, Sixiang Chen, Chenyang Gu, Zhilve Wang, Longzan Luo, Xiaoqi Li, Pengwei Wang, Zhongyuan Wang, Renrui Zhang, Shanghang Zhang
PDF
Lifting Motion to the 3D World via 2D Diffusion Jiaman Li, C. Karen Liu, Jiajun Wu
PDF
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference Hao Yin, Guangzong Si, Zilei Wang
PDF
Light Transport-Aware Diffusion Posterior Sampling for Single-View Reconstruction of 3D Volumes Ludwic Leonard, Nils Thurey, Rüdiger Westermann
PDF
Light3R-SfM: Towards Feed-Forward Structure-from-Motion Sven Elflein, Qunjie Zhou, Laura Leal-Taixé
PDF
LightLoc: Learning Outdoor LiDAR Localization at Light Speed Wen Li, Chen Liu, Shangshu Yu, Dunqiang Liu, Yin Zhou, Siqi Shen, Chenglu Wen, Cheng Wang
PDF
LIM: Large Interpolator Model for Dynamic Reconstruction Remy Sabathier, Niloy J. Mitra, David Novotny
PDF
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Xiang Xu, Lingdong Kong, Hui Shuai, Liang Pan, Ziwei Liu, Qingshan Liu
PDF
Linear Attention Modeling for Learned Image Compression Donghui Feng, Zhengxue Cheng, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, Li Song
PDF
LineArt: A Knowledge-Guided Training-Free High-Quality Appearance Transfer for Design Drawing with Diffusion Model Xi Wang, Hongzhen Li, Heng Fang, Yichen Peng, Haoran Xie, Xi Yang, Chuntao Li
PDF
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai
PDF
Linguistics-Aware Masked Image Modeling for Self-Supervised Scene Text Recognition Yifei Zhang, Chang Liu, Jin Wei, Xiaomeng Yang, Yu Zhou, Can Ma, Xiangyang Ji
PDF
Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video Matthew Marchellus, Nadhira Noor, In Kyu Park
PDF
Link-Based Contrastive Learning for One-Shot Unsupervised Domain Adaptation Yue Zhang, Mingyue Bin, Yuyang Zhang, Zhongyuan Wang, Zhen Han, Chao Liang
PDF
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant Wei Li, Bing Hu, Rui Shao, Leyang Shen, Liqiang Nie
PDF
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-Dependent Radiance Fields Zhengqin Li, Dilin Wang, Ka Chen, Zhaoyang Lv, Thu Nguyen-Phuoc, Milim Lee, Jia-Bin Huang, Lei Xiao, Yufeng Zhu, Carl S. Marshall, Yuheng Ren, Richard Newcombe, Zhao Dong
PDF
LiSu: A Dataset and Method for LiDAR Surface Normal Estimation Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger
PDF
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors Han Zhou, Wei Dong, Jun Chen
PDF
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Joya Chen, Ziyun Zeng, Yiqi Lin, Wei Li, Zejun Ma, Mike Zheng Shou
PDF
LiVOS: Light Video Object Segmentation with Gated Linear Matching Qin Liu, Jianfeng Wang, Zhengyuan Yang, Linjie Li, Kevin Lin, Marc Niethammer, Lijuan Wang
PDF
LLaVA-Critic: Learning to Evaluate Multimodal Models Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li
PDF
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu
PDF
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das
PDF
LLM-Driven Multimodal and Multi-Identity Listening Head Generation Peiwen Lai, Weizhi Zhong, Yipeng Qin, Xiaohang Ren, Baoyuan Wang, Guanbin Li
PDF
LLMDet: Learning Strong Open-Vocabulary Object Detectors Under the Supervision of Large Language Models Shenghao Fu, Qize Yang, Qijie Mo, Junkai Yan, Xihan Wei, Jingke Meng, Xiaohua Xie, Wei-Shi Zheng
PDF
LMO: Linear Mamba Operator for MRI Reconstruction Wei Li, Jiawei Jiang, Jie Wu, Kaihao Yu, Jianwei Zheng
PDF
Locality-Aware Zero-Shot Human-Object Interaction Detection Sanghyun Kim, Deunsol Jung, Minsu Cho
PDF
Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation Byung Hyun Lee, Sungjin Lim, Se Young Chun
PDF
Localizing Events in Videos with Multimodal Queries Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu
PDF
Locally Orderless Images for Optimization in Differentiable Rendering Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi
PDF
LOCORE: Image Re-Ranking with Long-Context Sequence Modeling Zilin Xiao, Pavel Suma, Ayush Sachdeva, Hao-Jen Wang, Giorgos Kordopatis-Zilos, Giorgos Tolias, Vicente Ordonez
PDF
LOD-GS: Achieving Levels of Detail Using Scalable Gaussian Soup Jianxiong Shen, Yue Qian, Xiaohang Zhan
PDF
LOGICZSL: Exploring Logic-Induced Representation for Compositional Zero-Shot Learning Peng Wu, Xiankai Lu, Hao Hu, Yongqin Xian, Jianbing Shen, Wenguan Wang
PDF
Logits DeConfusion with CLIP for Few-Shot Learning Shuo Li, Fang Liu, Zehua Hao, Xinyi Wang, Lingling Li, Xu Liu, Puhua Chen, Wenping Ma
PDF
LogoSP: Local-Global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang
PDF
LoKi: Low-Dimensional KAN for Efficient Fine-Tuning Image Models Xuan Cai, Renjie Pan, Hua Yang
PDF
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation Xin Yan, Yuxuan Cai, Qiuyue Wang, Yuan Zhou, Wenhao Huang, Huan Yang
PDF
LongDiff: Training-Free Long Video Generation in One Go Zhuoling Li, Hossein Rahmani, Qiuhong Ke, Jun Liu
PDF
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos Tiantian Geng, Jinrui Zhang, Qingni Wang, Teng Wang, Jinming Duan, Feng Zheng
PDF
LookCloser: Frequency-Aware Radiance Field for Tiny-Detail Scene Xiaoyu Zhang, Weihong Pan, Chong Bao, Xiyu Zhang, Xiaojun Xiang, Hanqing Jiang, Hujun Bao
PDF
LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping Pascal Chang, Sergio Sancho, Jingwei Tang, Markus Gross, Vinicius Azevedo
PDF
LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs Zixuan Hu, Yongxian Wei, Li Shen, Chun Yuan, Dacheng Tao
PDF
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning Xuan Liu, Xiaobin Chang
PDF
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models Enis Simsar, Thomas Hofmann, Federico Tombari, Pinar Yanardag
PDF
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models Jian Liang, Wenke Huang, Guancheng Wan, Qu Yang, Mang Ye
PDF
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues Youngjoon Jang, Haran Raajesh, Liliane Momeni, Gül Varol, Andrew Zisserman
PDF
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves, Petros Daras
PDF
LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table Yusuke Matsui
PDF
Low-Biased General Annotated Dataset Generation Dengyang Jiang, Haoyu Wang, Lei Zhang, Wei Wei, Guang Dai, Mengmeng Wang, Jingdong Wang, Yanning Zhang
PDF
Low-Rank Adaptation in Multilinear Operator Networks for Security-Preserving Incremental Learning Huu Binh Ta, Duc Nguyen, Quyen Tran, Toan Tran, Tung Pham
PDF
LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate Haoyan Gong, Zhenrong Zhang, Yuzheng Feng, Anh Nguyen, Hongbin Liu
PDF
LPOSS: Label Propagation over Patches and Pixels for Open-Vocabulary Semantic Segmentation Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias
PDF
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Hongyan Zhi, Peihao Chen, Junyan Li, Shuailei Ma, Xinyu Sun, Tianhang Xiang, Yinjie Lei, Mingkui Tan, Chuang Gan
PDF
LSNet: See Large, Focus Small Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding
PDF
LT3SD: Latent Trees for 3D Scene Diffusion Quan Meng, Lei Li, Matthias Nießner, Angela Dai
PDF
LUCAS: Layered Universal Codec Avatars Di Liu, Teng Deng, Giljoo Nam, Yu Rong, Stanislav Pidhorskyi, Junxuan Li, Jason Saragih, Dimitris N. Metaxas, Chen Cao
PDF
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment Ziteng Cui, Xuangeng Chu, Tatsuya Harada
PDF
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad
PDF
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Levent Taşel, Ning Yu, Vishal M. Patel, Paul Debevec
PDF
M-LLM Based Video Frame Selection for Efficient Video Understanding Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi
PDF
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation Zixuan Chen, Jiaxin Li, Junxuan Liang, Liming Tan, Yejie Guo, Cewu Lu, Yong-Lu Li
PDF
M3amba: Memory Mamba Is All You Need for Whole Slide Image Classification Tingting Zheng, Kui Jiang, Yi Xiao, Sicheng Zhao, Hongxun Yao
PDF
M3GYM: A Large-Scale Multimodal Multi-View Multi-Person Pose Dataset for Fitness Activity Understanding in Real-World Settings Qingzheng Xu, Ru Cao, Xin Shen, Heming Du, Sen Wang, Xin Yu
PDF
MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction Xiaohao Xu, Feng Xue, Shibo Zhao, Yike Pan, Sebastian Scherer, Xiaonan Huang
PDF
MAD: Memory-Augmented Detection of 3D Objects Ben Agro, Sergio Casas, Patrick Wang, Thomas Gilles, Raquel Urtasun
PDF
MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects Kevin Zhang, Jia-Bin Huang, Jose Echevarria, Stephen DiVerdi, Aaron Hertzmann
PDF
MAGE: Single Image to Material-Aware 3D via the Multi-View G-Buffer Estimation Model Haoyuan Wang, Zhenwei Wang, Xiaoxiao Long, Cheng Lin, Gerhard Hancke, Rynson W.H. Lau
PDF
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM Vladimir Yugay, Theo Gevers, Martin R. Oswald
PDF
MagicArticulate: Make Your 3D Models Articulation-Ready Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, Yiwen Chen, Zhongcong Xu, Jun Hao Liew, Xiaoyang Guo, Fayao Liu, Jiashi Feng, Guosheng Lin
PDF
MagicQuill: An Intelligent Interactive Image Editing System Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Wen Wang, Zhiheng Liu, Qifeng Chen, Yujun Shen
PDF
Magma: A Foundation Model for Multimodal AI Agents Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Jianfeng Gao
PDF
Maintaining Consistent Inter-Class Topology in Continual Test-Time Adaptation Chenggong Ni, Fan Lyu, Jiayao Tan, Fuyuan Hu, Rui Yao, Tao Zhou
PDF
MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration Boyun Li, Haiyu Zhao, Wenxin Wang, Peng Hu, Yuanbiao Gou, Xi Peng
PDF
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik
PDF
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, Ran Zhang
PDF
Making Old Film Great Again: Degradation-Aware State Space Model for Old Film Restoration Yudong Mao, Hao Luo, Zhiwei Zhong, Peilin Chen, Zhijiang Zhang, Shiqi Wang
PDF
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation Xin Zhang, Robby T. Tan
PDF
Mamba-Adaptor: State Space Model Adaptor for Visual Recognition Fei Xie, Jiahao Nie, Yujin Tang, Wenkang Zhang, Hongshen Zhao
PDF
Mamba-Reg: Vision Mamba Also Needs Registers Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
PDF
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models Jiuming Liu, Jinru Han, Lihao Liu, Angelica I. Aviles-Rivero, Chaokang Jiang, Zhe Liu, Hesheng Wang
PDF
MambaIC: State Space Models for High-Performance Learned Image Compression Fanhu Zeng, Hao Tang, Yihua Shao, Siyu Chen, Ling Shao, Yan Wang
PDF
MambaIRv2: Attentive State Space Restoration Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, Wenbo Li, Tao Dai, Shu-Tao Xia, Yawei Li
PDF
MambaOut: Do We Really Need Mamba for Vision? Weihao Yu, Xinchao Wang
PDF
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Ali Hatamizadeh, Jan Kautz
PDF
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking Xinqi Liu, Li Zhou, Zikun Zhou, Jianqiu Chen, Zhenyu He
PDF
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing Shuo Wang, Wanting Li, Yongcai Wang, Zhaoxin Fan, Zhe Huang, Xudong Cai, Jian Zhao, Deying Li
PDF
MammAlps: A Multi-View Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps Valentin Gabeff, Haozhe Qi, Brendan Flaherty, Gencer Sumbul, Alexander Mathis, Devis Tuia
PDF
MangaNinja: Line Art Colorization with Precise Reference Following Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, Ping Luo
PDF
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan
PDF
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning Kailin Li, Puhao Li, Tengyu Liu, Yuyang Li, Siyuan Huang
PDF
ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping Youxin Pang, Ruizhi Shao, Jiajun Zhang, Hanzhang Tu, Yun Liu, Boyao Zhou, Hongwen Zhang, Yebin Liu
PDF
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects Lei Fan, Dongdong Fan, Zhiguang Hu, Yiwen Ding, Donglin Di, Kai Yi, Maurice Pagnucco, Yang Song
PDF
MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Juergen Gall
PDF
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining Yunze Liu, Li Yi
PDF
MAR-3D: Progressive Masked Auto-Regressor for High-Resolution 3D Generation Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee
PDF
MARBLE: Material Recomposition and Blending in CLIP-Space Ta Ying Cheng, Prafull Sharma, Mark Boss, Varun Jampani
PDF
MaRI: Material Retrieval Integration Across Domains Jianhui Wang, Zhifei Yang, Yangfan He, Huixiong Zhang, Yuxuan Chen, Jingwei Huang
PDF
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures Lucas Morin, Valery Weber, Ahmed Nassar, Gerhard Ingmar Meijer, Luc Van Gool, Yawei Li, Peter Staar
PDF
Marten: Visual Question Answering with Mask Generation for Multi-Modal Document Understanding Zining Wang, Tongkun Guan, Pei Fu, Chen Duan, Qianyi Jiang, Zhentao Guo, Shan Guo, Junfeng Luo, Wei Shen, Xiaokang Yang
PDF
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama, Shino Sam, Didier Stricker, Sk Aziz Ali, Muhammad Zeshan Afzal
PDF
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs Through Disentangled Spatial-Temporal Representations Kyungho Bae, Jinhyung Kim, Sihaeng Lee, Soonyoung Lee, Gunhee Lee, Jinwoo Choi
PDF
Mask-Adapter: The Devil Is in the Masks for Open-Vocabulary Segmentation Yongkang Li, Tianheng Cheng, Bin Feng, Wenyu Liu, Xinggang Wang
PDF
Mask^2DiT: Dual Mask-Based Diffusion Transformer for Multi-Scene Long Video Generation Tianhao Qi, Jianlong Yuan, Wanquan Feng, Shancheng Fang, Jiawei Liu, SiYu Zhou, Qian He, Hongtao Xie, Yongdong Zhang
PDF
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding Yan Wang, Baoxiong Jia, Ziyu Zhu, Siyuan Huang
PDF
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding Pedro Hermosilla, Christian Stippel, Leon Sick
PDF
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, Xiao Sun
PDF
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction Jingcheng Ni, Yuxin Guo, Yichen Liu, Rui Chen, Lewei Lu, Zehuan Wu
PDF
Masking Meets Supervision: A Strong Learning Alliance Byeongho Heo, Taekyung Kim, Sangdoo Yun, Dongyoon Han
PDF
MaSS13K: A Matting-Level Semantic Segmentation Benchmark Chenxi Xie, Minghan Li, Hui Zeng, Jun Luo, Lei Zhang
PDF
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors Riku Murai, Eric Dexheimer, Andrew J. Davison
PDF
MatAnyone: Stable Video Matting with Consistent Memory Propagation Peiqing Yang, Shangchen Zhou, Jixin Zhao, Qingyi Tao, Chen Change Loy
PDF
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism from Sparse Views Antoine Guedon, Tomoki Ichikawa, Kohei Yamashita, Ko Nishino
PDF
MATCHA: Towards Matching Anything Fei Xue, Sven Elflein, Laura Leal-Taixé, Qunjie Zhou
PDF
Material Anything: Generating Materials for Any 3D Object via Diffusion Xin Huang, Tengfei Wang, Ziwei Liu, Qing Wang
PDF
Matrix-Free Shared Intrinsics Bundle Adjustment Daniel Safari
PDF
Matrix3D: Large Photogrammetry Model All-in-One Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li
PDF
MBQ: Modality-Balanced Quantization for Large Vision-Language Models Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang
PDF
MC^2: Multi-Concept Guidance for Customized Multi-Concept Generation Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, Wenbo Li, Renjing Pei, Fan Li, Wangmeng Zuo
PDF
MCCD: Multi-Agent Collaboration-Based Compositional Diffusion for Complex Text-to-Image Generation Mingcheng Li, Xiaolu Hou, Ziyang Liu, Dingkang Yang, Ziyun Qian, Jiawei Chen, Jinjie Wei, Yue Jiang, Qingyao Xu, Lihua Zhang
PDF
MDP: Multidimensional Vision Model Pruning with Latency Constraint Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose M. Alvarez
PDF
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, Chen Change Loy
PDF
MedUnifier: Unifying Vision-and-Language Pre-Training on Medical Data with Vision Generation Task Using Discrete Visual Representations Ziyang Zhang, Yang Yu, Yucheng Chen, Xulei Yang, Si Yong Yeo
PDF
Medusa: A Multi-Scale High-Order Contrastive Dual-Diffusion Approach for Multi-View Clustering Liang Chen, Zhe Xue, Yawen Li, Meiyu Liang, Yan Wang, Anton van den Hengel, Yuankai Qi
PDF
MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks Zeqi Zhu, Ibrahim Batuhan Akkaya, Luc Waeijen, Egor Bondarev, Arash Pourtaherian, Orlando Moreira
PDF
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing Cong Wang, Di Kang, Heyi Sun, Shenhan Qian, Zixuan Wang, Linchao Bao, Song-Hai Zhang
PDF
MEGA: Masked Generative Autoencoder for Human Mesh Recovery Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer
PDF
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Holynski, Noah Snavely
PDF
MegaSynth: Scaling up 3D Scene Reconstruction with Synthesized Data Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Jiuxiang Gu, Qixing Huang, Georgios Pavlakos, Hao Tan
PDF
Memories of Forgotten Concepts Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, Shai Avidan
PDF
MERGE: Multi-Faceted Hierarchical Graph-Based GNN for Gene Expression Prediction from Whole Slide Histopathology Images Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang, Jie Zhang, Alisa Yurovsky, Travis Steele Johnson, Chao Chen
PDF
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei
PDF
MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image Shaoming Li, Qing Cai, Songqi Kong, Runqing Tan, Heng Tong, Shiji Qiu, Yongguo Jiang, Zhi Liu
PDF
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes Kaiwei Zhang, Dandan Zhu, Xiongkuo Min, Guangtao Zhai
PDF
MeshArt: Generating Articulated Meshes with Structure-Guided Transformers Daoyi Gao, Yawar Siddiqui, Lei Li, Angela Dai
PDF
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, Huaping Liu
PDF
MET3R: Measuring Multi-View Consistency in Generated Images Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, Jan Eric Lenssen
PDF
Meta-Learning Hyperparameters for Parameter Efficient Fine-Tuning Zichen Tian, Yaoyao Liu, Qianru Sun
PDF
METASCENES: Towards Automated Replica Creation for Real-World 3D Scans Huangyue Yu, Baoxiong Jia, Yixin Chen, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang
PDF
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis Tianyu Wang, Jianming Zhang, Haitian Zheng, Zhihong Ding, Scott Cohen, Zhe Lin, Wei Xiong, Chi-Wing Fu, Luis Figueroa, Soo Ye Kim
PDF
MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning Wenhao Gu, Li Gu, Chingyee Yee Suen, Yang Wang
PDF
MetricGrids: Arbitrary Nonlinear Approximation with Elementary Metric Grids Based Implicit Neural Representation Shu Wang, Yanbo Gao, Shuai Li, Chong Lv, Xun Cai, Chuankun Li, Hui Yuan, Jinglin Zhang
PDF
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Yang Zhao, Hong Cheng, Huazhu Fu
PDF
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting Mengqiu Xu, Kaixin Chen, Heng Guo, Yixiang Huang, Ming Wu, Zhenwei Shi, Chuang Zhang, Jun Guo
PDF
MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation Across Multiple Granularities Bizhu Wu, Jinheng Xie, Keming Shen, Zhe Kong, Jianfeng Ren, Ruibin Bai, Rong Qu, Linlin Shen
PDF
MI-DETR: An Object Detection Model with Multi-Time Inquiries Mechanism Zhixiong Nan, Xianghong Li, Jifeng Dai, Tao Xiang
PDF
MICAS: Multi-Grained In-Context Adaptive Sampling for 3D Point Cloud Processing Feifei Shao, Ping Liu, Zhao Wang, Yawei Luo, Hongwei Wang, Jun Xiao
PDF
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G. Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, Sarina M Hasan, Alexandra Johannesson, William D. Leineweber, Malvika G Nair, Ridhi Yarlagadda, Connor Zuraski, Wah Chiu, Sarah Cohen, Jan N. Hansen, Manuel D Leonetti, Chad Liu, Emma Lundberg, Serena Yeung-Levy
PDF
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao, Lu Sheng
PDF
Mimic In-Context Learning for Multimodal Tasks Yuchu Jiang, Jiale Fu, Chenduo Hao, Xinting Hu, Yingzhe Peng, Xin Geng, Xu Yang
PDF
Mimir: Improving Video Diffusion Models for Precise Text Understanding Shuai Tan, Biao Gong, Yutong Feng, Kecheng Zheng, Dandan Zheng, Shuwei Shi, Yujun Shen, Jingdong Chen, Ming Yang
PDF
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output Yanyuan Chen, Dexuan Xu, Yu Huang, Songkun Zhan, Hanpin Wang, Dongxue Chen, Xueping Wang, Meikang Qiu, Hang Li
PDF
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo
PDF
Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch Yijie Liu, Xinyi Shang, Yiqun Zhang, Yang Lu, Chen Gong, Jing-Hao Xue, Hanzi Wang
PDF
Mind the Gap: Detecting Black-Box Adversarial Attacks in the Making Through Query Update Analysis Jeonghwan Park, Niall McLaughlin, Ihsen Alouani
PDF
Mind the Time: Temporally-Controlled Multi-Event Video Generation Ziyi Wu, Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Yuwei Fang, Varnith Chordia, Igor Gilitschenski, Sergey Tulyakov
PDF
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking Junxi Chen, Junhao Dong, Xiaohua Xie
PDF
Minding Fuzzy Regions: A Data-Driven Alternating Learning Paradigm for Stable Lesion Segmentation Lexin Fang, Yunyang Xu, Xiang Ma, Xuemei Li, Caiming Zhang
PDF
MINIMA: Modality Invariant Image Matching Jiangwei Ren, Xingyu Jiang, Zizhuo Li, Dingkang Liang, Xin Zhou, Xiang Bai
PDF
Minimal Interaction Separated Tuning: A New Paradigm for Visual Adaptation Ningyuan Tang, Minghao Fu, Jianxin Wu
PDF
Minimizing Labeled, Maximizing Unlabeled: An Image-Driven Approach for Video Instance Segmentation Fangyun Wei, Jinjing Zhao, Kun Yan, Chang Xu
PDF
Minority-Focused Text-to-Image Generation via Prompt Optimization Soobin Um, Jong Chul Ye
PDF
MIRE: Matched Implicit Neural Representations Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate, Vishal M. Patel
PDF
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World Ankit Dhiman, Manan Shah, R Venkatesh Babu
PDF
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Gaopeng Gou, Qi Wu
PDF
Mitigating Ambiguities in 3D Classification with Gaussian Splatting Ruiqi Zhang, Hao Zhu, Jingyi Zhao, Qi Zhang, Xun Cao, Zhan Ma
PDF
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, Dongsheng Li
PDF
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, Qianying Wang, Ping Chen, Xiaoqin Zhang, Shijian Lu
PDF
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-Training for Robotic Manipulation Jiaming Zhou, Teli Ma, Kun-Yu Lin, Zifan Wang, Ronghe Qiu, Junwei Liang
PDF
MITracker: Multi-View Integration for Visual Object Tracking Mengjie Xu, Yitao Zhu, Haotian Jiang, Jiaming Li, Zhenrong Shen, Sheng Wang, Haolin Huang, Xinyu Wang, Han Zhang, Qing Yang, Qian Wang
PDF
MixerMDM: Learnable Composition of Human Motion Diffusion Models Pablo Ruiz-Ponce, German Barquero, Cristina Palmero, Sergio Escalera, José García-Rodríguez
PDF
Mixture of Submodules for Domain Adaptive Person Search Minsu Kim, Seungryong Kim, Kwanghoon Sohn
PDF
MLLM-as-a-Judge for Image Safety Without Human Labeling Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, Felix Juefei-Xu, Zhuowei Li, Ligong Han, Harihar Subramanyam, Li Chen, Jianfa Chen, Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain
PDF
MLVU: Benchmarking Multi-Task Long Video Understanding Junjie Zhou, Yan Shu, Bo Zhao, Boya Wu, Zhengyang Liang, Shitao Xiao, Minghao Qin, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu
PDF
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments Ege Özsoy, Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, Nassir Navab
PDF
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha
PDF
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji
PDF
MMRL: Multi-Modal Representation Learning for Vision-Language Models Yuncheng Guo, Xiaodong Gu
PDF
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu
PDF
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Yilun Zhao, Haowei Zhang, Lujing Xie, Tongyan Hu, Guo Gan, Yitao Long, Zhiyuan Hu, Weiyuan Chen, Chuhan Li, Zhijian Xu, Chengye Wang, Ziyao Shangguan, Zhenwen Liang, Yixin Liu, Chen Zhao, Arman Cohan
PDF
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots Tianchen Deng, Guole Shen, Chen Xun, Shenghai Yuan, Tongxin Jin, Hongming Shen, Yanbo Wang, Jingchuan Wang, Hesheng Wang, Danwei Wang, Weidong Chen
PDF
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data Zifan Wang, Ziqing Chen, Junyu Chen, Jilong Wang, Yuxin Yang, Yunze Liu, Xueyi Liu, He Wang, Li Yi
PDF
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie
PDF
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices Jianwen Jiang, Gaojie Lin, Zhengkun Rong, Chao Liang, Yongming Zhu, Jiaqi Yang, Tianyun Zhong
PDF
MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis Yinghao Wu, Shihui Guo, Yipeng Qin
PDF
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim
PDF
Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing Xuanbai Chen, Xiang Xu, Zhihua Li, Tianchen Zhao, Pietro Perona, Qin Zhang, Yifan Xing
PDF
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
PDF
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia, Yu-Ming Tang, Kun-Yu Lin, Jian-Fang Hu, Wei-Shi Zheng
PDF
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-Identification Jiayu Jiang, Changxing Ding, Wentao Tan, Junhong Wang, Jin Tao, Xiangmin Xu
PDF
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling Zikang Zhou, Hengjian Zhou, Haibo Hu, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang
PDF
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining Shanglin Liu, Jianming Lv, Jingdan Kang, Huaidong Zhang, Zequan Liang, Shengfeng He
PDF
MoEdit: On Learning Quantity Perception for Multi-Object Image Editing Yanfeng Li, Kahou Chan, Yue Sun, Chantong Lam, Tong Tong, Zitong Yu, Keren Fu, Xiaohong Liu, Tao Tan
PDF
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation Huaize Liu, Wenzhang Sun, Donglin Di, Shibo Sun, Jiahui Yang, Changqing Zou, Hujun Bao
PDF
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation Based Distillation Yuxiang Fu, Qi Yan, Lele Wang, Ke Li, Renjie Liao
PDF
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, Jiaolong Yang
PDF
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
PDF
MoManipVLA: Transferring Vision-Language-Action Models for General Mobile Manipulation Zhenyu Wu, Yuheng Zhou, Xiuwei Xu, Ziwei Wang, Haibin Yan
PDF
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-Training Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu
PDF
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion Songsong Yu, Yuxin Chen, Zhongang Qi, Zeke Xie, Yifan Wang, Lijun Wang, Ying Shan, Huchuan Lu
PDF
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking Hongkai Wei, Yang Yang, Shijie Sun, Mingtao Feng, Xiangyu Song, Qi Lei, Hongli Hu, Rong Wang, Huansheng Song, Naveed Akhtar, Ajmal Saeed Mian
PDF
Monocular and Generalizable Gaussian Talking Head Animation Shengjie Gong, Haojie Li, Jiapeng Tang, Dongming Hu, Shuangping Huang, Hao Chen, Tianshui Chen, Zhuoman Liu
PDF
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors Fanqi Pu, Yifan Wang, Jiru Deng, Wenming Yang
PDF
MonoInstance: Enhancing Monocular Priors via Multi-View Instance Alignment for Neural Rendering and Reconstruction Wenyuan Zhang, Yixiao Yang, Han Huang, Liang Han, Kanle Shi, Yu-Shen Liu, Zhizhong Han
PDF
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Nath Kundu, R. Venkatesh Babu
PDF
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models Yifan Liu, Keyu Fan, Weihao Yu, Chenxin Li, Hao Lu, Yixuan Yuan
PDF
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Hugo Latapie, Jhih-Ciang Wu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng
PDF
MonSter: Marry Monodepth to Stereo Unleashes Power Junda Cheng, Longliang Liu, Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Yong Deng, Jinliang Zang, Yurui Chen, Zhipeng Cai, Xin Yang
PDF
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization Jamie Wynn, Zawar Qureshi, Jakub Powierza, Jamie Watson, Mohamed Sayed
PDF
MOS-Attack: A Scalable Multi-Objective Adversarial Attack Framework Ping Guo, Cheng Gong, Xi Lin, Fei Liu, Zhichao Lu, Qingfu Zhang, Zhenkun Wang
PDF
MOS: Modeling Object-Scene Associations in Generalized Category Discovery Zhengyuan Peng, Jinpeng Ma, Zhimin Sun, Ran Yi, Haichuan Song, Xin Tan, Lizhuang Ma
PDF
Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra
PDF
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy
PDF
MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds Jiahui Lei, Yijia Weng, Adam W. Harley, Leonidas Guibas, Kostas Daniilidis
PDF
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li
PDF
MotiF: Making Text Count in Image Animation with Motion Focal Loss Shijie Wang, Samaneh Azadi, Rohit Girdhar, Saketh Rambhatla, Chen Sun, Xi Yin
PDF
Motion Modes: What Could Happen Next? Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha, Niloy J. Mitra, Karan Singh, Paul Guerrero
PDF
Motion Prompting: Controlling Video Generation with Motion Trajectories Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun
PDF
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level Andong Deng, Tongjia Chen, Shoubin Yu, Taojiannan Yang, Lincoln Spencer, Yapeng Tian, Ajmal Saeed Mian, Mohit Bansal, Chen Chen
PDF
MotionBench: Benchmarking and Improving Fine-Grained Video Motion Understanding for Vision Language Models Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang
PDF
MotionMap: Representing Multimodality in Human Pose Forecasting Reyhaneh Hosseininejad, Megh Shukla, Saeed Saadatnejad, Mathieu Salzmann, Alexandre Alahi
PDF
MotionPro: A Precise Motion Controller for Image-to-Video Generation Zhongwei Zhang, Fuchen Long, Zhaofan Qiu, Yingwei Pan, Wu Liu, Ting Yao, Tao Mei
PDF
MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond Shenghao Ren, Yi Lu, Jiayi Huang, Jiayi Zhao, He Zhang, Tao Yu, Qiu Shen, Xun Cao
PDF
Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture Kenkun Liu, Yurong Fu, Weihao Yuan, Jing Lin, Peihao Li, Xiaodong Gu, Lingteng Qiu, Haoqian Wang, Zilong Dong, Xiaoguang Han
PDF
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation Shuwei Shi, Biao Gong, Xi Chen, Dandan Zheng, Shuai Tan, Zizheng Yang, Yuyuan Li, Jingwen He, Kecheng Zheng, Jingdong Chen, Ming Yang, Yinqiang Zheng
PDF
Move-in-2D: 2D-Conditioned Human Motion Generation Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang, Difan Liu, Feng Liu, Ming-Hsuan Yang, Zhan Xu
PDF
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Jiajun Cao, Yuan Zhang, Tao Huang, Ming Lu, Qizhe Zhang, Ruichuan An, Ningning Ma, Shanghang Zhang
PDF
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts Feng Liang, Haoyu Ma, Zecheng He, Tingbo Hou, Ji Hou, Kunpeng Li, Xiaoliang Dai, Felix Juefei-Xu, Samaneh Azadi, Animesh Sinha, Peizhao Zhang, Peter Vajda, Diana Marculescu
PDF
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation Weijia Wu, Mingyu Liu, Zeyu Zhu, Xi Xia, Haoen Feng, Wen Wang, Kevin Qinghong Lin, Chunhua Shen, Mike Zheng Shou
PDF
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes Ruijie Lu, Yixin Chen, Junfeng Ni, Baoxiong Jia, Yu Liu, Diwen Wan, Gang Zeng, Siyuan Huang
PDF
MP-GUI: Modality Perception with MLLMs for GUI Understanding Ziwei Wang, Weizhi Chen, Leyang Yang, Sheng Zhou, Shengchu Zhao, Hanbei Zhan, Jiongchao Jin, Liangcheng Li, Zirui Shao, Jiajun Bu
PDF
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion Zador Pataki, Paul-Edouard Sarlin, Johannes L. Schönberger, Marc Pollefeys
PDF
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving Zhiyuan Zhang, Xiaofan Li, Zhihao Xu, Wenjie Peng, Zijian Zhou, Miaojing Shi, Shuangping Huang
PDF
Mr. DETR: Instructive Multi-Route Training for Detection Transformers Chang-Bin Zhang, Yujie Zhong, Kai Han
PDF
MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting Jun Huang, Ting Liu, Yihang Wu, Xiaochao Qu, Luoqi Liu, Xiaolin Hu
PDF
Multi-Focal Conditioned Latent Diffusion for Person Image Synthesis Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe
PDF
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation Peihua Deng, Jiehua Zhang, Xichun Sheng, Chenggang Yan, Yaoqi Sun, Ying Fu, Liang Li
PDF
Multi-Group Proportional Representations for Text-to-Image Models Sangwon Jung, Alex Oesterling, Claudio Mayrink Verdun, Sajani Vithana, Taesup Moon, Flavio P. Calmon
PDF
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation Songsong Duan, Xi Yang, Nannan Wang
PDF
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen
PDF
Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs Sijie Wang, Rui She, Qiyu Kang, Siqi Li, Disheng Li, Tianyu Geng, Shangshu Yu, Wee Peng Tay
PDF
Multi-Modal Contrastive Learning with Negative Sampling Calibration for Phenotypic Drug Discovery Jiahua Rao, Hanjing Lin, Leyu Chen, Jiancong Xie, Shuangjia Zheng, Yuedong Yang
PDF
Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-Training Approach for RGBD Datasets Muhammad Abdullah Jamal, Omid Mohareri
PDF
Multi-Modal Knowledge Distillation-Based Human Trajectory Forecasting Jaewoo Jeong, Seohee Lee, Daehee Park, Giwon Lee, Kuk-Jin Yoon
PDF
Multi-Modal Medical Diagnosis via Large-Small Model Collaboration Wanyi Chen, Zihua Zhao, Jiangchao Yao, Ya Zhang, Jiajun Bu, Haishuai Wang
PDF
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation Weichen Dai, Hexing Wu, Xiaoyang Weng, Yuxin Zheng, Yuhang Ming, Wanzeng Kong
PDF
Multi-Modal Topology-Embedded Graph Learning for Spatially Resolved Genes Prediction from Pathology Images with Prior Gene Similarity Information Hang Shi, Changxi Chi, Peng Wan, Daoqiang Zhang, Wei Shao
PDF
Multi-Modal Vision Pre-Training for Medical Image Analysis Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang
PDF
Multi-Party Collaborative Attention Control for Image Customization Han Yang, Chuanguang Yang, Qiuli Wang, Zhulin An, Weilun Feng, Libo Huang, Yongjun Xu
PDF
Multi-Resolution Pathology-Language Pre-Training Model with Text-Guided Visual Representation Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Ganapathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, Arif Mahmood
PDF
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds Mohamed Abdelsamad, Michael Ulrich, Claudius Glaeser, Abhinav Valada
PDF
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties Wenqiao Li, Bozhong Zheng, Xiaohao Xu, Jinye Gan, Fading Lu, Xiang Li, Na Ni, Zheng Tian, Xiaonan Huang, Shenghua Gao, Yingna Wu
PDF
Multi-Subject Open-Set Personalization in Video Generation Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov
PDF
Multi-View Pose-Agnostic Change Localization with Zero Labels Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim, Donald Dansereau, Niko Sunderhauf, Dimity Miller
PDF
Multi-View Reconstruction via SfM-Guided Monocular Depth Estimation Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao
PDF
MultiGO: Towards Multi-Level Geometry Learning for Monocular 3D Textured Human Reconstruction Gangjian Zhang, Nanjie Yao, Shunsi Zhang, Hanfeng Zhao, Guoliang Pang, Jian Shu, Hao Wang
PDF
Multimodal Autoregressive Pre-Training of Large Vision Encoders Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor G. Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua Susskind, Alaaeldin El-Nouby
PDF
MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering Across Multiple Imaging Modalities Federico Lincetto, Gianluca Agresti, Mattia Rossi, Pietro Zanuttigh
PDF
MultiMorph: On-Demand Atlas Construction S. Mazdak Abulnaga, Andrew Hoopes, Neel Dey, Malte Hoffmann, Bruce Fischl, John Guttag, Adrian Dalca
PDF
Multiple Object Tracking as ID Prediction Ruopeng Gao, Ji Qi, Limin Wang
PDF
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization Hao Xu, Xiaolin Wu, Xi Zhang
PDF
Multitwine: Multi-Object Compositing with Text and Layout Control Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, He Zhang, Andrew Gilbert, John Collomosse, Soo Ye Kim
PDF
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval Reno Kriz, Kate Sanders, David Etter, Kenton Murray, Cameron Carpenter, Hannah Recknor, Jimena Guallar-Blasco, Alexander Martin, Eugene Yang, Benjamin Van Durme
PDF
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking Haolin Qin, Tingfa Xu, Tianhao Li, Zhenxiang Chen, Tao Feng, Jianan Li
PDF
MUSt3R: Multi-View Network for Stereo 3D Reconstruction Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, Vincent Leroy
PDF
MuTri: Multi-View Tri-Alignment for OCT to OCTA 3D Image Translation Zhuangzhuang Chen, Hualiang Wang, Chubin Ou, Xiaomeng Li
PDF
MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views in 2 Seconds Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan
PDF
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts Peijie Wang, Zhong-Zhi Li, Fei Yin, Dekang Ran, Cheng-Lin Liu
PDF
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation Aviral Chharia, Wenbo Gou, Haoye Dong
PDF
MVBoost: Boost 3D Reconstruction with Multi-View Refinement Xiangyu Liu, Xiaomei Zhang, Zhiyuan Ma, Xiangyu Zhu, Zhen Lei
PDF
MVDoppler-Pose: Multi-Modal Multi-View mmWave Sensing for Long-Distance Self-Occluded Human Walking Pose Estimation Jaeho Choi, Soheil Hor, Shubo Yang, Amin Arbabian
PDF
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model Chenjie Cao, Chaohui Yu, Shang Liu, Fan Wang, Xiangyang Xue, Yanwei Fu
PDF
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan
PDF
MVPortrait: Text-Guided Motion and Emotion Control for Multi-View Vivid Portrait Animation Yukang Lin, Hokit Fung, Jianjin Xu, Zeping Ren, Adela S.M. Lau, Guosheng Yin, Xiu Li
PDF
MVSAnywhere: Zero-Shot Multi-View Stereo Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, Jamie Watson
PDF
NADER: Neural Architecture Design via Multi-Agent Collaboration Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu
PDF
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions Chan Hur, Jeong-hun Hong, Dong-hun Lee, Dabin Kang, Semin Myeong, Sang-hyo Park, Hyeyoung Park
PDF
Navigating Image Restoration with VAR's Distribution Alignment Prior Siyang Wang, Naishan Zheng, Jie Huang, Feng Zhao
PDF
Navigating the Unseen: Zero-Shot Scene Graph Generation via Capsule-Based Equivariant Features Wenhuan Huang, Yi Ji, Guiqian Zhu, Li Ying, Chunping Liu
PDF
Navigation World Models Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun
PDF
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn, Daesik Kim, Seung-Hun Nam
PDF
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval Zengrong Lin, Zheng Wang, Tianwen Qian, Pan Mu, Sixian Chan, Cong Bai
PDF
NeISF++: Neural Incident Stokes Field for Polarized Inverse Rendering of Conductors and Dielectrics Chenhao Li, Taishi Ono, Takeshi Uemori, Sho Nitta, Hajime Mihara, Alexander Gatto, Hajime Nagahara, Yusuke Moriuchi
PDF
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han
PDF
Nested Diffusion Models Using Hierarchical Latent Priors Xiao Zhang, Ruoxi Jiang, Rebecca Willett, Michael Maire
PDF
Neural Hierarchical Decomposition for Single Image Plant Modeling Zhihao Liu, Zhanglin Cheng, Naoto Yokoya
PDF
Neural Inverse Rendering from Propagating Light Anagh Malik, Benjamin Attal, Andrew Xie, Matthew O'Toole, David B. Lindell
PDF
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion Zexin He, Tengfei Wang, Xin Huang, Xingang Pan, Ziwei Liu
PDF
Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning Chenjie Hao, Weyl Lu, Yifan Xu, Yubei Chen
PDF
Neural Video Compression with Context Modulation Chuanbo Tang, Zhuoyuan Li, Yifan Bian, Li Li, Dong Liu
PDF
Neuro-3D: Towards 3D Visual Decoding from EEG Signals Zhanqiang Guo, Jiamin Wu, Yonghao Song, Jiahui Bu, Weijian Mai, Qihao Zheng, Wanli Ouyang, Chunfeng Song
PDF
Neuro-Symbolic Evaluation of Text-to-Video Models Using Formal Verification S P Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali
PDF
Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition Yang Chen, Jingcai Guo, Song Guo, Dacheng Tao
PDF
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting Yulong Zheng, Zicheng Jiang, Shengfeng He, Yandu Sun, Junyu Dong, Huaidong Zhang, Yong Du
PDF
NightAdapter: Learning a Frequency Adapter for Generalizable Night-Time Scene Segmentation Qi Bi, Jingjun Yi, Huimin Huang, Hao Zheng, Haolan Zhan, Yawen Huang, Yuexiang Li, Xian Wu, Yefeng Zheng
PDF
NitroFusion: High-Fidelity Single-Step Diffusion Through Dynamic Adversarial Training Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
PDF
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models Bikang Pan, Qun Li, Xiaoying Tang, Wei Huang, Zhen Fang, Feng Liu, Jingya Wang, Jingyi Yu, Ye Shi
PDF
NN-Former: Rethinking Graph Structure in Neural Architecture Representation Ruihan Xu, Haokui Zhang, Yaowei Wang, Wei Zeng, Shiliang Zhang
PDF
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark Yanfeng Zhou, Lingrui Li, Le Lu, Minfeng Xu
PDF
No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition Rong Qin, Xin Liu, Xingyu Liu, Jiaxuan Liu, Jinglei Shi, Liang Lin, Jufeng Yang
PDF
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather Junsung Park, Hwijeong Lee, Inha Kang, Hyunjung Shim
PDF
Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement Hesong Li, Ziqi Wu, Ruiwen Shao, Tao Zhang, Ying Fu
PDF
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis Boming Miao, Chunxiao Li, Xiaoxiao Wang, Andi Zhang, Rui Sun, Zizhe Wang, Yao Zhu
PDF
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-Supervised Low-Light RAW Image Denoising Feiran Li, Haiyang Jiang, Daisuke Iso
PDF
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou, Mingjie Sun, Yongxin Guo
PDF
Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory Han Hu, Wenli Du, Peng Liao, Bing Wang, Siyuan Fan
PDF
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models Longquan Dai, He Wang, Jinhui Tang
PDF
Non-Natural Image Understanding with Advancing Frequency-Based Vision Encoders Wang Lin, QingSong Wang, Yueying Feng, Shulei Wang, Tao Jin, Zhou Zhao, Fei Wu, Chang Yao, Jingyuan Chen
PDF
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction Cecilia Curreli, Dominik Muhle, Abhishek Saroha, Zhenzhang Ye, Riccardo Marin, Daniel Cremers
PDF
NoPain: No-Box Point Cloud Attack via Optimal Transport Singular Boundary Zezeng Li, Xiaoyu Du, Na Lei, Liming Chen, Weimin Wang
PDF
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability Lei Wang, Senmao Li, Fei Yang, Jianye Wang, Ziheng Zhang, Yuhan Liu, Yaxing Wang, Jian Yang
PDF
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu
PDF
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci, Nicola Strisciuglio
PDF
NoT: Federated Unlearning via Weight Negation Yasser H. Khalil, Leo Brunswic, Soufiane Lamghari, Xu Li, Mahdi Beitollahi, Xi Chen
PDF
Notes-Guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering Wenlong Fang, Qiaofeng Wu, Jing Chen, Yun Xue
PDF
Novel View Synthesis with Pixel-Space Diffusion Models Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky
PDF
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery Reese Kneeland, Paul S. Scotti, Ghislain St-Yves, Jesse Breedlove, Kendrick Kay, Thomas Naselaris
PDF
NTClick: Achieving Precise Interactive Segmentation with Noise-Tolerant Clicks Chenyi Zhang, Ting Liu, Xiaochao Qu, Luoqi Liu, Yao Zhao, Yunchao Wei
PDF
NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics Kun Yang, Yuxiang Liu, Zeyu Cui, Yu Liu, Maojun Zhang, Shen Yan, Qing Wang
PDF
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection Le Yang, Ziwei Zheng, Boxu Chen, Zhengyu Zhao, Chenhao Lin, Chao Shen
PDF
Number It: Temporal Grounding Videos like Flipping Manga Yongliang Wu, Xinting Hu, Yuyang Sun, Yizhou Zhou, Wenbo Zhu, Fengyun Rao, Bernt Schiele, Xu Yang
PDF
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Lingen Li, Zhaoyang Zhang, Yaowei Li, Jiale Xu, Wenbo Hu, Xiaoyu Li, Weihao Cheng, Jinwei Gu, Tianfan Xue, Ying Shan
PDF
NVILA: Efficient Frontier Visual Language Models Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Haotian Tang, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Jinyi Hu, Sifei Liu, Ranjay Krishna, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
PDF
O-TPT: Orthogonality Constraints for Calibrating Test-Time Prompt Tuning in Vision-Language Models Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah, Salman Khan, Muhammad Haris Khan
PDF
Object Detection Using Event Camera: A MoE Heat Conduction Based Detector and a New Benchmark Dataset Xiao Wang, Yu Jin, Wentao Wu, Wei Zhang, Lin Zhu, Bo Jiang, Yonghong Tian
PDF
Object-Aware Sound Source Localization via Audio-Visual Scene Understanding Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim
PDF
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation Xiaoqi Li, Jingyun Xu, Mingxu Zhang, Jiaming Liu, Yan Shen, Iaroslav Ponomarenko, Jiahui Xu, Liang Heng, Siyuan Huang, Shanghang Zhang, Hao Dong
PDF
Object-Shot Enhanced Grounding Network for Egocentric Video Yisen Feng, Haoyu Zhang, Meng Liu, Weili Guan, Liqiang Nie
PDF
ObjectMover: Generative Object Movement with Video Prior Xin Yu, Tianyu Wang, Soo Ye Kim, Paul Guerrero, Xi Chen, Qing Liu, Zhe Lin, Xiaojuan Qi
PDF
Occlusion-Aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian
PDF
OccMamba: Semantic Occupancy Prediction with State Space Models Heng Li, Yuenan Hou, Xiaohan Xing, Yuexin Ma, Xiao Sun, Yanyong Zhang
PDF
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Zeyu Zhang, Yue Huang, Kun Zhang
PDF
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding Wei Suo, Lijun Zhang, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang
PDF
ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-Supervised Learning for Virtual Immunohistochemistry Staining Tong Wang, Mingkang Wang, Zhongze Wang, Hongkai Wang, Qi Xu, Fengyu Cong, Hongming Xu
PDF
Odd-One-Out: Anomaly Detection by Comparing with Neighbors Ankan Bhunia, Changjian Li, Hakan Bilen
PDF
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models Yahan Tu, Rui Hu, Jitao Sang
PDF
ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos Zetong Zhang, Manuel Kaufmann, Lixin Xue, Jie Song, Martin R. Oswald
PDF
OFER: Occluded Face Expression Reconstruction Pratheba Selvaraju, Victoria Fernandez Abrevaya, Timo Bolkart, Rick Akkerman, Tianyu Ding, Faezeh Amjadi, Ilya Zharkov
PDF
OffsetOPT: Explicit Surface Reconstruction Without Normals Huan Lei
PDF
Olympus: A Universal Task Router for Computer Vision Tasks Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr
PDF
Omni-ID: Holistic Identity Representation Designed for Generative Tasks Guocheng Qian, Kuan-Chieh Wang, Or Patashnik, Negin Heravi, Daniil Ostashev, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman
PDF
Omni-RGPT: Unifying Image and Video Region-Level Understanding via Token Marks Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma
PDF
Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction Dongxu Wei, Zhiqi Li, Peidong Liu
PDF
Omnia De EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari
PDF
Omnidirectional Multi-Object Tracking Kai Luo, Hao Shi, Sheng Wu, Fei Teng, Mengfei Duan, Chang Huang, Yuhang Wang, Kaiwei Wang, Kailun Yang
PDF
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, Jin Shi, Fan Wu, Pei Chu, Minghao Liu, Zhenxiang Li, Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He
PDF
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez
PDF
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover
PDF
OmniGen: Unified Image Generation Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Chaofan Li, Shuting Wang, Tiejun Huang, Zheng Liu
PDF
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang
PDF
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong
PDF
OmniMMI: A Comprehensive Multi-Modal Interaction Benchmark in Streaming Video Contexts Yuxuan Wang, Yueqian Wang, Bo Chen, Tong Wu, Dongyan Zhao, Zilong Zheng
PDF
OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities Suyoung Lee, Jaeyoung Chung, Kihoon Kim, Jaeyoo Huh, Gunhee Lee, Minsoo Lee, Kyoung Mu Lee
PDF
OmniStereo: Real-Time Omnidirectional Depth Estimation with Multiview Fisheye Cameras Jiaxi Deng, Yushen Wang, Haitao Meng, Zuoxun Hou, Yi Chang, Gang Chen
PDF
OmniStyle: Filtering High Quality Style Transfer Data at Scale Ye Wang, Ruiqi Liu, Jiang Lin, Fei Liu, Zili Yi, Yilin Wang, Rui Ma
PDF
On Denoising Walking Videos for Gait Recognition Dongyang Jin, Chao Fan, Jingzhe Ma, Jingkai Zhou, Weihua Chen, Shiqi Yu
PDF
On the Consistency of Video Large Language Models in Temporal Comprehension Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao
PDF
On the Generalization of Handwritten Text Recognition Models Carlos Garrido-Munoz, Jorge Calvo-Zaragoza
PDF
On the Out-of-Distribution Generalization of Large Multimodal Models Xingxuan Zhang, Jiansheng Li, Wenjing Chu, Junjia Hai, Renzhe Xu, Yuqing Yang, Shikai Guan, Jiazheng Xu, Liping Jing, Peng Cui
PDF
On the Zero-Shot Adversarial Robustness of Vision-Language Models: A Truly Zero-Shot and Training-Free Approach Baoshun Tong, Hanjiang Lai, Yan Pan, Jian Yin
PDF
On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events Jesse J. Hagenaars, Yilun Wu, Federico Paredes-Valles, Stein Stroobants, Guido C.H.E. de Croon
PDF
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants Chong Yu, Tao Chen, Zhongxue Gan
PDF
ONDA-Pose: Occlusion-Aware Neural Domain Adaptation for Self-Supervised 6D Object Pose Estimation Tao Tan, Qiulei Dong
PDF
One Diffusion to Generate Them All Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Krishna, Jiasen Lu
PDF
One Is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li
PDF
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion Chunyang Cheng, Tianyang Xu, Zhenhua Feng, Xiaojun Wu, Zhangyong Tang, Hui Li, Zeyang Zhang, Sara Atito, Muhammad Awais, Josef Kittler
PDF
One-for-More: Continual Diffusion Model for Anomaly Detection Xiaofan Li, Xin Tan, Zhuo Chen, Zhizhong Zhang, Ruixin Zhang, Rizen Guo, Guanna Jiang, Yulong Chen, Yanyun Qu, Lizhuang Ma, Yuan Xie
PDF
One-Minute Video Generation with Test-Time Training Karan Dalal, Daniel Koceja, Jiarui Xu, Yue Zhao, Shihao Han, Ka Chun Cheung, Jan Kautz, Yejin Choi, Yu Sun, Xiaolong Wang
PDF
One-Shot 3D Object Canonicalization Based on Geometric and Semantic Consistency Li Jin, Yujie Wang, Wenzheng Chen, Qiyu Dai, Qingzhe Gao, Xueying Qin, Baoquan Chen
PDF
One-Step Event-Driven High-Speed Autofocus Yuhan Bao, Shaohua Gao, Wenyong Li, Kaiwei Wang
PDF
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang
PDF
One2Any: One-Reference 6D Pose Estimation for Any Object Mengya Liu, Siyuan Li, Ajad Chhatkuli, Prune Truong, Luc Van Gool, Federico Tombari
PDF
Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution Fei Ye, Adrian G. Bors
PDF
Online Video Understanding: OVBench and VideoChat-Online Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang
PDF
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging Yijie Tang, Jiazhao Zhang, Yuqing Lan, Yulan Guo, Dezun Dong, Chenyang Zhu, Kai Xu
PDF
OODD: Test-Time Out-of-Distribution Detection with Dynamic Dictionary Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, Nanyang Ye
PDF
Open Ad-Hoc Categorization with Contextualized Feature Learning Zilin Wang, Sangwoo Mo, Stella X. Yu, Sima Behpour, Liu Ren
PDF
Open Set Label Shift with Test Time Out-of-Distribution Reference Changkun Ye, Russell Tsuchida, Lars Petersson, Nick Barnes
PDF
Open-Canopy: Towards Very High Resolution Forest Monitoring Fajwel Fogel, Yohann Perron, Nikola Besic, Laurent Saint-André, Agnès Pellissier-Tanon, Martin Schwartz, Thomas Boudras, Ibrahim Fayad, Alexandre d'Aspremont, Loic Landrieu, Philippe Ciais
PDF
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann
PDF
Open-World Amodal Appearance Completion Jiayang Ao, Yanbei Jiang, Qiuhong Ke, Krista A. Ehinger
PDF
Open-World Objectness Modeling Unifies Novel Object Detection Shan Zhang, Yao Ni, Jinhao Du, Yuan Xue, Philip Torr, Piotr Koniusz, Anton van den Hengel
PDF
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation Hui Li, Mingwang Xu, Yun Zhan, Shan Mu, Jiaye Li, Kaihui Cheng, Yuxuan Chen, Tan Chen, Mao Ye, Jingdong Wang, Siyu Zhu
PDF
OpenING: A Comprehensive Benchmark for Judging Open-Ended Interleaved Image-Text Generation Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang
PDF
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-of-Distribution Detection Max Gutbrod, David Rauber, Danilo Weber Nunes, Christoph Palm
PDF
OpenSDI: Spotting Diffusion-Generated Images in the Open World Yabin Wang, Zhiwu Huang, Xiaopeng Hong
PDF
Opportunistic Single-Photon Time of Flight Sotiris Nousias, Mian Wei, Howard Xiao, Maxx Wu, Shahmeer Athar, Kevin J. Wang, Anagh Malik, David A. Barmherzig, David B. Lindell, Kyros N. Kutulakos
PDF
Optical-Flow Guided Prompt Optimization for Coherent Video Generation Hyelin Nam, Jaemin Kim, Dohun Lee, Jong Chul Ye
PDF
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation Xiao Cui, Yulei Qin, Wengang Zhou, Hongsheng Li, Houqiang Li
PDF
OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit Benquan Wang, Ruyi An, Jin-Kyu So, Sergei Kurdiumov, Eng Aik Chan, Giorgio Adamo, Yuhan Peng, Yewen Li, Bo An
PDF
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing Zhuowei Li, Tianchen Zhao, Xiang Xu, Zheng Zhang, Zhihua Li, Xuanbai Chen, Qin Zhang, Alessandro Bergamo, Anil K. Jain, Yifan Xing
PDF
Optimizing for the Shortest Path in Denoising Diffusion Model Ping Chen, Xingpeng Zhang, Zhaoxiang Liu, Huan Hu, Xiang Liu, Kai Wang, Min Wang, Yanlin Qian, Shiguo Lian
PDF
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, Liqiang Nie
PDF
OralXrays-9: Towards Hospital-Scale Panoramic X-Ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining Bingzhi Chen, Sisi Fu, Xiaocheng Fang, Jieyi Cai, Boya Zhang, Minhua Lu, Yishu Liu
PDF
Order-One Rolling Shutter Cameras Marvin Anas Hahn, Kathlén Kohn, Orlando Marigliano, Tomas Pajdla
PDF
Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping Guannan Lai, Yujie Li, Xiangkun Wang, Junbo Zhang, Tianrui Li, Xin Yang
PDF
ORIDa: Object-Centric Real-World Image Composition Dataset Jinwoo Kim, Sangmin Han, Jinho Jeong, Jiwoo Choi, Dongyeoung Kim, Seon Joo Kim
PDF
OSDFace: One-Step Diffusion Model for Face Restoration Jingkai Wang, Jue Gong, Lin Zhang, Zheng Chen, Xing Liu, Hong Gu, Yutong Liu, Yulun Zhang, Xiaokang Yang
PDF
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP N C Mohamad Hassan, Divyam Gupta, Mainak Singha, Sai Bhargav Rongali, Ankit Jha, Muhammad Haris Khan, Biplab Banerjee
PDF
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction Gehui Li, Bin Chen, Chen Zhao, Lei Zhang, Jian Zhang
PDF
OSV: One Step Is Enough for High-Quality Image to Video Generation Xiaofeng Mao, Zhengkai Jiang, Fu-yun Wang, Jiangning Zhang, Hao Chen, Mingmin Chi, Yabiao Wang, Wenhan Luo
PDF
Ouroboros3D: Image-to-3D Generation via 3D-Aware Recursive Diffusion Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Lu Sheng
PDF
Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection Zhuo Xu, Xiang Xiang, Yifan Liang
PDF
OverLoCK: An Overview-First-Look-Closely-Next ConvNet with Context-Mixing Dynamic Kernels Meng Lou, Yizhou Yu
PDF
OVO-Bench: How Far Is Your Video-LLMs from Real-World Online Video Understanding? Junbo Niu, Yifei Li, Ziyang Miao, Chunjiang Ge, Yuanhang Zhou, Qihao He, Xiaoyi Dong, Haodong Duan, Shuangrui Ding, Rui Qian, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang
PDF
OW-OVD: Unified Open World and Open Vocabulary Object Detection Xing Xi, Yangyang Huang, Ronghua Luo, Yu Qiu
PDF
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models Mohamed Dhouib, Davide Buscaldi, Sonia Vanier, Aymen Shabou
PDF
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel
PDF
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation Zidong Cao, Jinjing Zhu, Weiming Zhang, Hao Ai, Haotian Bai, Hengshuang Zhao, Lin Wang
PDF
PanoGS: Gaussian-Based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding Hongjia Zhai, Hai Li, Zhenzhe Li, Xiaokun Pan, Yijia He, Guofeng Zhang
PDF
Panorama Generation from NFoV Image Done Right Dian Zheng, Cheng Zhang, Xiao-Ming Wu, Cao Li, Chengfei Lv, Jian-Fang Hu, Wei-Shi Zheng
PDF
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung, Jianfei Cai
PDF
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo
PDF
Parallel Sequence Modeling via Generalized Spatial Propagation Network Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu
PDF
Parallelized Autoregressive Visual Generation Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu
PDF
Parameter Efficient Mamba Tuning via Projector-Targeted Diagonal-Centric Linear Transformation Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim
PDF
Parameter-Efficient Fine-Tuning in Hyperspherical Space for Open-Vocabulary Semantic Segmentation Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yu Huang, Yaoming Wang, Wei Shen
PDF
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring Zhenxuan Fang, Fangfang Wu, Tao Huang, Le Dong, Weisheng Dong, Xin Li, Guangming Shi
PDF
Parametric Point Cloud Completion for Polygonal Surface Reconstruction Zhaiyu Chen, Yuqing Wang, Liangliang Nan, Xiao Xiang Zhu
PDF
PARC: A Quantitative Framework Uncovering the Symmetries Within Vision Language Models Jenny Schmalfuss, Nadine Chang, Vibashan Vs, Maying Shen, Andres Bruhn, Jose M. Alvarez
PDF
PartGen: Part-Level 3D Generation and Reconstruction with Multi-View Diffusion Models Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, Andrea Vedaldi
PDF
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao
PDF
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion Based Image Super-Resolution Libo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo, Xiaokang Yang
PDF
Patch Matters: Training-Free Fine-Grained Image Caption Enhancement via Local Perception Ruotian Peng, Haiying He, Yake Wei, Yandong Wen, Di Hu
PDF
PatchDEMUX: A Certifiably Robust Framework for Multi-Label Classifiers Against Adversarial Patches Dennis Jacob, Chong Xiang, Prateek Mittal
PDF
PatchDPO: Patch-Level DPO for Finetuning-Free Personalized Image Generation Qihan Huang, Long Chan, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jie Song
PDF
PatchGuard: Adversarially Robust Anomaly Detection and Localization Through Vision Transformers and Pseudo Anomalies Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban
PDF
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-Wise Video Super-Resolution Shian Du, Menghan Xia, Chang Liu, Xintao Wang, Jing Wang, Pengfei Wan, Di Zhang, Xiangyang Ji
PDF
Pathways on the Image Manifold: Image Editing via Video Generation Noam Rotstein, Gal Yona, Daniel Silver, Roy Velich, David Bensaid, Ron Kimmel
PDF
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model Ziyuan Yang, Yingyu Chen, Zhiwen Wang, Hongming Shan, Yang Chen, Yi Zhang
PDF
Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy Aditya Ganeshan, Thibault Groueix, Paul Guerrero, Radomir Mech, Matthew Fisher, Daniel Ritchie
PDF
PAVE: Patching and Adapting Video Large Language Models Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li
PDF
Pay Attention to the Foreground in Object-Centric Learning Pinzhuo Tian, Shengjie Yang, Hang Yu, Alex Kot
PDF
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields Sean Wu, Shamik Basu, Tim Broedermann, Luc Van Gool, Christos Sakaridis
PDF
PCDreamer: Point Cloud Completion Through Multi-View Diffusion Priors Guangshun Wei, Yuan Feng, Long Ma, Chen Wang, Yuanfeng Zhou, Changjian Li
PDF
PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models Junhyuk So, Jiwoong Shin, Chaeyeon Jang, Eunhyeok Park
PDF
PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation Jingyi Tian, Le Wang, Sanping Zhou, Sen Wang, Jiayi Li, Haowen Sun, Wei Tang
PDF
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs Yangyu Huang, Tianyi Gao, Haoran Xu, Qihao Zhao, Yang Song, Zhipeng Gui, Tengchao Lv, Hao Chen, Lei Cui, Scarlett Li, Furu Wei
PDF
PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization Dong Kyu Cho, Inwoo Hwang, Sanghack Lee
PDF
Percept, Memory, and Imagine: World Feature Simulating for Open-Domain Unknown Object Detection Aming Wu, Cheng Deng
PDF
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh, Ethan Shen, Dongping Chen, Linda G. Shapiro, Ranjay Krishna
PDF
Perceptual Inductive Bias Is What You Need Before Contrastive Learning Junru Zhao, Tianqin Li, Dunhan Jiang, Shenghao Wu, Alan Ramirez, Tai Sing Lee
PDF
Perceptual Video Compression with Neural Wrapping Muhammad Umar Karim Khan, Aaron Chadha, Mohammad Ashraful Anam, Yiannis Andreopoulos
PDF
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi, Kim Sung-Bin, Suekyeong Nam, Tae-Hyun Oh
PDF
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model Yuting Zhang, Hao Lu, Qingyong Hu, Yin Wang, Kaishen Yuan, Xin Liu, Kaishun Wu
PDF
PerLA: Perceptive 3D Language Assistant Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, Yiming Wang
PDF
PERSE: Personalized 3D Generative Avatars from a Single Portrait Hyunsoo Cha, Inhee Lee, Hanbyul Joo
PDF
Person De-Reidentification: A Variation-Guided Identity Shift Modeling Yi-Xing Peng, Yu-Ming Tang, Kun-Yu Lin, Qize Yang, Jingke Meng, Xihan Wei, Wei-Shi Zheng
PDF
PersonaBooth: Personalized Text-to-Motion Generation Boeun Kim, Hea In Jeong, JungHoon Sung, Yihua Cheng, Jeongmin Lee, Ju Yong Chang, Sang-Il Choi, Younggeun Choi, Saim Shin, Jungho Kim, Hyung Jin Chang
PDF
PersonaHOI: Effortlessly Improving Face Personalization in Human-Object Interaction Generation Xinting Hu, Haoran Wang, Jan Eric Lenssen, Bernt Schiele
PDF
Personalized Preference Fine-Tuning of Diffusion Models Meihua Dang, Anikait Singh, Linqi Zhou, Stefano Ermon, Jiaming Song
PDF
Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories Susung Hong, Johanna Karras, Ricardo Martin-Brualla, Ira Kemelmacher-Shlizerman
PDF
pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation Yifei Zhang, Hao Zhu, Alysa Ziying Tan, Dianzhi Yu, Longtao Huang, Han Yu
PDF
PGC: Physics-Based Gaussian Cloth from a Single Pose Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban, Nikolaos Sarafianos, Hsiao-yu Chen, Oshri Halimi, Aljaž Božič, Shunsuke Saito, Jiajun Wu, C. Karen Liu, Tuur Stuyck, Egor Larionov
PDF
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li
PDF
PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos Xun Jiang, Zhiyi Huang, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen
PDF
Phoenix: A Motion-Based Self-Reflection Framework for Fine-Grained Robotic Action Correction Wenke Xia, Ruoxuan Feng, Dong Wang, Di Hu
PDF
PhyS-EdiT: Physics-Aware Semantic Image Editing with Text Description Ziqi Cai, Shuchen Weng, Yifei Xia, Boxin Shi
PDF
PhysAnimator: Physics-Guided Generative Cartoon Animation Tianyi Xie, Yiwei Zhao, Ying Jiang, Chenfanfu Jiang
PDF
PhysGen3D: Crafting a Miniature Interactive World from a Single Image Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang
PDF
Physical Plausibility-Aware Trajectory Prediction via Locomotion Embodiment Hiromu Taketsugu, Takeru Oba, Takahiro Maeda, Shohei Nobuhara, Norimichi Ukita
PDF
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations? Martin Spitznagel, Jan Vaillant, Janis Keuper
PDF
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability Weijie Zhou, Manli Tao, Chaoyang Zhao, Haiyun Guo, Honghui Dong, Ming Tang, Jinqiao Wang
PDF
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation Qiyao Xue, Xiangyu Yin, Boyuan Yang, Wei Gao
PDF
PI-HMR: Towards Robust In-Bed Temporal Human Shape Reconstruction with Contact Pressure Sensing Ziyu Wu, Yufan Xiong, Mengting Niu, Fangting Xie, Quan Wan, Qijun Ying, Boyan Liu, Xiaohui Cai
PDF
PIAD: Pose and Illumination Agnostic Anomaly Detection Kaichen Yang, Junjie Cao, Zeyu Bai, Zhixun Su, Andrea Tagliasacchi
PDF
PICD: Versatile Perceptual Image Compression with Diffusion Rendering Tongda Xu, Jiahao Li, Bin Li, Yan Wang, Ya-Qin Zhang, Yan Lu
PDF
PICO: Reconstructing 3D People in Contact with Objects Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun S. Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas
PDF
PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers Wooju Lee, Juhye Park, Dasol Hong, Changki Sung, Youngwoo Seo, DongWan Kang, Hyun Myung
PDF
PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution Shuangfan Zhou, Chu Zhou, Youwei Lyu, Heng Guo, Zhanyu Ma, Boxin Shi, Imari Sato
PDF
PillarHist: A Quantization-Aware Pillar Feature Encoder Based on Height-Aware Histogram Sifan Zhou, Zhihang Yuan, Dawei Yang, Xing Hu, Jian Qian, Ziyu Zhao
PDF
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning Maosen Zhao, Pengtao Chen, Chong Yu, Yan Wen, Xudong Tan, Tao Chen
PDF
Pippo: High-Resolution Multi-View Humans from a Single Image Yash Kant, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta Martinez, Igor Gilitschenski, Shunsuke Saito, Timur Bagautdinov
PDF
Pixel-Aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision Jinnyeong Kim, Seung-Hwan Baek
PDF
Pixel-Level and Semantic-Level Adjustable Super-Resolution: A Dual-LoRA Approach Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, Lei Zhang
PDF
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes Bin Tan, Rui Yu, Yujun Shen, Nan Xue
PDF
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang
PDF
PLeaS - Merging Models with Permutations and Least Squares Anshul Nasery, Jonathan Hayase, Pang Wei Koh, Sewoong Oh
PDF
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-Facet Concept Control Basim Azam, Naveed Akhtar
PDF
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater Xueyu Liu, Rui Wang, Yexin Lai, Guangze Shi, Feixue Shao, Fang Hao, Jianan Zhang, Jia Shen, Yongfei Wu, Wen Zheng
PDF
Plug-and-Play Versatile Compressed Video Enhancement Huimin Zeng, Jiacheng Li, Zhiwei Xiong
PDF
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter Yaohua Zha, Yanzi Wang, Hang Guo, Jinpeng Wang, Tao Dai, Bin Chen, Zhihao Ouyang, Xue Yuerong, Ke Chen, Shu-Tao Xia
PDF
PMNI: Pose-Free Multi-View Normal Integration for Reflective and Textureless Surface Reconstruction Mingzhi Pei, Xu Cao, Xiangyi Wang, Heng Guo, Zhanyu Ma
PDF
PO3AD: Predicting Point Offsets Toward Better 3D Point Cloud Anomaly Detection Jianan Ye, Weiguang Zhao, Xi Yang, Guangliang Cheng, Kaizhu Huang
PDF
Point Cloud Upsampling Using Conditional Diffusion Module with Adaptive Noise Suppression Boqian Zhang, Shen Yang, Hao Chen, Chao Yang, Jing Jia, Guang Jiang
PDF
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding Changshuo Wang, Shuting He, Xiang Fang, Jiawei Han, Zhonghang Liu, Xin Ning, Weijun Li, Prayag Tiwari
PDF
Point-Cache: Test-Time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis Hongyu Sun, Qiuhong Ke, Ming Cheng, Yongcai Wang, Deying Li, Chenhui Gou, Jianfei Cai
PDF
Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting Wei Lin, Chenyang Zhao, Antoni B. Chan
PDF
Point2RBox-V2: Rethinking Point-Supervised Oriented Object Detection with Spatial Layout Among Instances Yi Yu, Botao Ren, Peiyuan Zhang, Mingxin Liu, Junwei Luo, Shaofeng Zhang, Feipeng Da, Junchi Yan, Xue Yang
PDF
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, Gongfan Fang, Wentong Li, Jianke Zhu, Xinchao Wang
PDF
PointSR: Self-Regularized Point Supervision for Drone-View Object Detection Weizhuo Li, Yue Xi, Wenjing Jia, Zehao Zhang, Fei Li, Xiangzeng Liu, Qiguang Miao
PDF
PolarFree: Polarization-Based Reflection-Free Imaging Mingde Yao, Menglu Wang, King-Man Tam, Lingen Li, Tianfan Xue, Jinwei Gu
PDF
Polarized Color Screen Matting Kenji Enomoto, Scott Cohen, Brian Price, Tj Rhodes
PDF
PolarNeXt: Rethink Instance Segmentation with Polar Representation Jiacheng Sun, Xinghong Zhou, Yiqiang Wu, Bin Zhu, Jiaxuan Lu, Yu Qin, Xiaomao Li
PDF
Poly-Autoregressive Prediction for Modeling Interactions Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegeran, Shiry Ginosar, Jitendra Malik
PDF
POMP: Physics-Consistent Motion Generative Model Through Phase Manifolds Bin Ji, Ye Pan, Zhimeng Liu, Shuai Tan, Xiaogang Jin, Xiaokang Yang
PDF
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality Joey Wilson, Marcelino Almeida, Sachit Mahajan, Martin Labrie, Maani Ghaffari, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnab Sen
PDF
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation Lanyun Zhu, Tianrun Chen, Qianxiong Xu, Xuanyi Liu, Deyi Ji, Haiyang Wu, De Wen Soh, Jun Liu
PDF
Population Normalization for Federated Learning Zhuoyao Wang, Fan Yi, Peizhu Gong, Caitou He, Cheng Jin, Weizhong Zhang
PDF
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy Weijian Deng, Dylan Campbell, Chunyi Sun, Jiahao Zhang, Shubham Kanitkar, Matt E. Shaffer, Stephen Gould
PDF
Pose Priors from Language Models Sanjay Subramanian, Evonne Ng, Lea Müller, Dan Klein, Shiry Ginosar, Trevor Darrell
PDF
Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction Kaixin Fan, Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao
PDF
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation Uyoung Jeong, Jonathan Freer, Seungryul Baek, Hyung Jin Chang, Kwang In Kim
PDF
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion Longbin Ji, Lei Zhong, Pengfei Wei, Changjian Li
PDF
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising Tong Li, Lizhi Wang, Zhiyuan Xu, Lin Zhu, Wanxuan Lu, Hua Huang
PDF
Post-Pre-Training for Modality Alignment in Vision-Language Foundation Models Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai, Kazuki Adachi, Daiki Chijiwa
PDF
POSTA: A Go-to Framework for Customized Artistic Poster Generation Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, Xinchao Wang
PDF
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering Yifan Gao, Zihang Lin, Chuanbin Liu, Min Zhou, Tiezheng Ge, Bo Zheng, Hongtao Xie
PDF
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation HsiaoYuan Hsu, Yuxin Peng
PDF
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation Jian Wang, Tianhong Dai, Bingfeng Zhang, Siyue Yu, Eng Gee Lim, Jimin Xiao
PDF
Potential Field Based Deep Metric Learning Shubhang Bhatnagar, Narendra Ahuja
PDF
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud
PDF
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu, Josiane Mothe, Radu Tudor Ionescu
PDF
Practical Solutions to the Relative Pose of Three Calibrated Cameras Charalambos Tzamos, Viktor Kocur, Yaqing Ding, Daniel Barath, Zuzana Berger Haladova, Torsten Sattler, Zuzana Kukelova
PDF
PRaDA: Projective Radial Distortion Averaging Daniil Sinitsyn, Linus Härenstam-Nielsen, Daniel Cremers
PDF
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik, Vineeth N Balasubramanian
PDF
Precise, Fast, and Low-Cost Concept Erasure in Value Space: Orthogonal Complement Matters Yuan Wang, Ouxiang Li, Tingting Mu, Yanbin Hao, Kuien Liu, Xiang Wang, Xiangnan He
PDF
PreciseCam: Precise Camera Control for Text-to-Image Generation Edurne Bernal-Berdun, Ana Serrano, Belen Masia, Matheus Gadelha, Yannick Hold-Geoffroy, Xin Sun, Diego Gutierrez
PDF
Preconditioners for the Stochastic Training of Neural Fields Shin-Fang Chng, Hemanth Saratchandran, Simon Lucey
PDF
PrEditor3D: Fast and Precise 3D Shape Editing Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang
PDF
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung, Hyunkoo Lee, Joowon Kim, June Yong Yang, Jaeryong Hwang, Eunho Yang
PDF
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation Tung-Long Vuong, Hoang Phan, Vy Vo, Anh Bui, Thanh-Toan Do, Trung Le, Dinh Phung
PDF
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models Hao Ren, Yiming Zeng, Zetong Bi, Zhaoliang Wan, Junlong Huang, Hui Cheng
PDF
Prior-Free 3D Object Tracking Xiuqiang Song, Li Jin, Zhengxian Zhang, Jiachen Li, Fan Zhong, Guofeng Zhang, Xueying Qin
PDF
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang, Jing Yu, Kun Song, Qihao Wang, Yili Li, Gang Xiong
PDF
Probabilistic Prompt Distribution Learning for Animal Pose Estimation Jiyong Rao, Brian Nlong Zhao, Yu Wang
PDF
Probability Density Geodesics in Image Diffusion Latent Space Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang, Peter Henry Tu, Jing Zhang, Hongdong Li, Richard Hartley, Dylan Campbell
PDF
ProbeSDF: Light Field Probes for Neural Surface Reconstruction Briac Toussaint, Diego Thomas, Jean-Sébastien Franco
PDF
Probing the Mid-Level Vision Capabilities of Self-Supervised Learning Xuweiyi Chen, Markus Marks, Zezhou Cheng
PDF
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation Miroslav Purkrabek, Jiri Matas
PDF
Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions Quanyuan Ruan, Jiabao Lei, Wenhao Yuan, Yanglin Zhang, Dekun Lu, Guiliang Liu, Kui Jia
PDF
Progress-Aware Video Frame Captioning Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman
PDF
Progressive Correspondence Regenerator for Robust 3D Registration Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo
PDF
Progressive Focused Transformer for Single Image Super-Resolution Wei Long, Xingyu Zhou, Leheng Zhang, Shuhang Gu
PDF
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation Without 3D Data Zhiyuan Ma, Xinyue Liang, Rongyuan Wu, Xiangyu Zhu, Zhen Lei, Lei Zhang
PDF
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks Erik Wallin, Fredrik Kahl, Lars Hammarstrand
PDF
ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector Yuanwei Liu, Hui Wei, Chengyu Jia, Ruqi Xiao, Weijian Ruan, Xingxing Wei, Joey Tianyi Zhou, Zheng Wang
PDF
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness Beier Zhu, Jiequan Cui, Hanwang Zhang, Chi Zhang
PDF
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models Yassir Bendou, Amine Ouasfi, Vincent Gripon, Adnane Boukhayma
PDF
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation Yuanbo Yang, Jiahao Shao, Xinyang Li, Yujun Shen, Andreas Geiger, Yiyi Liao
PDF
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai, Jianyang Gu, Ziheng Zhang, Kazi Sajeed Mehrab, Elizabeth G. Campolongo, Daniel Rubenstein, Charles V. Stewart, Anuj Karpatne, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
PDF
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images Yasamin Medghalchi, Moein Heidari, Clayton Allard, Leonid Sigal, Ilker Hacihaliloglu
PDF
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval Qiang Zou, Shuli Cheng, Jiayi Chen
PDF
PromptHMR: Promptable Human Mesh Recovery Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas
PDF
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
PDF
ProReflow: Progressive Reflow with Decomposed Velocity Lei Ke, Haohang Xu, Xuefei Ning, Yu Li, Jiajun Li, Haoling Li, Yuxuan Lin, Dongsheng Jiang, Yujiu Yang, Linfeng Zhang
PDF
Prosody-Enhanced Acoustic Pre-Training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing Zhedong Zhang, Liang Li, Chenggang Yan, Chunshan Liu, Anton van den Hengel, Yuankai Qi
PDF
Protecting Your Video Content: Disrupting Automated Video-Based LLM Annotations Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia
PDF
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes Patrick Rim, Hyoungseob Park, S. Gangopadhyay, Ziyao Zeng, Younjoon Chung, Alex Wong
PDF
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation Qingchen Tang, Lei Fan, Maurice Pagnucco, Yang Song
PDF
Provoking Multi-Modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning Cheng Chen, Yunpeng Zhai, Yifan Zhao, Jinyang Gao, Bolin Ding, Jia Li
PDF
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging Ping Wang, Lishun Wang, Gang Qu, Xiaodong Wang, Yulun Zhang, Xin Yuan
PDF
ProxyTransformation: Preshaping Point Cloud Manifold with Proxy Attention for 3D Visual Grounding Qihang Peng, Henry Zheng, Gao Huang
PDF
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention Weicheng Wang, Guoli Jia, Zhongqi Zhang, Liang Lin, Jufeng Yang
PDF
PS-EIP: Robust Photometric Stereo Based on Event Interval Profile Kazuma Kitazawa, Takahito Aoto, Satoshi Ikehata, Tsuyoshi Takatani
PDF
PSA-SSL: Pose and Size-Aware Self-Supervised Learning on LiDAR Point Clouds Barza Nisar, Steven L. Waslander
PDF
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection Wei Li, Pin-Yu Chen, Sijia Liu, Ren Wang
PDF
Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection Ting Li, Mao Ye, Tianwen Wu, Nianxin Li, Shuaifeng Li, Song Tang, Luping Ji
PDF
PSHuman: Photorealistic Single-Image 3D Human Reconstruction Using Cross-Scale Multiview Diffusion and Explicit Remeshing Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Xiaowei Chi, Siyu Xia, Yan-Pei Cao, Wei Xue, Wenhan Luo, Yike Guo
PDF
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model Xiang Gao, Shuai Yang, Jiaying Liu
PDF
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting Alex Hanson, Allen Tu, Vasu Singla, Mayuka Jayawardhana, Matthias Zwicker, Tom Goldstein
PDF
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking Zekai Shao, Yufan Hu, Bin Fan, Hongmin Liu
PDF
Pursuing Temporal-Consistent Video Virtual Try-on via Dynamic Pose Interaction Dong Li, Wenqi Zhong, Wei Yu, Yingwei Pan, Dingwen Zhang, Ting Yao, Junwei Han, Tao Mei
PDF
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models Chenyu Yang, Xuan Dong, Xizhou Zhu, Weijie Su, Jiahao Wang, Hao Tian, Zhe Chen, Wenhai Wang, Lewei Lu, Jifeng Dai
PDF
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction Sinisa Stekovic, Arslan Artykov, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer
PDF
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs Zicheng Zhang, Ziheng Jia, Haoning Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
PDF
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu
PDF
Q-Eval-100k: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content Zicheng Zhang, Tengchuan Kou, Shushi Wang, Chunyi Li, Wei Sun, Wei Wang, Xiaoyu Li, Zongyu Wang, Xuezhi Cao, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai
PDF
Q-PART: Quasi-Periodic Adaptive Regression with Test-Time Training for Pediatric Left Ventricular Ejection Fraction Regression Jie Liu, Tiexin Qin, Hui Liu, Yilei Shi, Lichao Mou, Xiao Xiang Zhu, Shiqi Wang, Haoliang Li
PDF
QMambaBSR: Burst Image Super-Resolution with Query State Space Model Xin Di, Long Peng, Peizhe Xia, Wenbo Li, Renjing Pei, Yang Cao, Yang Wang, Zheng-Jun Zha
PDF
Quad-Pixel Image Defocus Deblurring: A New Benchmark and Model Hang Chen, Yin Xie, Xiaoxiu Peng, Lihu Sun, Wenkai Su, Xiaodong Yang, Chengming Liu
PDF
Quaffure: Real-Time Quasi-Static Neural Hair Simulation Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov, Hsiao-yu Chen, Aljaz Bozic, Nikolaos Sarafianos, Doug Roble
PDF
Quantization Without Tears Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu
PDF
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu
PDF
QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner, Michael Moeller
PDF
Query Efficient Black-Box Visual Prompting with Subspace Learning Zhaogeng Liu, Haozhen Zhang, Hualin Zhang, Xingchen Li, Wanli Shi, Bin Gu, Yi Chang
PDF
Question-Aware Gaussian Experts for Audio-Visual Question Answering Hongyeob Kim, Inyoung Jung, Dayoon Suh, Youjia Zhang, Sangmin Lee, Sungeun Hong
PDF
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization Xudong Jiang, Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Marc Pollefeys
PDF
R-TPT: Improving Adversarial Robustness of Vision-Language Models Through Test-Time Prompt Tuning Lijun Sheng, Jian Liang, Zilei Wang, Ran He
PDF
R2C: Mapping Room to Chessboard to Unlock LLM as Low-Level Action Planner Ziyi Bai, Hanxuan Li, Bin Fu, Chuyan Xiong, Ruiping Wang, Xilin Chen
PDF
RaCFormer: Towards High-Quality 3D Object Detection via Query-Based Radar-Camera Fusion Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Houqiang Li, Yanyong Zhang
PDF
RAD: Region-Aware Diffusion Models for Image Inpainting Sora Kim, Sungho Suh, Minsik Lee
PDF
Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling Xingyu Chen, Zihao Feng, Kun Qian, Xinyu Zhang
PDF
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models Greg Heinrich, Mike Ranzinger, Hongxu Yin, Yao Lu, Jan Kautz, Andrew Tao, Bryan Catanzaro, Pavlo Molchanov
PDF
RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection Fan Xing, Zhuo Tian, Xuefeng Fan, Xiaoyi Zhou
PDF
RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting Qiyu Dai, Xingyu Ni, Qianfan Shen, Wenzheng Chen, Baoquan Chen, Mengyu Chu
PDF
RandAR: Decoder-Only Autoregressive Visual Generation in Random Orders Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang
PDF
Random Conditioning for Diffusion Model Compression with Distillation Dohyun Kim, Sehwan Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo
PDF
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings Aayush Dhakal, Srikumar Sastry, Subash Khanal, Adeel Ahmad, Eric Xing, Nathan Jacobs
PDF
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue
PDF
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time Jon Donnelly, Zhicheng Guo, Alina Jade Barnett, Hayden McTavish, Chaofan Chen, Cynthia Rudin
PDF
RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar, Shanmuganathan Raman
PDF
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler Xin Ding, Lei Yu, Xin Li, Zhijun Tu, Hanting Chen, Jie Hu, Zhibo Chen
PDF
Rate-in: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun, Lawrence H. Staib, John A. Onofrey
PDF
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories Huiyang Shao, Xin Xia, Yuhong Yang, Yuxi Ren, Xing Wang, Xuefeng Xiao
PDF
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network Van-Tin Luu, Yon-Lin Cai, Vu-Hoang Tran, Wei-Chen Chiu, Yi-Ting Chen, Ching-Chun Huang
PDF
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions Shihang Du, Sanqing Qu, Tianhang Wang, Xudong Zhang, Yunwei Zhu, Jian Mao, Fan Lu, Qiao Lin, Guang Chen
PDF
RDD: Robust Feature Detector and Descriptor Using Deformable Transformer Gonglin Chen, Tianwen Fu, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao
PDF
Re-HOLD: Video Hand Object Interaction Reenactment via Adaptive Layout-Instructed Diffusion Model Yingying Fan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Yingying Li, Haocheng Feng, Errui Ding, Yu Wu, Jingdong Wang
PDF
Re-Thinking Temporal Search for Long-Form Video Understanding Jinhui Ye, Zihan Wang, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
PDF
Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection Wenbing Zhu, Lidong Wang, Ziqing Zhou, Chengjie Wang, Yurui Pan, Ruoyi Zhang, Zhuhao Chen, Linjie Cheng, Bin-Bin Gao, Jiangning Zhang, Zhenye Gan, Yuxie Wang, Yulong Chen, Shuguang Qian, Mingmin Chi, Bo Peng, Lizhuang Ma
PDF
Real-Time Free-View Human Rendering from Sparse-View RGB Videos Using Double Unprojected Textures Guoxing Sun, Rishabh Dabral, Heming Zhu, Pascal Fua, Christian Theobalt, Marc Habermann
PDF
Real-Time High-Fidelity Gaussian Human Avatars with Position-Based Interpolation of Spatially Distributed MLPs Youyi Zhan, Tianjia Shao, Yin Yang, Kun Zhou
PDF
RealEdit: Reddit Edits as a Large-Scale Empirical Dataset for Image Transformations Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim, Vasily Ilin, Ben Caffee, Dongping Chen, Mohammadreza Salehi, Cheng-Yu Hsieh, Ranjay Krishna
PDF
Realistic Test-Time Adaptation of Vision-Language Models Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer, Ismail Ben Ayed
PDF
Reanimating Images Using Neural Representations of Dynamic Stimuli Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr
PDF
Reason-Before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval Yuanmin Tang, Jue Zhang, Xiaoting Qin, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Wu
PDF
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue
PDF
Reasoning in Visual Navigation of End-to-End Trained Agents: A Dynamical Systems Approach Steeven Janny, Hervé Poirier, Leonid Antsfeld, Guillaume Bono, Gianluca Monaci, Boris Chidlovskii, Francesco Giuliari, Alessio Del Bue, Christian Wolf
PDF
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding Yuxuan Wang, Aming Wu, Muli Yang, Yukuan Min, Yihang Zhu, Cheng Deng
PDF
Reasoning to Attend: Try to Understand How <SEG> Token Works Rui Qian, Xin Yin, Dejing Dou
PDF
ReCap: Better Gaussian Relighting with Cross-Environment Captures Jingzhi Li, Zongwei Wu, Eduard Zamfir, Radu Timofte
PDF
ReCapture: Generative Video Camera Controls for User-Provided Videos Using Masked Video Fine-Tuning David Junhao Zhang, Roni Paiss, Shiran Zada, Nikhil Karnad, David E. Jacobs, Yael Pritch, Inbar Mosseri, Mike Zheng Shou, Neal Wadhwa, Nataniel Ruiz
PDF
Recognition-Synergistic Scene Text Editing Zhengyao Fang, Pengyuan Lyu, Jingjing Wu, Chengquan Zhang, Jun Yu, Guangming Lu, Wenjie Pei
PDF
ReCon: Enhancing True Correspondence Discrimination Through Relation Consistency for Robust Noisy Correspondence Learning Quanxing Zha, Xin Liu, Shu-Juan Peng, Yiu-ming Cheung, Xing Xu, Nannan Wang
PDF
Reconciling Stochastic and Deterministic Strategies for Zero-Shot Image Restoration Using Diffusion Model in Dual Chong Wang, Lanqing Guo, Zixuan Fu, Siyuan Yang, Hao Cheng, Alex C. Kot, Bihan Wen
PDF
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen, Yida Wang, Xueyang Zhang, Yifei Zhan, Kun Zhan, Peng Jia, Xianpeng Lang, Xingang Wang, Wenjun Mei
PDF
Reconstructing Animals and the Wild Peter Kulits, Michael J. Black, Silvia Zuffi
PDF
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning Buzhen Huang, Chen Li, Chongyang Xu, Dongyue Lu, Jinnan Chen, Yangang Wang, Gim Hee Lee
PDF
Reconstructing Humans with a Biomechanically Accurate Skeleton Yan Xia, Xiaowei Zhou, Etienne Vouga, Qixing Huang, Georgios Pavlakos
PDF
Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions Boran Wen, Dingbang Huang, Zichen Zhang, Jiahong Zhou, Jianbin Deng, Jingyu Gong, Yulong Chen, Lizhuang Ma, Yong-Lu Li
PDF
Reconstructing People, Places, and Cameras Lea Müller, Hongsuk Choi, Anthony Zhang, Brent Yi, Jitendra Malik, Angjoo Kanazawa
PDF
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Jingfeng Yao, Bin Yang, Xinggang Wang
PDF
Recover and Match: Open-Vocabulary Multi-Label Recognition Through Knowledge-Constrained Optimal Transport Hao Tan, Zichang Tan, Jun Li, Ajian Liu, Jun Wan, Zhen Lei
PDF
Recovering Dynamic 3D Sketches from Videos Jaeah Lee, Changwoon Choi, Young Min Kim, Jaesik Park
PDF
Rectification-Specific Supervision and Constrained Estimator for Online Stereo Rectification Rui Gong, Kim-Hui Yap, Weide Liu, Xulei Yang, Jun Cheng
PDF
Rectified Diffusion Guidance for Conditional Generation Mengfei Xia, Nan Xue, Yujun Shen, Ran Yi, Tieliang Gong, Yong-Jin Liu
PDF
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval Davide Caffagni, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
PDF
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation Junjie Chen, Weilong Chen, Yifan Zuo, Yuming Fang
PDF
Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng
PDF
ReDiffDet: Rotation-Equivariant Diffusion Model for Oriented Object Detection Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wen-Liang Du, Rui Yao
PDF
Reducing Class-Wise Confusion for Incremental Learning with Disentangled Manifolds Huitong Chen, Yu Wang, Yan Fan, Guosong Jiang, Qinghua Hu
PDF
Ref-GS: Directional Factorization for 2D Gaussian Splatting Youjia Zhang, Anpei Chen, Yumin Wan, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang
PDF
Reference-Based 3D-Aware Image Editing with Triplanes Bahri Batuhan Bilecen, Yigit Yalin, Ning Yu, Aysegul Dundar
PDF
RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects Jaeguk Kim, Jaewoo Park, Keuntek Lee, Nam Ik Cho
PDF
Relation-Rich Visual Document Generator for Visual Information Extraction Zi-Han Jiang, Chien-Wei Lin, Wei-Hua Li, Hsuan-Tung Liu, Yi-Ren Yeh, Chu-Song Chen
PDF
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation Jiahao Lu, Jiacheng Deng
PDF
RelationField: Relate Anything in Radiance Fields Sebastian Koch, Johanna Wald, Mirco Colosi, Narunas Vaskevicius, Pedro Hermosilla, Federico Tombari, Timo Ropinski
PDF
Relative Pose Estimation Through Affine Corrections of Monocular Depth Priors Yifan Yu, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, Viktor Larsson
PDF
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, Yanchao Yang
PDF
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations Savya Khosla, T V Sethuraman, Alexander Schwing, Derek Hoiem
PDF
Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shuo Chen, Jian Yang
PDF
Removing Reflections from RAW Photos Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen, Marc Levoy
PDF
ReNeg: Learning Negative Embedding with Reward Guidance Xiaomin Li, Yixuan Liu, Takashi Isobe, Xu Jia, Qinpeng Cui, Dong Zhou, Dong Li, You He, Huchuan Lu, Zhongdao Wang, Emad Barsoum
PDF
RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds Kang You, Tong Chen, Dandan Ding, M. Salman Asif, Zhan Ma
PDF
RePerformer: Immersive Human-Centric Volumetric Videos from Playback to Photoreal Reperformance Yuheng Jiang, Zhehao Shen, Chengcheng Guo, Yu Hong, Zhuo Su, Yingliang Zhang, Marc Habermann, Lan Xu
PDF
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training Ziliang Chen, Xin Huang, Xiaoxuan Fan, Keze Wang, Yuyu Zhou, Quanlong Guan, Liang Lin
PDF
Repurposing Pre-Trained Video Diffusion Models for Event-Based Video Interpolation Jingxi Chen, Brandon Y. Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermuller, Christopher A. Metzler, Yiannis Aloimonos
PDF
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation Markus Karmann, Onay Urfalioglu
PDF
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge Radu Berdan, Beril Besbinar, Christoph Reinders, Junji Otsuka, Daisuke Iso
PDF
ResCLIP: Residual Attention for Training-Free Dense Vision-Language Inference Yuhang Yang, Jinhong Deng, Wen Li, Lixin Duan
PDF
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion Konyul Park, Yecheol Kim, Daehun Kim, Jun Won Choi
PDF
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams Chris Dongjoo Kim, Jihwan Moon, Sangwoo Moon, Heeseung Yun, Sihaeng Lee, Aniruddha Kembhavi, Soonyoung Lee, Gunhee Kim, Sangho Lee, Christopher Clark
PDF
RestorGS: Depth-Aware Gaussian Splatting for Efficient 3D Scene Restoration Yuanjian Qiao, Mingwen Shao, Lingzhuang Meng, Kai Xu
PDF
Retaining Knowledge and Enhancing Long-Text Representations in CLIP Through Dual-Teacher Distillation Yuheng Feng, Changsong Wen, Zelin Peng, Li Jiaye, Siyu Zhu
PDF
Rethinking Correspondence-Based Category-Level Object Pose Estimation Huan Ren, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang
PDF
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention Saad Wazir, Daeyoung Kim
PDF
Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression Zichong Meng, Yiming Xie, Xiaogang Peng, Zeyu Han, Huaizu Jiang
PDF
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting Runsong Zhu, Shi Qiu, Zhengzhe Liu, Ka-Hei Hui, Qianyi Wu, Pheng-Ann Heng, Chi-Wing Fu
PDF
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach Chen-Chen Zong, Sheng-Jun Huang
PDF
Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages Matteo Farina, Massimiliano Mancini, Giovanni Iacca, Elisa Ricci
PDF
Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection Yifan Chang, Junjie Huang, Xiaofeng Wang, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du, Xingang Wang
PDF
Rethinking Noisy Video-Text Retrieval via Relation-Aware Alignment Huakai Lai, Guoxin Xiong, Huayu Mai, Xiang Liu, Tianzhu Zhang
PDF
Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as an Exemplification Haobin Zhong, Shuai He, Anlong Ming, Huadong Ma
PDF
Rethinking Query-Based Transformer for Continual Image Segmentation Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang
PDF
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond Tengyu Ma, Long Ma, Ziye Li, Yuetong Wang, Jinyuan Liu, Chengpei Xu, Risheng Liu
PDF
Rethinking Spiking Self-Attention Mechanism: Implementing A-XNOR Similarity Calculation in Spiking Transformers Yichen Xiao, Shuai Wang, Dehao Zhang, Wenjie Wei, Yimeng Shan, Xiaoli Liu, Yulin Jiang, Malu Zhang
PDF
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction Dubing Chen, Huan Zheng, Jin Fang, Xingping Dong, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen
PDF
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game Keyizhi Xu, Chi Zhang, Zhan Chen, Zhongyuan Wang, Chunxia Xiao, Chao Liang
PDF
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks Cheng Lei, Ao Li, Hu Yao, Ce Zhu, Le Zhang
PDF
Rethinking Training for De-Biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, Sungroh Yoon
PDF
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector Xiao Guo, Xiufeng Song, Yue Zhang, Xiaohong Liu, Xiaoming Liu
PDF
Retrieving Semantics from the Deep: An RAG Solution for Gesture Synthesis M. Hamza Mughal, Rishabh Dabral, Merel C.J. Scholman, Vera Demberg, Christian Theobalt
PDF
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-Based Action Recognition Hongda Liu, Yunfan Liu, Min Ren, Hao Wang, Yunlong Wang, Zhenan Sun
PDF
Reversible Decoupling Network for Single Image Reflection Removal Hao Zhao, Mingjia Li, Qiming Hu, Xiaojie Guo
PDF
Reversing Flow for Image Restoration Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu
PDF
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu, Thomas Seidl, Gedas Bertasius
PDF
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer Shaofei Huang, Rui Ling, Tianrui Hui, Hongyu Li, Xu Zhou, Shifeng Zhang, Si Liu, Richang Hong, Meng Wang
PDF
Revisiting Backdoor Attacks Against Large Vision-Language Models from Domain Shift Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Mingli Zhu, Xiaochun Cao, Dacheng Tao
PDF
Revisiting Fairness in Multitask Learning: A Performance-Driven Approach for Variance Reduction Xiaohan Qin, Xiaoxing Wang, Junchi Yan
PDF
Revisiting Generative Replay for Class Incremental Object Detection Shizhou Zhang, Xueqiang Lv, Yinghui Xing, Qirui Wu, Di Xu, Yanning Zhang
PDF
Revisiting MAE Pre-Training for 3D Medical Image Segmentation Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko, Andrei Goncharov, Alberto Paderno, Maximilian Miller, Leander Maerkisch, Paul Jaeger, Klaus Maier-Hein
PDF
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety Ronghang Zhu, Mengxuan Hu, Weiming Zhuang, Lingjuan Lyu, Xiang Yu, Sheng Li
PDF
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward Zhiwei Jia, Yuesong Nan, Huixi Zhao, Gengdai Liu
PDF
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning Jihyun Lee, Weipeng Xu, Alexander Richard, Shih-En Wei, Shunsuke Saito, Shaojie Bai, Te-Li Wang, Minhyuk Sung, Tae-Kyun Kim, Jason Saragih
PDF
ReWind: Understanding Long Videos with Instructed Learnable Memory Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan Sun, Ioannis Patras
PDF
RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars Linzhou Li, Yumeng Li, Yanlin Weng, Youyi Zheng, Kun Zhou
PDF
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection Yunfei Long, Abhinav Kumar, Xiaoming Liu, Daniel Morris
PDF
RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos Yuxin Yao, Zhi Deng, Junhui Hou
PDF
RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety Andrei Dumitriu, Florin Tatui, Florin Miron, Aakash Ralhan, Radu Tudor Ionescu, Radu Timofte
PDF
RivuletMLP: An MLP-Based Architecture for Efficient Compressed Video Quality Enhancement Gang He, Weiran Wang, Guancheng Quan, Shihao Wang, Dajiang Zhou, Yunsong Li
PDF
RL-RC-DoT: A Block-Level RL Agent for Task-Aware Video Compression Uri Gadot, Assaf Shocher, Shie Mannor, Gal Chechik, Assaf Hallak
PDF
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness Tianyu Yu, Haoye Zhang, Qiming Li, Qixin Xu, Yuan Yao, Da Chen, Xiaoman Lu, Ganqu Cui, Yunkai Dang, Taiwen He, Xiaocheng Feng, Jun Song, Bo Zheng, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
PDF
RNG: Relightable Neural Gaussians Jiahui Fan, Fujun Luan, Jian Yang, Milos Hasan, Beibei Wang
PDF
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives Chirag Parikh, Deepti Rawat, R. T. Rakshitha, Tathagata Ghosh, Ravi Kiran Sarvadevabhatla
PDF
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Yuheng Ji, Huajie Tan, Jiayu Shi, Xiaoshuai Hao, Yuan Zhang, Hengyuan Zhang, Pengwei Wang, Mengdi Zhao, Yao Mu, Pengju An, Xinda Xue, Qinghang Su, Huaihai Lyu, Xiaolong Zheng, Jiaming Liu, Zhongyuan Wang, Shanghang Zhang
PDF
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors Haifeng Huang, Xinyi Chen, Yilun Chen, Hao Li, Xiaoshen Han, Zehan Wang, Tai Wang, Jiangmiao Pang, Zhou Zhao
PDF
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation Through Embedding Predictive Pre-Training Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun, Farshad Khorrami
PDF
RoboSense: Large-Scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments Haisheng Su, Feixiang Song, Cong Ma, Wei Wu, Junchi Yan
PDF
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield
PDF
Robotic Visual Instruction Yanbang Li, Ziyang Gong, Haoyang Li, Xiaoqi Huang, Haolan Kang, Guangping Bai, Xianzheng Ma
PDF
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, Lunkai Lin, Zhiqiang Xie, Mingyu Ding, Ping Luo
PDF
RobSense: A Robust Multi-Modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability Minh Kha Do, Kang Han, Phu Lai, Khoa T. Phan, Wei Xiang
PDF
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh
PDF
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment Chen Liu, Peike Li, Liying Yang, Dadong Wang, Lincheng Li, Xin Yu
PDF
Robust Message Embedding via Attention Flow-Based Steganography Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Dejun Zheng, Changbo Wang, Chenhui Li
PDF
Robust Multi-Object 4D Generation for In-the-Wild Videos Wen-Hsuan Chu, Lei Ke, Jianmeng Liu, Mingxiao Huo, Pavel Tokmakov, Katerina Fragkiadaki
PDF
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder Junjie Zhou, Jiao Tang, Yingli Zuo, Peng Wan, Daoqiang Zhang, Wei Shao
PDF
Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-on Nannan Zhang, Yijiang Li, Dong Du, Zheng Chong, Zhengwentai Sun, Jianhao Zeng, Yusheng Dai, Zhengyu Xie, Hairui Zhu, Xiaoguang Han
PDF
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting Shaofei Cai, Zihao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang
PDF
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models Heng Yin, Yuqiang Ren, Ke Yan, Shouhong Ding, Yongtao Hao
PDF
RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images Junjin Xiao, Qing Zhang, Yongwei Nie, Lei Zhu, Wei-Shi Zheng
PDF
ROICtrl: Boosting Instance Control for Visual Generation Yuchao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou
PDF
ROLL: Robust Noisy Pseudo-Label Learning for Multi-View Clustering with Noisy Correspondence Yuan Sun, Yongxiang Li, Zhenwen Ren, Guiduo Duan, Dezhong Peng, Peng Hu
PDF
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing Zhipeng Huang, Wangbo Yu, Xinhua Cheng, Chengshu Zhao, Yunyang Ge, Mingyi Guo, Li Yuan, Yonghong Tian
PDF
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation Mingfei Han, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang, Ivan Laptev
PDF
RORem: Training a Robust Object Remover with Human-in-the-Loop Ruibin Li, Tao Yang, Song Guo, Lei Zhang
PDF
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object Zhe Shan, Yang Liu, Lei Zhou, Cheng Yan, Heng Wang, Xia Xie
PDF
Rotation-Equivariant Self-Supervised Method in Image Denoising Hanze Liu, Jiahong Fu, Qi Xie, Deyu Meng
PDF
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark Xin Zhang, Xue Yang, Yuxuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li
PDF
RUBIK: A Structured Benchmark for Image Matching Across Geometric Challenges Thibaut Loiseau, Guillaume Bourmaud
PDF
S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors Xingyu Ren, Jiankang Deng, Yuhao Cheng, Wenhan Zhu, Yichao Yan, Xiaokang Yang, Stefanos Zafeiriou, Chao Ma
PDF
S2D-LFE: Sparse-to-Dense Light Field Event Generation Yutong Liu, Wenming Weng, Yueyi Zhang, Zhiwei Xiong
PDF
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting Yecong Wan, Mingwen Shao, Yuanshuo Cheng, Wangmeng Zuo
PDF
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation Yichen Xie, Runsheng Xu, Tong He, Jyh-Jing Hwang, Katie Luo, Jingwei Ji, Hubert Lin, Letian Chen, Yiren Lu, Zhaoqi Leng, Dragomir Anguelov, Mingxing Tan
PDF
SACB-Net: Spatial-Awareness Convolutions for Medical Image Registration Xinxing Cheng, Tianyang Zhang, Wenqi Lu, Qingjie Meng, Alejandro F. Frangi, Jinming Duan
PDF
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining Mingjin Zhang, Xiaolong Li, Fei Gao, Jie Guo, Xinbo Gao, Jing Zhang
PDF
SALAD: Skeleton-Aware Latent Diffusion for Text-Driven Motion Generation and Editing Seokhyeon Hong, Chaelin Kim, Serin Yoon, Junghyun Nam, Sihun Cha, Junyong Noh
PDF
Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches Against CNNs Mauricio Byrd Victorica, György Dán, Henrik Sandberg
PDF
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis Junho Kim, Hyunjun Kim, Hosu Lee, Yong Man Ro
PDF
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost Haiyang Mei, Pengyu Zhang, Mike Zheng Shou
PDF
SAM-REF: Introducing Image-Prompt Synergy During Interaction for Detail Enhancement in the Segment Anything Model Chongkai Yu, Ting Liu, Anqi Li, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Xiaolin Hu
PDF
SAM2-LOVE: Segment Anything Model 2 in Language-Aided Audio-Visual Scenes Yuji Wang, Haoran Xu, Yong Liu, Jiaze Li, Yansong Tang
PDF
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation Jihuai Zhao, Junbao Zhuo, Jiansheng Chen, Huimin Ma
PDF
SaMam: Style-Aware State Space Model for Arbitrary Image Style Transfer Hongda Liu, Longguang Wang, Ye Zhang, Ziru Yu, Yulan Guo
PDF
Samba: A Unified Mamba-Based Framework for General Salient Object Detection Jiahao He, Keren Fu, Xiaohong Liu, Qijun Zhao
PDF
SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity Chengzhi Wu, Yuxin Wan, Hao Fu, Julius Pfrommer, Zeyun Zhong, Junwei Zheng, Jiaming Zhang, Jürgen Beyerer
PDF
Sample- and Parameter-Efficient Auto-Regressive Image Models Elad Amrani, Leonid Karlinsky, Alex Bronstein
PDF
Sampling Innovation-Based Adaptive Compressive Sensing Zhifu Tian, Tao Hu, Chaoyang Niu, Di Wu, Shu Wang
PDF
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi, Carlo Masone, Giuseppe Averta
PDF
SapiensID: Foundation for Human Recognition Minchul Kim, Dingqiang Ye, Yiyang Su, Feng Liu, Xiaoming Liu
PDF
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-Scale 3D VQVAE Yongwei Chen, Yushi Lan, Shangchen Zhou, Tengfei Wang, Xingang Pan
PDF
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds Jinfeng Xu, Xianzhi Li, Yuan Tang, Xu Han, Qiao Yu, Yixue Hao, Long Hu, Min Chen
PDF
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens Chi Su, Xiaoxuan Ma, Jiajun Su, Yizhou Wang
PDF
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers Nick Nikzad, Yi Liao, Yongsheng Gao, Jun Zhou
PDF
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution Siwei Tu, Ben Fei, Weidong Yang, Fenghua Ling, Hao Chen, Zili Liu, Kun Chen, Hang Fan, Wanli Ouyang, Lei Bai
PDF
Satellite to GroundScape - Large-Scale Consistent Ground View Generation from Satellite Views Ningli Xu, Rongjun Qin
PDF
Scalable Autoregressive Monocular Depth Estimation Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu
PDF
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents Yunseok Jang, Yeda Song, Sungryull Sohn, Lajanugen Logeswaran, Tiange Luo, Dong-Ki Kim, Kyunghoon Bae, Honglak Lee
PDF
Scale Efficient Training for Large Datasets Qing Zhou, Junyu Gao, Qi Wang
PDF
ScaleLSD: Scalable Deep Line Segment Detection Streamlined Zeran Ke, Bin Tan, Xianwei Zheng, Yujun Shen, Tianfu Wu, Nan Xue
PDF
Scaling Down Text Encoders of Text-to-Image Diffusion Models Lifu Wang, Daqing Liu, Xinchen Liu, Xiaodong He
PDF
Scaling Inference Time Compute for Diffusion Models Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie
PDF
Scaling Mesh Generation via Compressive Tokenization Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Tong Zhang, Shenghua Gao, C.L. Philip Chen
PDF
Scaling Properties of Diffusion Models for Perceptual Tasks Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik
PDF
Scaling up Image Segmentation Across Data and Tasks Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto
PDF
Scaling Vision Pre-Training to 4k Resolution Baifeng Shi, Boyi Li, Han Cai, Yao Lu, Sifei Liu, Marco Pavone, Jan Kautz, Song Han, Trevor Darrell, Pavlo Molchanov, Hongxu Yin
PDF
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model Shunlin Lu, Jingbo Wang, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai, Ruimao Zhang
PDF
SCAP: Transductive Test-Time Adaptation via Supportive Clique-Based Attribute Prompting Chenyu Zhang, Kunlun Xu, Zichen Liu, Yuxin Peng, Jiahuan Zhou
PDF
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments Luke Rowe, Roger Girgis, Anthony Gosselin, Liam Paull, Christopher Pal, Felix Heide
PDF
Scene Map-Based Prompt Tuning for Navigation Instruction Generation Sheng Fan, Rui Liu, Wenguan Wang, Yi Yang
PDF
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model Shengjun Zhang, Jinzhao Li, Xin Fei, Hao Liu, Yueqi Duan
PDF
Scene-Agnostic Pose Regression for Visual Localization Junwei Zheng, Ruiping Liu, Yufan Chen, Zhenfang Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen
PDF
Scene-Centric Unsupervised Panoptic Segmentation Oliver Hahn, Christoph Reich, Nikita Araslanov, Daniel Cremers, Christian Rupprecht, Stefan Roth
PDF
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration Zilong Huang, Jun He, Junyan Ye, Lihan Jiang, Weijia Li, Yiping Chen, Ting Han
PDF
SceneCrafter: Controllable Multi-View Driving Scene Editing Zehao Zhu, Yuliang Zou, Chiyu Max Jiang, Bo Sun, Vincent Casser, Xiukun Huang, Jiahao Wang, Zhenpei Yang, Ruiqi Gao, Leonidas Guibas, Mingxing Tan, Dragomir Anguelov
PDF
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang
PDF
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation Aleksey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai
PDF
SceneTAP: Scene-Coherent Typographic Adversarial Planner Against Vision-Language Models in Real-World Environments Yue Cao, Yun Xing, Jie Zhang, Di Lin, Tianwei Zhang, Ivor Tsang, Yang Liu, Qing Guo
PDF
SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow Qingyuan Wang, Rui Song, Jiaojiao Li, Kerui Cheng, David Ferstl, Yinlin Hu
PDF
Schedule on the Fly: Diffusion Time Prediction for Faster and Better Image Generation Zilyu Ye, Zhiyang Chen, Tiancheng Li, Zemin Huang, Weijian Luo, Guo-Jun Qi
PDF
Science-T2I: Addressing Scientific Illusions in Image Synthesis Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie
PDF
ScribbleLight: Single Image Indoor Relighting with Scribbles Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhattad, Roni Sengupta
PDF
SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer Chunnan Shang, Zhizhong Wang, Hongwei Wang, Xiangming Meng
PDF
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Shengyong Chen
PDF
SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models Xiaofan Bai, Shixin Li, Xiaojing Ma, Bin Benjamin Zhu, Dongmei Zhang, Linchen Yu
PDF
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction ZaiPeng Duan, ChenXu Dang, Xuzhong Hu, Pei An, Junfeng Ding, Jie Zhan, YunBiao Xu, Jie Ma
PDF
Sea-Ing in Low-Light Nisha Varghese, A. N. Rajagopalan
PDF
SEAL: Semantic Attention Learning for Long Video Representation Lan Wang, Yujia Chen, Du Tran, Vishnu Naresh Boddeti, Wen-Sheng Chu
PDF
SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation Dekai Zhu, Yan Di, Stefan Gavranovic, Slobodan Ilic
PDF
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval Mankeerat Sidhu, Hetarth Chopra, Ansel Blume, Jeonghwan Kim, Revanth Gangi Reddy, Heng Ji
PDF
SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning Ye Liu, Meng Yang
PDF
SeCap: Self-Calibrating and Adaptive Prompts for Cross-View Person Re-Identification in Aerial-Ground Networks Shining Wang, Yunlong Wang, Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Peng Wang
PDF
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis Zexi Jia, Chuanwei Huang, Yeshuang Zhu, Hongyan Fei, Xiaoyue Duan, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Jinchao Zhang, Jie Zhou
PDF
See Further When Clear: Curriculum Consistency Model Yunpeng Liu, Boxiao Liu, Yi Zhang, Xingzhong Hou, Guanglu Song, Yu Liu, Haihang You
PDF
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang
PDF
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Rong Li, Shijie Li, Lingdong Kong, Xulei Yang, Junwei Liang
PDF
Seeing a 3D World in a Grain of Sand Yufan Zhang, Yu Ji, Yu Guo, Jinwei Ye
PDF
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding Feilong Tang, Chengzhi Liu, Zhongxing Xu, Ming Hu, Zile Huang, Haochen Xue, Ziyang Chen, Zelin Peng, Zhiwei Yang, Sijin Zhou, Wenxue Li, Yulong Li, Wenxuan Song, Shiyan Su, Wei Feng, Jionglong Su, Mingquan Lin, Yifan Peng, Xuelian Cheng, Imran Razzak, Zongyuan Ge
PDF
Seeing Is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks Daizong Liu, Wei Hu
PDF
Seeing More with Less: Human-like Representations in Vision Models Andrey Gizdov, Shimon Ullman, Daniel Harari
PDF
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes Hyeonggon Ryu, Seongyu Kim, Joon Son Chung, Arda Senocak
PDF
Seeing the Abstract: Translating the Abstract Language for Vision Language Models Davide Talon, Federico Girella, Ziyue Liu, Marco Cristani, Yiming Wang
PDF
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection Gensheng Pei, Tao Chen, Yujia Wang, Xinhao Cai, Xiangbo Shu, Tianfei Zhou, Yazhou Yao
PDF
Seek Common Ground While Reserving Differences: Semi-Supervised Image-Text Sentiment Recognition Wuyou Xia, Guoli Jia, Sicheng Zhao, Jufeng Yang
PDF
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes Aodi Li, Liansheng Zhuang, Xiao Long, Minghong Yao, Shafei Wang
PDF
SEEN-DA: SEmantic ENtropy Guided Domain-Aware Attention for Domain Adaptive Object Detection Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Xinkai Song, Shaohui Peng, Yongwei Zhao, Chen Zhao, Yanjun Wu, Ling Li
PDF
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen
PDF
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images Kaiyu Li, Ruixun Liu, Xiangyong Cao, Xueru Bai, Feng Zhou, Deyu Meng, Zhi Wang
PDF
SegMAN: Omni-Scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation Yunxiang Fu, Meng Lou, Yizhou Yu
PDF
Segment Any Motion in Videos Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, Qianqian Wang
PDF
Segment Any-Quality Images with Generative Latent Space Enhancement Guangqian Guo, Yong Guo, Xuehui Yu, Wenbo Li, Yaoxing Wang, Shan Gao
PDF
Segment Anything, Even Occluded Wei-En Tai, Yu-Lin Shih, Cheng Sun, Yu-Chiang Frank Wang, Hwann-Tzong Chen
PDF
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation Tanner Schmidt, Richard Newcombe
PDF
Segmenting Maxillofacial Structures in CBCT Volumes Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij, Luca Lumetti, Vittorio Pipoli, Elisa Ficarra, Shankeeth Vinayahalingam, Costantino Grana
PDF
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects Weimin Qiu, Jieke Wang, Meng Tang
PDF
Self-Evolving Visual Concept Library Using Vision-Language Critics Atharva Sehgal, Patrick Yuan, Ziniu Hu, Yisong Yue, Jennifer J. Sun, Swarat Chaudhuri
PDF
Self-Expansion of Pre-Trained Models with Mixture of Adapters for Continual Learning Huiyi Wang, Haodong Lu, Lina Yao, Dong Gong
PDF
Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model Jian Zhu, He Wang, Yang Xu, Zebin Wu, Zhihui Wei
PDF
Self-Supervised ControlNet with Spatio-Temporal Mamba for Real-World Video Super-Resolution Shijun Shi, Jing Xu, Lijing Lu, Zhihang Li, Kai Hu
PDF
Self-Supervised Cross-View Correspondence with Predictive Cycle Consistency Alan Baade, Changan Chen
PDF
Self-Supervised Large Scale Point Cloud Completion for Archaeological Site Restoration Aocheng Li, James R. Zimmer-Dauphinee, Rajesh Kalyanam, Ian Lindsay, Parker VanValkenburgh, Steven Wernke, Daniel Aliaga
PDF
Self-Supervised Learning for Color Spike Camera Reconstruction Yanchen Dong, Ruiqin Xiong, Xiaopeng Fan, Zhaofei Yu, Yonghong Tian, Tiejun Huang
PDF
Self-Supervised Spatial Correspondence Across Modalities Ayush Shrivastava, Andrew Owens
PDF
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting Gyeongjin Kang, Jisang Yoo, Jihyeon Park, Seungtae Nam, Hyeonsoo Im, Sangheon Shin, Sangpil Kim, Eunbyung Park
PDF
SemAlign3D: Semantic Correspondence Between RGB-Images Through Aligning 3D Object-Class Representations Krispin Wandel, Hesheng Wang
PDF
Semantic and Expressive Variations in Image Captions Across Languages Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna
PDF
Semantic and Sequential Alignment for Referring Video Object Segmentation Feiyu Pan, Hao Fang, Fangkai Li, Yanyu Xu, Yawei Li, Luca Benini, Xiankai Lu
PDF
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos, Marc Botet Colomer, Linus Härenstam-Nielsen, Mattia Segu, Pier Luigi Dovesi, Jussi Karlgren, Daniel Cremers, Federico Tombari, Matteo Poggi
PDF
Semantic-Guided Cross-Modal Prompt Learning for Skeleton-Based Zero-Shot Action Recognition Anqi Zhu, Jingmin Zhu, James Bailey, Mingming Gong, Qiuhong Ke
PDF
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee
PDF
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance Peishan Cong, Ziyi Wang, Yuexin Ma, Xiangyu Yue
PDF
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining Shangquan Sun, Wenqi Ren, Juxiang Zhou, Shu Wang, Jianhou Gan, Xiaochun Cao
PDF
SemiDAViL: Semi-Supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation Hritam Basak, Zhaozheng Yin
PDF
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-End Text Spotting Dongliang Luo, Hanshen Zhu, Ziyang Zhang, Dingkang Liang, Xudong Xie, Yuliang Liu, Xiang Bai
PDF
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation Tianran Chen, Jiarui Chen, Baoquan Zhang, Zhehao Yu, Shidong Chen, Rui Ye, Xutao Li, Yunming Ye
PDF
Separation of Powers: On Segregating Knowledge from Observation in LLM-Enabled Knowledge-Based Visual Question Answering Zhen Yang, Zhuo Tao, Qi Chen, Liang Li, Yuankai Qi, Anton van den Hengel, Qingming Huang
PDF
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding Andong Deng, Zhongpai Gao, Anwesa Choudhuri, Benjamin Planche, Meng Zheng, Bin Wang, Terrence Chen, Chen Chen, Ziyan Wu
PDF
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model Chunlin Yu, Hanqing Wang, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, Jingya Wang
PDF
SeqMvRL: A Sequential Fusion Framework for Multi-View Representation Learning Ren Wang, Haoliang Sun, Yuxiu Lin, Chuanhui Zuo, Yongshun Gong, Yilong Yin, Wenjia Meng
PDF
SerialGen: Personalized Image Generation by First Standardization Then Personalization Cong Xie, Han Zou, Ruiqi Yu, Yan Zhang, Zhenpeng Zhan
PDF
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, ShaoGuo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang
PDF
SET: Spectral Enhancement for Tiny Object Detection Huixin Sun, Runqi Wang, Yanjing Li, Linlin Yang, Shaohui Lin, Xianbin Cao, Baochang Zhang
PDF
Seurat: From Moving Points to Depth Seokju Cho, Jiahui Huang, Seungryong Kim, Joon-Young Lee
PDF
SF2T: Self-Supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding Yangliu Hu, Zikai Song, Na Feng, Yawei Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang
PDF
SF3D: Stable Fast 3D Mesh Reconstruction with UV-Unwrapping and Illumination Disentanglement Mark Boss, Zixuan Huang, Aaryaman Vasishta, Varun Jampani
PDF
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-View Images Daisheng Jin, Jiangbei Hu, Baixin Xu, Yuxin Dai, Chen Qian, Ying He
PDF
SfM-Free 3D Gaussian Splatting via Hierarchical Training Bo Ji, Angela Yao
PDF
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection Xin Lin, Chong Shi, Zuopeng Yang, Haojin Tang, Zhili Zhou
PDF
SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction Xinran Yang, Donghao Ji, Yuanqi Li, Jie Guo, Yanwen Guo, Junyuan Xie
PDF
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang
PDF
SGSST: Scaling Gaussian Splatting Style Transfer Bruno Galerne, Jianling Wang, Lara Raad, Jean-Michel Morel
PDF
Shading Meets Motion: Self-Supervised Indoor 3D Reconstruction via Simultaneous Shape-from-Shading and Structure-from-Motion Guoyu Lu
PDF
Shadow Generation Using Diffusion Model with Geometry Prior Haonan Zhao, Qingyang Liu, Xinhao Tao, Li Niu, Guangtao Zhai
PDF
Shape Abstraction via Marching Differentiable Support Functions Sunkyung Park, Jeongmin Lee, Dongjun Lee
PDF
Shape and Texture: What Influences Reliable Optical Flow Estimation? Libo Long, Xiao Hu, Jochen Lang
PDF
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions Ting-Hsuan Liao, Yi Zhou, Yu Shen, Chun-Hao Paul Huang, Saayan Mitra, Jia-Bin Huang, Uttaran Bhattacharya
PDF
ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun
PDF
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts Dmitry Petrov, Pradyumn Goyal, Divyansh Shivashok, Yuanming Tao, Melinos Averkiou, Evangelos Kalogerakis
PDF
Sharp-It: A Multi-View to Multi-View Diffusion Model for 3D Synthesis and Manipulation Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar, Lihi Zelnik-Manor
PDF
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation Duc-Hai Pham, Tung Do, Phong Nguyen, Binh-Son Hua, Khoi Nguyen, Rang Nguyen
PDF
Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection Ji Du, Fangwei Hao, Mingyang Yu, Desheng Kong, Jiesheng Wu, Bin Wang, Jing Xu, Ping Li
PDF
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li
PDF
Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model Yingmao Miao, Zhanpeng Huang, Rui Han, Zibin Wang, Chenhao Lin, Chao Shen
PDF
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models Ozgur Kara, Krishna Kumar Singh, Feng Liu, Duygu Ceylan, James M. Rehg, Tobias Hinz
PDF
Show and Segment: Universal Medical Image Segmentation via In-Context Learning Yunhe Gao, Di Liu, Zhuowei Li, Yunsheng Li, Dongdong Chen, Mu Zhou, Dimitris N. Metaxas
PDF
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models Itay Benou, Tammy Riklin Raviv
PDF
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions Tomáš Souček, Prajwal Gatti, Michael Wray, Ivan Laptev, Dima Damen, Josef Sivic
PDF
ShowMak3r: Compositional TV Show Reconstruction Sangmin Kim, Seunguk Do, Jaesik Park
PDF
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan Weixian Lei, Lijuan Wang, Mike Zheng Shou
PDF
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model Zhenglin Huang, Jinwei Hu, Xiangtai Li, Yiwei He, Xingyu Zhao, Bei Peng, Baoyuan Wu, Xiaowei Huang, Guangliang Cheng
PDF
Silence Is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-Based Talking-Head Generation Yuan Gan, Jiaxu Miao, Yunze Wang, Yi Yang
PDF
Silent Branding Attack: Trigger-Free Data Poisoning Attack on Text-to-Image Diffusion Models Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
PDF
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation Leigang Qu, Haochuan Li, Wenjie Wang, Xiang Liu, Juncheng Li, Liqiang Nie, Tat-Seng Chua
PDF
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu, Frano Rajič, Alexandre Alahi
PDF
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal
PDF
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking Chaocan Xue, Bineng Zhong, Qihua Liang, Yaozong Zheng, Ning Li, Yuanliang Xue, Shuxiang Song
PDF
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment Katrin Renz, Long Chen, Elahe Arani, Oleg Sinavski
PDF
SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection Phi Vu Tran
PDF
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction Zhengyuan Li, Kai Cheng, Anindita Ghosh, Uttaran Bhattacharya, Liangyan Gui, Aniket Bera
PDF
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-Space Diffusion Emiel Hoogeboom, Thomas Mensink, Jonathan Heek, Kay Lamerigts, Ruiqi Gao, Tim Salimans
PDF
Simplification Is All You Need Against Out-of-Distribution Overconfidence Keke Tang, Chao Hou, Weilong Peng, Xiang Fang, Zhize Wu, Yongwei Nie, Wenping Wang, Zhihong Tian
PDF
Simulator HC: Regression-Based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision Xinyue Zhang, Zijia Dai, Wanting Xu, Laurent Kneip
PDF
SimVS: Simulating World Inconsistencies for Robust View Synthesis Alex Trevithick, Roni Paiss, Philipp Henzler, Dor Verbin, Rundi Wu, Hadi Alzayer, Ruiqi Gao, Ben Poole, Jonathan T. Barron, Aleksander Holynski, Ravi Ramamoorthi, Pratul P. Srinivasan
PDF
Single Domain Generalization for Few-Shot Counting via Universal Representation Matching Xianing Chen, Si Huo, Borui Jiang, Hailin Hu, Xinghao Chen
PDF
SinGS: Animatable Single-Image Human Gaussian Splats with Kinematic Priors Yufan Wu, Xuanhong Chen, Wen Li, Shunran Jia, Hualiang Wei, Kairui Feng, Jialiang Chen, Yuhan Li, Ang He, Weimin Zhang, Bingbing Ni, Wenjun Zhang
PDF
SINR: Sparsity Driven Compressed Implicit Neural Representations Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe, Trac D. Tran, Vishal M. Patel
PDF
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model Yucheng Mao, Boyang Wang, Nilesh Kulkarni, Jeong Joon Park
PDF
Six-CD: Benchmarking Concept Removals for Text-to-Image Diffusion Models Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu
PDF
SKDream: Controllable Multi-View and 3D Generation with Arbitrary Skeletons Yuanyou Xu, Zongxin Yang, Yi Yang
PDF
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs Junsheng Wang, Nieqing Cao, Yan Ding, Mengying Xie, Fuqiang Gu, Chao Chen
PDF
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury, Shubhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song
PDF
SketchAgent: Language-Driven Sequential Sketch Generation Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba
PDF
SketchFusion: Learning Universal Sketch Features Through Fusing Foundation Models Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Yi-Zhe Song
PDF
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback Mohd Hozaifa Khan, Ravi Kiran Sarvadevabhatla
PDF
SketchVideo: Sketch-Based Video Generation and Editing Feng-Lin Liu, Hongbo Fu, Xintao Wang, Weicai Ye, Pengfei Wan, Di Zhang, Lin Gao
PDF
Sketchy Bounding-Box Supervision for 3D Instance Segmentation Qian Deng, Le Hui, Jin Xie, Jian Yang
PDF
SkillMimic: Learning Basketball Interaction Skills from Demonstrations Yinhuai Wang, Qihan Zhao, Runyi Yu, Hok Wai Tsui, Ailing Zeng, Jing Lin, Zhengyi Luo, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan
PDF
Skip Tuning: Pre-Trained Vision-Language Models Are Effective and Efficient Adapters Themselves Shihan Wu, Ji Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen
PDF
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling Qi Zhu, Jiangwei Lao, Deyi Ji, Junwei Luo, Kang Wu, Yingying Zhang, Lixiang Ru, Jian Wang, Jingdong Chen, Ming Yang, Dong Liu, Feng Zhao
PDF
SLADE: Shielding Against Dual Exploits in Large Vision-Language Models Md Zarif Hossain, Ahmed Imteaj
PDF
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos Yuzheng Liu, Siyan Dong, Shuzhe Wang, Yingda Yin, Yanchao Yang, Qingnan Fan, Baoquan Chen
PDF
SleeperMark: Towards Robust Watermark Against Fine-Tuning Text-to-Image Diffusion Models Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, Zhengzhong Tu
PDF
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He
PDF
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation Ning Ni, Libao Zhang
PDF
SmartCLIP: Modular Vision-Language Alignment with Identification Guarantees Shaoan Xie, Lingjing Kong, Yujia Zheng, Yu Yao, Zeyu Tang, Eric P. Xing, Guangyi Chen, Kun Zhang
PDF
SmartEraser: Remove Anything from Images Using Masked-Region Guidance Longtao Jiang, Zhendong Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Lei Shi, Dong Chen, Houqiang Li
PDF
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning Fida Mohammad Thoker, Letian Jiang, Chen Zhao, Bernard Ghanem
PDF
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity Yijie Xu, Bolun Zheng, Wei Zhu, Hangjia Pan, Yuchen Yao, Ning Xu, Anan Liu, Quan Zhang, Chenggang Yan
PDF
SnapGen-V: Generating a Five-Second Video Within Five Seconds on a Mobile Device Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris N. Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren
PDF
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Jierun Chen, Dongting Hu, Xijie Huang, Huseyin Coskun, Arpit Sahni, Aarush Gupta, Anujraaj Goyal, Dishani Lahiri, Rajesh Singh, Yerlan Idelbayev, Junli Cao, Yanyu Li, Kwang-Ting Cheng, S.-H. Gary Chan, Mingming Gong, Sergey Tulyakov, Anil Kag, Yanwu Xu, Jian Ren
PDF
SnowMaster: Comprehensive Real-World Image Desnowing via MLLM with Multi-Model Feedback Optimization Jianyu Lai, Sixiang Chen, Yunlong Lin, Tian Ye, Yun Liu, Song Fei, Zhaohu Xing, Hongtao Wu, Weiming Wang, Lei Zhu
PDF
SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection Hyo-Jun Lee, Yeong Jun Koh, Hanul Kim, Hyunseop Kim, Yonguk Lee, Jinu Lee
PDF
SocialGesture: Delving into Multi-Person Gesture Understanding Xu Cao, Pranav Virupaksha, Wenqi Jia, Bolin Lai, Fiona Ryan, Sangmin Lee, James M. Rehg
PDF
SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction Kai Chen, Xiaodong Zhao, Yujie Huang, Guoyu Fang, Xiao Song, Ruiping Wang, Ziyuan Wang
PDF
Soft Self-Labeling and Potts Relaxations for Weakly-Supervised Segmentation Zhongwen Zhang, Yuri Boykov
PDF
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal Xinrui Wang, Lanqing Guo, Xiyu Wang, Siyu Huang, Bihan Wen
PDF
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu, Emad Barsoum
PDF
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting Jiahui Zhang, Fangneng Zhan, Ling Shao, Shijian Lu
PDF
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Jianping Jiang, Weiye Xiao, Zhengyu Lin, Huaizhong Zhang, Tianxiang Ren, Yang Gao, Zhiqian Lin, Zhongang Cai, Lei Yang, Ziwei Liu
PDF
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving Xuesong Chen, Linjiang Huang, Tao Ma, Rongyao Fang, Shaoshuai Shi, Hongsheng Li
PDF
Solving Instance Detection from an Open-World Perspective Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong
PDF
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning Seokju Yun, Seunghye Chae, Dongheon Lee, Youngmin Ro
PDF
Sonata: Self-Supervised Learning of Reliable Point Representations Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub
PDF
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation Xiaozhong Ji, Xiaobin Hu, Zhihong Xu, Junwei Zhu, Chuming Lin, Qingdong He, Jiangning Zhang, Donghao Luo, Yi Chen, Qin Lin, Qinglin Lu, Chengjie Wang
PDF
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues Sihong Huang, Jiaxin Wu, Xiaoyong Wei, Yi Cai, Dongmei Jiang, Yaowei Wang
PDF
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla, Christian Richardt, Dejan Markovic, Jake Sandakly, Steven Krenn, Todd Keebler, Eli Shlizerman, Alexander Richard
PDF
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts Shijia Zhao, Qiming Xia, Xusheng Guo, Pufan Zou, Maoji Zheng, Hai Wu, Chenglu Wen, Cheng Wang
PDF
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao
PDF
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M. Rehg, Varun Jampani
PDF
SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models Kevin Miller, Aditya Gangrade, Samarth Mishra, Kate Saenko, Venkatesh Saligrama
PDF
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction Yutao Tang, Yuxiang Guo, Deming Li, Cheng Peng
PDF
Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians Changfeng Ma, Ran Bi, Jie Guo, Chongjun Wang, Yanwen Guo
PDF
Sparse Voxels Rasterization: Real-Time High-Fidelity Radiance Field Rendering Cheng Sun, Jaesung Choe, Charles Loop, Wei-Chiu Ma, Yu-Chiang Frank Wang
PDF
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang
PDF
SparseAlign: A Fully Sparse Framework for Cooperative Object Detection Yunshuang Yuan, Yan Xia, Daniel Cremers, Monika Sester
PDF
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis Woojung Han, Yeonkyung Lee, Chanyoung Kim, Kwanghyun Park, Seong Jae Hwang
PDF
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie
PDF
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M de Melo, Jieneng Chen, Alan Yuille
PDF
SpatialCLIP: Learning 3D-Aware Image Representations from Spatially Discriminative Language Zehan Wang, Sashuai Zhou, Shaoxuan He, Haifeng Huang, Lihe Yang, Ziang Zhang, Xize Cheng, Shengpeng Ji, Tao Jin, Hengshuang Zhao, Zhou Zhao
PDF
SpatialDreamer: Self-Supervised Stereo Video Synthesis from Monocular Input Zhen Lv, Yangqi Long, Congzhentao Huang, Cao Li, Chengfei Lv, Hao Ren, Dian Zheng
PDF
SpatialLLM: A Compound 3D-Informed Design Towards Spatially-Intelligent Large Multimodal Models Wufei Ma, Luoxin Ye, Celso M de Melo, Alan Yuille, Jieneng Chen
PDF
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting Jingyi Xu, Xieyuanli Chen, Junyi Ma, Jiawei Huang, Jintao Xu, Yue Wang, Ling Pei
PDF
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling Junha Hyung, Kinam Kim, Susung Hong, Min-Jung Kim, Jaegul Choo
PDF
SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-View Synthesis from Sparse Inputs Guibiao Liao, Qing Li, Zhenyu Bao, Guoping Qiu, Kanglin Liu
PDF
Spectral Informed Mamba for Robust Point Cloud Processing Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers
PDF
Spectral State Space Model for Rotation-Invariant Visual Representation Learning Sahar Dastani, Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, David Osowiechi, Gustavo Adolfo Vargas Hakim, Farzad Beizaee, Milad Cheraghalikhani, Arnab Kumar Mondal, Herve Lombaert, Christian Desrosiers
PDF
SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting Jiajun Tang, Fan Fei, Zhihao Li, Xiao Tang, Shiyong Liu, Youyu Chen, Binxiao Huang, Zhenyu Chen, Xiaofei Wu, Boxin Shi
PDF
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu, Jie-Ying Lee, Jiun-Long Huang, Yu-Chee Tseng, Yu-Lun Liu
PDF
Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives Alex Hanson, Allen Tu, Geng Lin, Vasu Singla, Matthias Zwicker, Tom Goldstein
PDF
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception Yaniv Benny, Lior Wolf
PDF
Spherical Manifold Guided Diffusion Model for Panoramic Image Generation Xiancheng Sun, Mai Xu, Shengxi Li, Senmao Ma, Xin Deng, Lai Jiang, Gang Shen
PDF
Spiking Transformer with Spatial-Temporal Attention Donghyun Lee, Yuhang Li, Youngeun Kim, Shiting Xiao, Priyadarshini Panda
PDF
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer Yufei Guo, Xiaode Liu, Yuanpei Chen, Weihang Peng, Yuhan Zhang, Zhe Ma
PDF
SpiritSight Agent: Advanced GUI Agent with One Look Zhiyuan Huang, Ziming Cheng, Junting Pan, Zhaohui Hou, Mingjie Zhan
PDF
Spk2SRImgNet: Super-Resolve Dynamic Scene from Spike Stream via Motion Aligned Collaborative Filtering Yuanlin Wang, Yiyang Zhang, Ruiqin Xiong, Jing Zhao, Jian Zhang, Xiaopeng Fan, Tiejun Huang
PDF
SplatAD: Real-Time LiDAR and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, Lennart Svensson
PDF
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis Hyojun Go, Byeongjun Park, Jiho Jang, Jin-Young Kim, Soonwoo Kwon, Changick Kim
PDF
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen
PDF
Splatter-360: Generalizable 360 Gaussian Splatting for Wide-Baseline Panoramic Images Zheng Chen, Chenming Wu, Zhelun Shen, Chen Zhao, Weicai Ye, Haocheng Feng, Errui Ding, Song-Hai Zhang
PDF
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim
PDF
Split Adaptation for Pre-Trained Vision Transformers Lixu Wang, Bingqi Shang, Yi Li, Payal Mohapatra, Wei Dong, Xiao Wang, Qi Zhu
PDF
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking Wenrui Cai, Qingjie Liu, Yunhong Wang
PDF
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall, Bastian Leibe, Julie Stephany Berrio Perez
PDF
SSHNet: Unsupervised Cross-Modal Homography Estimation via Problem Reformulation and Split Optimization Junchen Yu, Si-Yuan Cao, Runmin Zhang, Chenghao Zhang, Zhu Yu, Shujie Chen, Bailin Yang, Hui-Liang Shen
PDF
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang
PDF
Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning Shouhang Zhu, Chenglin Li, Yuankun Jiang, Li Wei, Nuowen Kan, Ziyang Zheng, Wenrui Dai, Junni Zou, Hongkai Xiong
PDF
Stable Flow: Vital Layers for Training-Free Image Editing Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or
PDF
Stable-SCore: A Stable Registration-Based Framework for 3D Shape Correspondence Haolin Liu, Xiaohang Zhan, Zizheng Yan, Zhongjin Luo, Yuxin Wen, Xiaoguang Han
PDF
StableAnimator: High-Quality Identity-Preserving Human Image Animation Shuyuan Tu, Zhen Xing, Xintong Han, Zhi-Qi Cheng, Qi Dai, Chong Luo, Zuxuan Wu
PDF
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, Zhongyuan Wang
PDF
StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts Zhaoxing Gan, Mengtian Li, Ruhua Chen, Zhongxia Ji, Sichen Guo, Huanling Hu, Guangnan Ye, Zuo Hu
PDF
Star with Bilinear Mapping Zelin Peng, Yu Huang, Zhengqin Xu, Feilong Tang, Ming Hu, Xiaokang Yang, Wei Shen
PDF
STAR-Edge: Structure-Aware Local Spherical Curve Representation for Thin-Walled Edge Extraction from Unstructured Point Clouds Zikuan Li, Honghua Chen, Yuecheng Wang, Sibo Wu, Mingqiang Wei, Jun Wang
PDF
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation Shangjin Zhai, Zhichao Ye, Jialin Liu, Weijian Xie, Jiaqi Hu, Zhen Peng, Hua Xue, Danpeng Chen, Xiaomeng Wang, Lei Yang, Nan Wang, Haomin Liu, Guofeng Zhang
PDF
StarVector: Generating Scalable Vector Graphics Code from Images and Text Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli
PDF
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction Zhimin Liao, Ping Wei, Shuaijia Chen, Haoxuan Wang, Ziyang Ren
PDF
STDD: Spatio-Temporal Dual Diffusion for Video Generation Shuaizhen Yao, Xiaoya Zhang, Xin Liu, Mengyi Liu, Zhen Cui
PDF
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images Yuze He, Yanning Zhou, Wang Zhao, Zhongkai Wu, Kaiwen Xiao, Wei Yang, Yong-Jin Liu, Xiao Han
PDF
Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation Qinghe Ma, Jian Zhang, Zekun Li, Lei Qi, Qian Yu, Yinghuan Shi
PDF
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models Zhaoyi Liu, Huan Zhang
PDF
Steepest Descent Density Control for Compact 3D Gaussian Splatting Peihao Wang, Yuehao Wang, Dilin Wang, Sreyas Mohan, Zhiwen Fan, Lemeng Wu, Ruisi Cai, Yu-Ying Yeh, Zhangyang Wang, Qiang Liu, Rakesh Ranjan
PDF
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks Han Wang, Gang Wang, Huan Zhang
PDF
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-Guided Self-Training Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua
PDF
STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search Yuning Qiu, Andong Wang, Chao Li, Haonan Huang, Guoxu Zhou, Qibin Zhao
PDF
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia
PDF
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M. Patel, Karthik Nandakumar
PDF
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Holynski
PDF
StickMotion: Generating 3D Human Motions by Drawing a Stickman Tao Wang, Zhihua Wu, Qiaozhi He, Jiaming Chu, Ling Qian, Yu Cheng, Junliang Xing, Jian Zhao, Lei Jin
PDF
STiL: Semi-Supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification Siyi Du, Xinzhe Luo, Declan P. O'Regan, Chen Qin
PDF
STING-BEE: Towards Vision-Language Model for Real-World X-Ray Baggage Security Inspection Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari, Neha Gour, Abderaouf Behouch, Taimur Hassan, Syed Talal Wasim, Nabil Maalej, Muzammal Naseer, Juergen Gall, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi
PDF
STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation Yisi Luo, Xile Zhao, Kai Ye, Deyu Meng
PDF
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic Jianwei Tang, Hong Yang, Tengyue Chen, Jian-Fang Hu
PDF
Stop Learning It All to Mitigate Visual Hallucination, Focus on the Hallucination Target. Dokyoon Yoon, Youngsook Song, Woomyoung Park
PDF
Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent Philip Doldo, Derek Everett, Amol Khanna, Andre T Nguyen, Edward Raff
PDF
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding Zichen Liu, Kunlun Xu, Bing Su, Xu Zou, Yuxin Peng, Jiahuan Zhou
PDF
StoryGPT-V: Large Language Models as Consistent Story Visualizers Xiaoqian Shen, Mohamed Elhoseiny
PDF
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding Aaryan Garg, Akash Kumar, Yogesh S Rawat
PDF
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
PDF
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models Yunzhi Yan, Zhen Xu, Haotong Lin, Haian Jin, Haoyu Guo, Yida Wang, Kun Zhan, Xianpeng Lang, Hujun Bao, Xiaowei Zhou, Sida Peng
PDF
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu
PDF
Structure from Collision Takuhiro Kaneko
PDF
Structure-Aware Correspondence Learning for Relative Pose Estimation Yihan Chen, Wenfei Yang, Huan Ren, Shifeng Zhang, Tianzhu Zhang, Feng Wu
PDF
Structure-from-Motion with a Non-Parametric Camera Model Yihan Wang, Linfei Pan, Marc Pollefeys, Viktor Larsson
PDF
Structured 3D Latents for Scalable and Versatile 3D Generation Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, Jiaolong Yang
PDF
Style Evolving Along Chain-of-Thought for Unknown-Domain Object Detection Zihao Zhang, Aming Wu, Yahong Han
PDF
Style Quantization for Data-Efficient GAN Training Jian Wang, Xin Lan, Jizhe Zhou, Yuxin Tian, Jiancheng Lv
PDF
Style-Editor: Text-Driven Object-Centric Style Editing Jihun Park, Jongmin Gim, Kyoungmin Lee, Seunghun Lee, Sunghoon Im
PDF
StyleMaster: Stylize Your Video with Artistic Generation and Translation Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo
PDF
StyleSSP: Sampling StartPoint Enhancement for Training-Free Diffusion-Based Method for Style Transfer Ruojun Xu, Weijie Xi, XiaoDi Wang, Yongbo Mao, Zach Cheng
PDF
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements Mingkun Lei, Xue Song, Beier Zhu, Hao Wang, Chi Zhang
PDF
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search Jeimin Jeon, Youngmin Oh, Junghyup Lee, Donghyeon Baek, Dohyung Kim, Chanho Eom, Bumsub Ham
PDF
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning Xiangtao Zhang, Sheng Li, Ao Li, Yipeng Liu, Fan Zhang, Ce Zhu, Le Zhang
PDF
Sufficient Invariant Learning for Distribution Shift Taero Kim, Subeen Park, Sungjun Lim, Yonghan Jung, Krikamol Muandet, Kyungwoo Song
PDF
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes Weixiao Gao, Liangliang Nan, Hugo Ledoux
PDF
SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation Feng Yu, Jiacheng Cao, Li Liu, Minghua Jiang
PDF
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization Yi Du, Zhipeng Zhao, Shaoshu Su, Sharath Golluri, Haoze Zheng, Runmao Yao, Chen Wang
PDF
Supervising Sound Localization by In-the-Wild Egomotion Anna Min, Ziyang Chen, Hang Zhao, Andrew Owens
PDF
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity Ke Ma, Jiaqi Tang, Bin Guo, Fan Dang, Sicong Liu, Zhui Zhu, Lei Wu, Cheng Fang, Ying-Cong Chen, Zhiwen Yu, Yunhao Liu
PDF
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion Xuan Zhu, Jijun Xiang, Xianqi Wang, Longliang Liu, Yu Wang, Hong Zhang, Fei Guo, Xin Yang
PDF
SVFR: A Unified Framework for Generalized Video Face Restoration Zhiyao Wang, Xu Chen, Chengming Xu, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Chengjie Wang, Yuqi Liu, Yiyi Zhou, Rongrong Ji
PDF
SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering Hanxiao Sun, Yupeng Gao, Jin Xie, Jian Yang, Beibei Wang
PDF
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation Hao Du, Bo Wu, Yan Lu, Zhendong Mao
PDF
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham
PDF
Symbolic Representation for Any-to-Any Generative Tasks Jiaqi Chen, Xiaoye Zhu, Yue Wang, Tianyang Liu, Xinhui Chen, Ying Chen, Chak Tou Leong, Yifei Ke, Joseph Liu, Yiwen Yuan, Julian McAuley, Li-jia Li
PDF
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization Hongrui Jia, Chaoya Jiang, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
PDF
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation Xiang Li, Zixuan Huang, Anh Thai, James M. Rehg
PDF
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition Juncheng Wang, Chao Xu, Cheng Yu, Lei Shang, Zhe Hu, Shujun Wang, Liefeng Bo
PDF
SyncSDE: A Probabilistic Framework for Diffusion Synchronization Hyunjun Lee, Hyunsoo Lee, Sookwan Han
PDF
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li, Olga Zatsarynna, Juergen Gall
PDF
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai
PDF
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang, Dan Xu
PDF
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis Bangbang Zhou, Zuan Gao, Zixiao Wang, Boqiang Zhang, Yuxin Wang, Zhineng Chen, Hongtao Xie
PDF
Synthetic Data Is an Elegant GIFT for Continual Vision-Language Models Bin Wu, Wuxuan Shi, Jinqiao Wang, Mang Ye
PDF
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas, George Kopanas, Paulo Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart
PDF
Synthetic Visual Genome Jae Sung Park, Zixian Ma, Linjie Li, Chenhao Zheng, Cheng-Yu Hsieh, Ximing Lu, Khyathi Chandu, Quan Kong, Norimasa Kobori, Ali Farhadi, Yejin Choi, Ranjay Krishna
PDF
Synthetic-to-Real Self-Supervised Robust Depth Estimation via Learning with Motion and Structure Priors Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan
PDF
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-Render Synthetic Faces Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, Zhixin Shu
PDF
T-CIL: Temperature Scaling Using Adversarial Perturbation for Calibration in Class-Incremental Learning Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang
PDF
T-FAKE: Synthesizing Thermal Images for Facial Landmarking Philipp Flotho, Moritz Piening, Anna Kukleva, Gabriele Steidl
PDF
T2ICount: Enhancing Cross-Modal Understanding for Zero-Shot Counting Yifei Qian, Zhongliang Guo, Bowen Deng, Chun Tong Lei, Shuai Zhao, Chun Pong Lau, Xiaopeng Hong, Michael P. Pound
PDF
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Lijun Li, Zhelun Shi, Xuhao Hu, Bowen Dong, Yiran Qin, Xihui Liu, Lu Sheng, Jing Shao
PDF
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving Changsheng Lv, Mengshi Qi, Liang Liu, Huadong Ma
PDF
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-Video Generation Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu
PDF
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-Stage Fusion Yiran Wang, Jiaqi Li, Chaoyi Hong, Ruibo Li, Liusheng Sun, Xiao Song, Zhe Wang, Zhiguo Cao, Guosheng Lin
PDF
TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning Seungmin Baek, Soyul Lee, Hayeon Jo, Hyesong Choi, Dongbo Min
PDF
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions Wang Yu-Hang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu
PDF
TAGA: Self-Supervised Learning for Template-Free Animatable Gaussian Articulated Model Zhichao Zhai, Guikun Chen, Wenguan Wang, Dong Zheng, Jun Xiao
PDF
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon, Kuan-Chuan Peng, Wonchul Kim, Andrew Beng Jin Teoh, Octavia Camps
PDF
Take the Bull by the Horns: Learning to Segment Hard Samples Yuan Guo, Jingyu Kong, Yu Wang, Yuping Duan
PDF
Taming Teacher Forcing for Masked Autoregressive Video Generation Deyu Zhou, Quan Sun, Yuang Peng, Kun Yan, Runpei Dong, Duomin Wang, Zheng Ge, Nan Duan, Xiangyu Zhang
PDF
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs Yingji Zhong, Zhihao Li, Dave Zhenyu Chen, Lanqing Hong, Dan Xu
PDF
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition Yilong Wang, Zilin Gao, Qilong Wang, Zhaofeng Chen, Peihua Li, Qinghua Hu
PDF
TANGO: Training-Free Embodied AI Agents for Open-World Tasks Filippo Ziliotto, Tommaso Campari, Luciano Serafini, Lamberto Ballan
PDF
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Jianchuan Chen, Jingchuan Hu, Gaige Wang, Zhonghua Jiang, Tiansong Zhou, Zhiwen Chen, Chengfei Lv
PDF
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models Xin Wang, Kai Chen, Jiaming Zhang, Jingjing Chen, Xingjun Ma
PDF
Targeted Forgetting of Image Subgroups in CLIP Models Zeliang Zhang, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Chenliang Xu
PDF
TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification Dongyoon Yang, Jihu Lee, Yongdai Kim
PDF
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics Shibo Zhao, Sifan Zhou, Raphael Blanchard, Yuheng Qiu, Wenshan Wang, Sebastian Scherer
PDF
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang
PDF
Task Singular Vectors: Reducing Task Interference in Model Merging Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Emanuele Rodolà
PDF
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan
PDF
Task-Aware Clustering for Prompting Vision-Language Models Fusheng Hao, Fengxiang He, Fuxiang Wu, Tichao Wang, Chengqun Song, Jun Cheng
PDF
Task-Aware Cross-Modal Feature Refinement Transformer with Large Language Models for Visual Grounding Wenbo Chen, Zhen Xu, Ruotao Xu, Si Wu, Hau-San Wong
PDF
Task-Driven Image Fusion with Learnable Fusion Loss Haowen Bai, Jiangshe Zhang, Zixiang Zhao, Yichen Wu, Lilun Deng, Yukun Cui, Tao Feng, Shuang Xu
PDF
Task-Specific Gradient Adaptation for Few-Shot One-Class Classification Yunlong Li, Xiabi Liu, Liyuan Pan, Yuchen Ren
PDF
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting Maochen Yang, Zekun Li, Jian Zhang, Lei Qi, Yinghuan Shi
PDF
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Hongxiang Zhao, Xingchen Liu, Mutian Xu, Yiming Hao, Weikai Chen, Xiaoguang Han
PDF
Taxonomy-Aware Evaluation of Vision-Language Models Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr, Serge Belongie, Ryan Cotterell, Nico Lang, Stella Frank
PDF
TCFG: Tangential Damping Classifier-Free Guidance Mingi Kwon, Shin seong Kim, Jaeseok Jeong, Yi Ting Hsiao, Youngjung Uh
PDF
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong
PDF
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen, Shunshun Yin, Shiyang Qin, Hou Yi, Ziwei Zhang, Siyuan Liu, Gan Qi, Ming Tao
PDF
Temporal Action Detection Model Compression by Progressive Block Drop Xiaoyong Chen, Yong Guo, Jiaming Liang, Sitong Zhuang, Runhao Zeng, Xiping Hu
PDF
Temporal Alignment-Free Video Matching for Few-Shot Action Recognition SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo
PDF
Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts Yu Cao, Zengqun Zhao, Ioannis Patras, Shaogang Gong
PDF
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks Kairong Yu, Chengting Yu, Tianqing Zhang, Xiaochen Zhao, Shu Yang, Hongwei Wang, Qiang Zhang, Qi Xu
PDF
Temporally Consistent Object-Centric Learning by Contrasting Slots Anna Manasyan, Maximilian Seitzer, Filip Radovic, Georg Martius, Andrii Zadaianchuk
PDF
TensoFlow: Tensorial Flow-Based Sampler for Inverse Rendering Chun Gu, Xiaofei Wei, Li Zhang, Xiatian Zhu
PDF
Test-Time Augmentation Improves Efficiency in Conformal Prediction Divya Shanmugam, Helen Lu, Swami Sankaranarayanan, John Guttag
PDF
Test-Time Backdoor Detection for Object Detection Models Hangtao Zhang, Yichen Wang, Shihui Yan, Chenyu Zhu, Ziqi Zhou, Linshan Hou, Shengshan Hu, Minghui Li, Yanjun Zhang, Leo Yu Zhang
PDF
Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation Xingguo Lv, Xingbo Dong, Liwen Wang, Jiewen Yang, Lei Zhao, Bin Pu, Zhe Jin, Xuejun Li
PDF
Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability Unki Park, Seongmoon Jeong, Youngchan Jang, Gyeong-Moon Park, Jong Hwan Ko
PDF
Test-Time Visual In-Context Tuning Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
PDF
TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer Jialun Liu, Jinbo Wu, Xiaobo Gao, Jiakui Hu, Bojun Xiong, Xing Liu, Chen Zhao, Hongbin Pei, Haocheng Feng, Yingying Li, Errui Ding, Jingdong Wang
PDF
TexGaussian: Generating High-Quality PBR Material via Octree-Based 3D Gaussian Splatting Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, Zhouhui Lian
PDF
Text Augmented Correlation Transformer for Few-Shot Classification & Segmentation Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais
PDF
Text Embedding Is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps Jeeyung Kim, Erfan Esmaeili, Qiang Qiu
PDF
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction Shanshan Huang, Haoxuan Li, Chunyuan Zheng, Mingyuan Ge, Wei Gao, Lei Wang, Li Liu
PDF
Text-Guided Sparse Voxel Pruning for Efficient 3D Visual Grounding Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu
PDF
Textured Gaussians for Enhanced 3D Scene Appearance Modeling Brian Chao, Hung-Yu Tseng, Lorenzo Porzi, Chen Gao, Tuotuo Li, Qinbo Li, Ayush Saraf, Jia-Bin Huang, Johannes Kopf, Gordon Wetzstein, Changil Kim
PDF
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance Mushui Liu, Dong She, Jingxuan Pang, Qihan Huang, Jiacheng Ying, Wanggui He, Yuanlei Hou, Siming Fu
PDF
The Art of Deception: Color Visual Illusions and Diffusion Models Alexandra Gomez-Villa, Kai Wang, C. Alejandro Parraga, Bartłomiej Twardowski, Jesus Malo, Javier Vazquez-Corral, Joost van den Weijer
PDF
The Change You Want to Detect: Semantic Change Detection in Earth Observation with Hybrid Data Generation Yanis Benidir, Nicolas Gonthier, Clement Mallet
PDF
The Devil Is in Low-Level Features for Cross-Domain Few-Shot Segmentation Yuhan Liu, Yixiong Zou, Yuhua Li, Ruixuan Li
PDF
The Devil Is in Temporal Token: High Quality Video Reasoning Segmentation Sitong Gong, Yunzhi Zhuge, Lu Zhang, Zongxin Yang, Pingping Zhang, Huchuan Lu
PDF
The Devil Is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang
PDF
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu, Konda Reddy Mopuri
PDF
The Impact Label Noise and Choice of Threshold Has on Cross-Entropy and Soft-Dice in Image Segmentation Marcus Nordström, Atsuto Maki, Henrik Hult
PDF
The Language of Motion: Unifying Verbal and Non-Verbal Language of 3D Human Motion Changan Chen, Juze Zhang, Shrinidhi K. Lakshmikanth, Yusu Fang, Ruizhi Shao, Gordon Wetzstein, Li Fei-Fei, Ehsan Adeli
PDF
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition Otto Brookes, Maksim Kukushkin, Majid Mirmehdi, Colleen Stephens, Paula Dieguez, Thurston C. Hicks, Sorrel Jones, Kevin Lee, Maureen S. McCarthy, Amelia Meier, Emmanuelle Normand, Erin G. Wessling, Roman M. Wittig, Kevin Langergraber, Klaus Zuberbühler, Lukas Boesch, Thomas Schmid, Mimi Arandjelovic, Hjalmar Kühl, Tilo Burghardt
PDF
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique like Photographers Daiqing Qi, Handong Zhao, Jing Shi, Simon Jenni, Yifei Fan, Franck Dernoncourt, Scott Cohen, Sheng Li
PDF
The Power of Context: How Multimodality Improves Image Super-Resolution Kangfu Mei, Hossein Talebi, Mojtaba Ardakani, Vishal M. Patel, Peyman Milanfar, Mauricio Delbracio
PDF
The Scene Language: Representing Scenes with Programs, Words, and Embeddings Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu
PDF
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems Song Xia, Yi Yu, Wenhan Yang, Meiwen Ding, Zhuo Chen, Ling-Yu Duan, Alex C. Kot, Xudong Jiang
PDF
Theory-Inspired Deep Multi-View Multi-Label Learning with Incomplete Views and Noisy Labels Quanjiang Li, Tingjin Luo, Jiahui Liao
PDF
Thin-Shell-SFT: Fine-Grained Monocular Non-Rigid 3D Surface Tracking with Neural Deformation Fields Navami Kairanda, Marc Habermann, Shanthika Naik, Christian Theobalt, Vladislav Golyanik
PDF
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation Yuanqi Yao, Siao Liu, Haoming Song, Delin Qu, Qizhi Chen, Yan Ding, Bin Zhao, Zhigang Wang, Xuelong Li, Dong Wang
PDF
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie
PDF
Three Cars Approaching Within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-Based Semantic Scene Completion Jongseong Bae, Junwoo Ha, Ha Young Kim
PDF
Three-View Focal Length Recovery from Homographies Yaqing Ding, Viktor Kocur, Zuzana Berger Haladova, Qianliang Wu, Shen Cai, Jian Yang, Zuzana Kukelova
PDF
Through-the-Mask: Mask-Based Motion Trajectories for Image-to-Video Generation Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak
PDF
TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-Time Correction Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi
PDF
Tightening Robustness Verification of MaxPool-Based Neural Networks via Minimizing the Over-Approximation Zone Yuan Xiao, Yuchen Chen, Shiqing Ma, Chunrong Fang, Tongtong Bai, Mingzheng Gu, Yuxin Cheng, Yanwei Chen, Zhenyu Chen
PDF
Tiled Diffusion Or Madar, Ohad Fried
PDF
Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields Runfeng Li, Mikhail Okunev, Zixuan Guo, Anh Ha Duong, Christian Richardt, Matthew O'Toole, James Tompkin
PDF
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan
PDF
TimeTracker: Event-Based Continuous Point Tracking for Video Frame Interpolation with Non-Linear Motion Haoyue Liu, Jinghan Xu, Yi Chang, Hanyu Zhou, Haozhi Zhao, Lin Wang, Luxin Yan
PDF
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhucun Xue, Yong Liu
PDF
TinyFusion: Diffusion Transformers Learned Shallow Gongfan Fang, Kunjun Li, Xinyin Ma, Xinchao Wang
PDF
TKG-DM: Training-Free Chroma Key Content Generation Diffusion Model Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Takahiro Shirakawa, Ko Watanabe, Andreas Dengel, Jinjia Zhou
PDF
Token Cropr: Faster ViTs for Quite a Few Tasks Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
PDF
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu
PDF
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions Through Task Tokenization Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang
PDF
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images Jiuchen Chen, Xinyu Yan, Qizhi Xu, Kaiqi Li
PDF
TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-Centric Video Generation Ruineng Li, Daitao Xing, Huiming Sun, Yuanzhou Ha, Jinglin Shen, Chiuman Ho
PDF
TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression Xinjie Wang, Yifan Zhang, Ting Liu, Xinpu Liu, Ke Xu, Jianwei Wan, Yulan Guo, Hanyun Wang
PDF
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model Meilong Xu, Saumya Gupta, Xiaoling Hu, Chen Li, Shahira Abousamra, Dimitris Samaras, Prateek Prasanna, Chao Chen
PDF
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan
PDF
Tora: Trajectory-Oriented Diffusion Transformer for Video Generation Zhenghao Zhang, Junchao Liao, Menghao Li, ZuoZhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang
PDF
Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction Yuanbo Wang, Zhaoxuan Zhang, Jiajin Qiu, Dilong Sun, Zhengyu Meng, Xiaopeng Wei, Xin Yang
PDF
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption Du Chen, Tianhe Wu, Kede Ma, Lei Zhang
PDF
Toward Real-World BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
PDF
Toward Robust Neural Reconstruction from Sparse Point Sets Amine Ouasfi, Shubhendu Jena, Eric Marchand, Adnane Boukhayma
PDF
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury
PDF
Towards All-in-One Medical Image Re-Identification Yuan Tian, Kaiyuan Ji, Rongzhao Zhang, Yankai Jiang, Chunyi Li, Xiaosong Wang, Guangtao Zhai
PDF
Towards Autonomous Micromobility Through Scalable Urban Simulation Wayne Wu, Honglin He, Chaoyuan Zhang, Jack He, Seth Z. Zhao, Ran Gong, Quanyi Li, Bolei Zhou
PDF
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards Zijing Hu, Fengda Zhang, Long Chen, Kun Kuang, Jiahui Li, Kaifeng Gao, Jun Xiao, Xin Wang, Wenwu Zhu
PDF
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters Xiaohan Qin, Xiaoxing Wang, Junchi Yan
PDF
Towards Continual Universal Segmentation Zihan Lin, Zilei Wang, Xu Wang
PDF
Towards Cost-Effective Learning: A Synergy of Semi-Supervised and Active Learning Tianxiang Yin, Ningzhong Liu, Han Sun
PDF
Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients Li Lun, Kunyu Feng, Qinglong Ni, Ling Liang, Yuan Wang, Ying Li, Dunshan Yu, Xiaoxin Cui
PDF
Towards Efficient Foundation Model for Zero-Shot Amodal Segmentation Zhaochen Liu, Limeng Qiao, Xiangxiang Chu, Lin Ma, Tingting Jiang
PDF
Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency Yikai Wang, Chenjie Cao, Junqiu Yu, Ke Fan, Xiangyang Xue, Yanwei Fu
PDF
Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns Zhenyu Zhou, Chengdong Dong, Ajay Kumar
PDF
Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather Longyu Yang, Ping Hu, Shangbo Yuan, Lu Zhang, Jun Liu, Hengtao Shen, Xiaofeng Zhu
PDF
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition Lintong Zhang, Kang Yin, Seong-Whan Lee
PDF
Towards General Visual-Linguistic Face Forgery Detection Ke Sun, Shen Chen, Taiping Yao, Ziyin Zhou, Jiayi Ji, Xiaoshuai Sun, Chia-Wen Lin, Rongrong Ji
PDF
Towards Generalizable Scene Change Detection Jae-Woo Kim, Ue-Hwan Kim
PDF
Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning and Adaptive Prompting Kaouther Messaoud, Matthieu Cord, Alexandre Alahi
PDF
Towards High-Fidelity 3D Talking Avatar with Personalized Dynamic Texture Xuanchen Li, Jianyu Wang, Yuhao Cheng, Yikun Zeng, Xingyu Ren, Wenhan Zhu, Weiming Zhao, Yichao Yan
PDF
Towards Human-Understandable Multi-Dimensional Concept Discovery Arne Grobrügge, Niklas Kühl, Gerhard Satzger, Philipp Spitzer
PDF
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text Guotao Liang, Baoquan Zhang, Zhiyuan Wen, Junteng Zhao, Yunming Ye, Kola Ye, Yao He
PDF
Towards In-the-Wild 3D Plane Reconstruction from a Single Image Jiachen Liu, Rui Yu, Sili Chen, Sharon X. Huang, Hengkai Guo
PDF
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method Xinshuai Song, Weixing Chen, Yang Liu, Weikai Chen, Guanbin Li, Liang Lin
PDF
Towards Lossless Implicit Neural Representation via Bit Plane Decomposition Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho, Sunghoon Im, Kyong Hwan Jin
PDF
Towards Million-Scale Adversarial Robustness Evaluation with Stronger Individual Attacks Yong Xie, Weijie Zheng, Hanxun Huang, Guangnan Ye, Xingjun Ma
PDF
Towards More General Video-Based Deepfake Detection Through Facial Component Guided Adaptation for Foundation Model Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, Jun-Cheng Chen
PDF
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark Hao Guo, Xugong Qin, Jun Jie Ou Yang, Peng Zhang, Gangyan Zeng, Yubo Li, Hailun Lin
PDF
Towards Open-Vocabulary Audio-Visual Event Localization Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang
PDF
Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging Max Kahl, Sebastian Stricker, Lisa Hutschenreiter, Florian Bernard, Carsten Rother, Bogdan Savchynskyy
PDF
Towards Practical Real-Time Neural Video Compression Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, Yan Lu
PDF
Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion Haoyu Wang, Le Wang, Sanping Zhou, Jingyi Tian, Zheng Qin, Yabing Wang, Gang Hua, Wei Tang
PDF
Towards Precise Scaling Laws for Video Diffusion Transformers Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, Rui Chen, Victor Shea-Jay Huang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai
PDF
Towards RAW Object Detection in Diverse Conditions Zhong-Yu Li, Xin Jin, Bo-Yuan Sun, Chun-Le Guo, Ming-Ming Cheng
PDF
Towards Realistic Example-Based Modeling via 3D Gaussian Stitching Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin
PDF
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and a Novel Method Pan Yin, Kaiyu Li, Xiangyong Cao, Jing Yao, Lei Liu, Xueru Bai, Feng Zhou, Deyu Meng
PDF
Towards Scalable Human-Aligned Benchmark for Text-Guided Image Editing Suho Ryu, Kihyun Kim, Eugene Baek, Dongsoo Shin, Joonseok Lee
PDF
Towards Smart Point-and-Shoot Photography Jiawan Li, Fei Zhou, Zhipeng Zhong, Jiongzhi Lin, Guoping Qiu
PDF
Towards Source-Free Machine Unlearning Sk Miraj Ahmed, Umit Yigit Basaran, Dripta S. Raychaudhuri, Arindam Dutta, Rohit Kundu, Fahim Faisal Niloy, Basak Guler, Amit K. Roy-Chowdhury
PDF
Towards Stable and Storage-Efficient Dataset Distillation: Matching Convexified Trajectory Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Weili Guan
PDF
Towards Training-Free Anomaly Detection with Vision and Language Foundation Models Jinjin Zhang, Guodong Wang, Yizhou Jin, Di Huang
PDF
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance Shulei Wang, Wang Lin, Hai Huang, Hanting Wang, Sihang Cai, WenKang Han, Tao Jin, Jingyuan Chen, Jiacheng Sun, Jieming Zhu, Zhou Zhao
PDF
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation Rohith Peddi, Saurabh Saurabh, Ayush Abhay Shrivastava, Parag Singla, Vibhav Gogate
PDF
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation Gianni Franchi, Nacim Belkhir, Dat Nguyen Trong, Guoxuan Xia, Andrea Pilzer
PDF
Towards Understanding How Knowledge Evolves in Large Vision-Language Models Sudong Wang, Yunjian Zhang, Yao Zhu, Jianing Li, Zizhe Wang, Yanwei Liu, Xiangyang Ji
PDF
Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network Haifeng Zhang, Qinghui He, Xiuli Bi, Weisheng Li, Bo Liu, Bin Xiao
PDF
Towards Universal Dataset Distillation via Task-Driven Diffusion Ding Qi, Jian Li, Junyao Gao, Shuguang Dou, Ying Tai, Jianlong Hu, Bo Zhao, Yabiao Wang, Chengjie Wang, Cairong Zhao
PDF
Towards Universal Soccer Video Understanding Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
PDF
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection Wenqiao Li, Yao Gu, Xintao Chen, Xiaohao Xu, Ming Hu, Xiaonan Huang, Yingna Wu
PDF
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi
PDF
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning Jiange Yang, Haoyi Zhu, Yating Wang, Gangshan Wu, Tong He, Limin Wang
PDF
Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline Yuzhi Huang, Chenxin Li, Haitao Zhang, Zixu Lin, Yunlong Lin, Hengyu Liu, Wuyang Li, Xinyu Liu, Jiechao Gao, Yue Huang, Xinghao Ding, Yixuan Yuan
PDF
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye, Niloy J. Mitra, Duygu Ceylan
PDF
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better Zihang Lai, Andrea Vedaldi
PDF
TraF-Align: Trajectory-Aware Feature Alignment for Asynchronous Multi-Agent Perception Zhiying Song, Lei Yang, Fuxi Wen, Jun Li
PDF
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training? Yuechen Xie, Jie Song, Huiqiong Wang, Mingli Song
PDF
Training-Free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis Zixuan Wang, Duo Peng, Feng Chen, Yuwei Yang, Yinjie Lei
PDF
Training-Free Neural Architecture Search Through Variance of Knowledge of Deep Network Weights Ondrej Tybl, Lukas Neumann
PDF
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM Yizhou Huang, Yihua Cheng, Kezhi Wang
PDF
Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene Tai-Yu Pan, Sooyoung Jeon, Mengdi Fan, Jinsu Yoo, Zhenyang Feng, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao
PDF
Transformers Without Normalization Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu
PDF
TransPixeler: Advancing Text-to-Video Generation with Transparency Luozhou Wang, Yijun Li, Zhifei Chen, Jui-Hsien Wang, Zhifei Zhang, He Zhang, Zhe Lin, Ying-Cong Chen
PDF
Traversing Distortion-Perception Tradeoff Using a Single Score-Based Generative Model Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang, Xiaojun Yuan
PDF
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing Stefan Lionar, Jiabin Liang, Gim Hee Lee
PDF
Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning Juntae Lee, Munawar Hayat, Sungrack Yun
PDF
TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik, Yoni Kasten
PDF
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation Abduljalil Radman, Jorma Laaksonen
PDF
TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution Linwei Dong, Qingnan Fan, Yihong Guo, Zhonghao Wang, Qi Zhang, Jinwei Chen, Yawei Luo, Changqing Zou
PDF
TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-Resolution and Beyond Kun Zhou, Xinyu Lin, Jiangbo Lu
PDF
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks Tiago Novello, Diana Aldana, Andre Araujo, Luiz Velho
PDF
Turbo3D: Ultra-Fast Text-to-3D Generation Hanzhe Hu, Tianwei Yin, Fujun Luan, Yiwei Hu, Hao Tan, Zexiang Xu, Sai Bi, Shubham Tulsiani, Kai Zhang
PDF
TurboFill: Adapting Few-Step Text-to-Image Model for Fast Image Inpainting Liangbin Xie, Daniil Pakhomov, Zhonghao Wang, Zongze Wu, Ziyan Chen, Yuqian Zhou, Haitian Zheng, Zhifei Zhang, Zhe Lin, Jiantao Zhou, Chao Dong
PDF
Twinner: Shining Light on Digital Twins in a Few Snaps Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotny
PDF
Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation Yu Qi, Yuanchen Ju, Tianming Wei, Chi Chu, Lawson L.S. Wong, Huazhe Xu
PDF
Two Is Better than One: Efficient Ensemble Defense for Robust and Compact Models Yoojin Jung, Byung Cheol Song
PDF
Type-R: Automatically Retouching Typos for Text-to-Image Generation Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seiichi Uchida, Kota Yamaguchi
PDF
U-Know-DiffPAN: An Uncertainty-Aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee, Munchurl Kim
PDF
UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References Ming-Feng Li, Xin Yang, Fu-En Wang, Hritam Basak, Yuyin Sun, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo
PDF
UCM-VeID V2: A Richer Dataset and a Pre-Training Method for UAV Cross-Modality Vehicle Re-Identification Xingyue Liu, Jiahao Qi, Chen Chen, KangCheng Bin, Ping Zhong
PDF
UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-Label Learning Weiqi Yan, Lvhai Chen, Huaijia Kou, Shengchuan Zhang, Yan Zhang, Liujuan Cao
PDF
UHD-Processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-Aware Prompts Yidi Liu, Dong Li, Xueyang Fu, Xin Lu, Jie Huang, Zheng-Jun Zha
PDF
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models Yuning Han, Bingyin Zhao, Rui Chu, Feng Luo, Biplab Sikdar, Yingjie Lao
PDF
UltraFusion: Ultra High Dynamic Imaging Using Exposure Fusion Zixuan Chen, Yujin Wang, Xin Cai, Zhiyuan You, Zheming Lu, Fan Zhang, Shi Guo, Tianfan Xue
PDF
UMFN: Unified Multi-Domain Face Normalization for Joint Cross-Domain Prototype Learning and Heterogeneous Face Recognition Meng Pang, Wenjun Zhang, Nanrun Zhou, Shengbo Chen, Hong Rao
PDF
UMotion: Uncertainty-Driven Human Motion Estimation from Inertial and Ultra-Wideband Units Huakun Liu, Hiroki Ota, Xin Wei, Yutaro Hirao, Monica Perusquia-Hernandez, Hideaki Uchiyama, Kiyoshi Kiyokawa
PDF
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing Yanjun Li, Zhaoyang Li, Honghui Chen, Lizhi Xu
PDF
Unbiasing Through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht
PDF
Unboxed: Geometrically and Temporally Consistent Video Outpainting Zhongrui Yu, Martina Megaro-Boldini, Robert W. Sumner, Abdelaziz Djelouah
PDF
Uncertain Multimodal Intention and Emotion Understanding in the Wild Qu Yang, Qinghongya Shi, Tongxin Wang, Mang Ye
PDF
Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection Jiangyi Wang, Na Zhao
PDF
Uncertainty Weighted Gradients for Model Calibration Jinxu Lin, Linwei Tao, Minjing Dong, Chang Xu
PDF
Uncertainty-Guided Perturbation for Image Super-Resolution Diffusion Model Leheng Zhang, Weiyi You, Kexuan Shi, Shuhang Gu
PDF
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu
PDF
UnCommon Objects in 3D Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny
PDF
Understanding Fine-Tuning CLIP for Open-Vocabulary Semantic Segmentation in Hyperbolic Space Zelin Peng, Zhengqin Xu, Zhilin Zeng, Changsong Wen, Yu Huang, Menglin Yang, Feilong Tang, Wei Shen
PDF
Understanding Multi-Layered Transmission Matrices Anat Levin, Marina Alterman
PDF
Understanding Multi-Task Activities from Single-Task Videos Yuhan Shen, Ehsan Elhamifar
PDF
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning Long Zhou, Fereshteh Shakeri, Aymen Sadraoui, Mounir Kaaniche, Jean-Christophe Pesquet, Ismail Ben Ayed
PDF
Uni-Renderer: Unifying Rendering and Inverse Rendering via Dual Stream Diffusion Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng, Shunsi Zhang, Ying-Cong Chen
PDF
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video David Yifan Yao, Albert J. Zhai, Shenlong Wang
PDF
UNIALIGN: Scaling Multimodal Alignment Within One Unified Model Bo Zhou, Liulei Li, Yujia Wang, Huafeng Liu, Yazhou Yao, Wenguan Wang
PDF
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming Hao Lin, Ke Wu, Jie Li, Jun Li, Wu-Jun Li
PDF
UNIC-Adapter: Unified Image-Instruction Adapter with Multi-Modal Transformer for Image Generation Lunhao Duan, Shanshan Zhao, Wenjun Yan, Yinglun Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Mingming Gong, Gui-Song Xia
PDF
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Tao Gong, Bin Liu, Jing Han, Wenbin Tu, Shengwei Xu, Nenghai Yu
PDF
Unified Dense Prediction of Video Diffusion Lehan Yang, Lu Qi, Xiangtai Li, Sheng Li, Varun Jampani, Ming-Hsuan Yang
PDF
Unified Medical Lesion Segmentation via Self-Referring Indicator Shijie Chang, Xiaoqi Zhao, Lihe Zhang, Tiancheng Wang
PDF
Unified Reconstruction of Static and Dynamic Scenes from Events Qiyao Gao, Peiqi Duan, Hanyue Lou, Minggui Teng, Ziqi Cai, Xu Chen, Boxin Shi
PDF
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling Guillem Capellera, Antonio Rubio, Luis Ferraz, Antonio Agudo
PDF
UniGoal: Towards Universal Zero-Shot Goal-Oriented Navigation Hang Yin, Xiuwei Xu, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
PDF
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping Wenbo Wang, Fangyun Wei, Lei Zhou, Xi Chen, Lin Luo, Xiaohan Yi, Yizhong Zhang, Yaobo Liang, Chang Xu, Yan Lu, Jiaolong Yang, Baining Guo
PDF
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation Yinqiao Wang, Hao Xu, Pheng-Ann Heng, Chi-Wing Fu
PDF
UniK3D: Universal Camera Monocular 3D Estimation Luigi Piccinelli, Christos Sakaridis, Mattia Segu, Yung-Hsu Yang, Siyuan Li, Wim Abbeloos, Luc Van Gool
PDF
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-Based 3D Object Detection Xin Jin, Haisheng Su, Kai Liu, Cong Ma, Wei Wu, Fei Hui, Junchi Yan
PDF
UniNet: A Contrastive Learning-Guided Unified Framework with Feature Selection for Anomaly Detection Shun Wei, Jielin Jiang, Xiaolong Xu
PDF
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee, Shubham Tulsiani
PDF
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing Yiheng Li, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen
PDF
UniPre3D: Unified Pre-Training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting Ziyi Wang, Yanran Zhang, Jie Zhou, Jiwen Lu
PDF
UniReal: Universal Image Generation and Editing via Learning Real-World Dynamics Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
PDF
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang
PDF
UniScene: Unified Occupancy-Centric Driving Scene Generation Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin
PDF
UniSTD: Towards Unified Spatio-Temporal Learning Across Diverse Disciplines Chen Tang, Xinzhu Ma, Encheng Su, Xiufeng Song, Xiaohong Liu, Wei-Hong Li, Lei Bai, Wanli Ouyang, Xiangyu Yue
PDF
Unity in Diversity: Video Editing via Gradient-Latent Purification Junyu Gao, Kunlin Yang, Xuan Yao, Yufan Hu
PDF
UniVAD: A Training-Free Unified Model for Few-Shot Visual Anomaly Detection Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang
PDF
Universal Actions for Enhanced Embodied Foundation Models Jinliang Zheng, Jianxiong Li, Dongxiu Liu, Yinan Zheng, Zhihao Wang, Zhonghong Ou, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
PDF
Universal Domain Adaptation for Semantic Segmentation Seun-An Choe, Keon-Hee Park, Jinwoo Choi, Gyeong-Moon Park
PDF
Universal Scene Graph Generation Shengqiong Wu, Hao Fei, Tat-Seng Chua
PDF
Unlearning Through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter Zhengyi Zhong, Weidong Bao, Ji Wang, Shuai Zhang, Jingxuan Zhou, Lingjuan Lyu, Wei Yang Bryan Lim
PDF
Unleashing In-Context Learning of Autoregressive Models for Few-Shot Image Manipulation Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg, Sangmin Lee, Ning Zhang, Tong Xiao
PDF
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation Yiheng Li, Yang Yang, Zichang Tan, Huan Liu, Weihua Chen, Xu Zhou, Zhen Lei
PDF
Unleashing the Potential of Multi-Modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation Zhuoman Liu, Weicai Ye, Yan Luximon, Pengfei Wan, Di Zhang
PDF
Unlocking Generalization Power in LiDAR Point Cloud Registration Zhenxuan Zeng, Qiao Wu, Xiyu Zhang, Lin Yuanbo Wu, Pei An, Jiaqi Yang, Ji Wang, Peng Wang
PDF
Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization Dongkwan Lee, Kyomin Hwang, Nojun Kwak
PDF
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image Xingyu Liu, Gu Wang, Ruida Zhang, Chenyangguang Zhang, Federico Tombari, Xiangyang Ji
PDF
Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization Peirong Liu, Ana Lawry Aguila, Juan E. Iglesias
PDF
Unseen Visual Anomaly Generation Han Sun, Yunkang Cao, Hao Dong, Olga Fink
PDF
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Ping Luo, Chen Qian, Wentao Liu, Yiqiang Chen
PDF
Unsupervised Discovery of Facial Landmarks and Head Pose Satyajit Tourani, Siddharth Tourani, Arif Mahmood, Muhammad Haris Khan
PDF
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning Tim Lenz, Peter Neidlinger, Marta Ligero, Georg Wölflein, Marko van Treeck, Jakob N. Kather
PDF
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu Wang
PDF
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li, Farzan Farnia
PDF
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly Yexin Liu, Zhengyang Liang, Yueze Wang, Xianfeng Wu, Feilong Tang, Muyang He, Jian Li, Zheng Liu, Harry Yang, Sernam Lim, Bo Zhao
PDF
Unveiling the Mist over 3D Vision-Language Understanding: Object-Centric Evaluation with Chain-of-Analysis Jiangyong Huang, Baoxiong Jia, Yan Wang, Ziyu Zhu, Xiongkun Linghu, Qing Li, Song-Chun Zhu, Siyuan Huang
PDF
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach Jing Bi, Junjia Guo, Yunlong Tang, Lianggong Bruce Wen, Zhang Liu, Bingjie Wang, Chenliang Xu
PDF
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation Qihui Zhang, Munan Ning, Zheyuan Liu, Yue Huang, Shuo Yang, Yanbo Wang, Jiayi Ye, Xiao Chen, Yibing Song, Li Yuan
PDF
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation Yichong Lu, Yichi Cai, Shangzhan Zhang, Hongyu Zhou, Haoji Hu, Huimin Yu, Andreas Geiger, Yiyi Liao
PDF
URWKV: Unified RWKV Model with Multi-State Perspective for Low-Light Image Restoration Rui Xu, Yuzhen Niu, Yuezhou Li, Huangbiao Xu, Wenxi Liu, Yuzhong Chen
PDF
Using Diffusion Priors for Video Amodal Segmentation Kaihua Chen, Deva Ramanan, Tarasha Khurana
PDF
Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing Chen Liao, Yan Shen, Dan Li, Zhongli Wang
PDF
USP-Gaussian: Unifying Spike-Based Image Reconstruction, Pose Correction and Gaussian Splatting Kang Chen, Jiyuan Zhang, Zecheng Hao, Yajing Zheng, Tiejun Huang, Zhaofei Yu
PDF
UVGS: Reimagining Unstructured 3D Gaussian Splatting Using UV Mapping Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Kefan Chen, Srinath Sridhar, Aayush Prakash
PDF
UWAV: Uncertainty-Weighted Weakly-Supervised Audio-Visual Video Parsing Yung-Hsuan Lai, Janek Ebbers, Yu-Chiang Frank Wang, François Germain, Michael Jeffrey Jones, Moitreya Chatterjee
PDF
V-CLR: View-Consistent Learning for Open-World Instance Segmentation Chang-Bin Zhang, Jinhong Ni, Yujie Zhong, Kai Han
PDF
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents Zhengrong Yue, Shaobin Zhuang, Kunchang Li, Yanbo Ding, Yali Wang
PDF
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts Adnen Abdessaied, Anna Rohrbach, Marcus Rohrbach, Andreas Bulling
PDF
V2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy Jiayin Zhao, Zhenqi Fu, Tao Yu, Hui Qiao
PDF
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection Xun Huang, Jinlong Wang, Qiming Xia, Siheng Chen, Bisheng Yang, Xin Li, Cheng Wang, Chenglu Wen
PDF
Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models Daniel Samira, Edan Habler, Yuval Elovici, Asaf Shabtai
PDF
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification Xianwei Zhuang, Zhihong Zhu, Yuxin Xie, Liming Liang, Yuexian Zou
PDF
VasTSD: Learning 3D Vascular Tree-State Space Diffusion Model for Angiography Synthesis Zhifeng Wang, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu
PDF
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents Ryota Tanaka, Taichi Iki, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Jun Suzuki
PDF
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment Darshana Saravanan, Varun Gupta, Darshan Singh, Zeeshan Khan, Vineet Gandhi, Makarand Tapaswi
PDF
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models Muchao Ye, Weiyang Liu, Pan He
PDF
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim, Hyunwoo Oh, Dong-Jin Kim
PDF
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili, Suprosanna Shit, Bjoern Menze
PDF
VEU-Bench: Towards Comprehensive Understanding of Video Editing Bozheng Li, Yongliang Wu, Yi Lu, Jiashuo Yu, Licheng Tang, Jiawang Cao, Wenqing Zhu, Yuyang Sun, Jay Wu, Wenbo Zhu
PDF
VGGT: Visual Geometry Grounded Transformer Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny
PDF
VI^3NR: Variance Informed Initialization for Implicit Neural Representations Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Sameera Ramasinghe, Stephen Gould
PDF
ViCaS: A Dataset for Combining Holistic and Pixel-Level Video Understanding Using Captions with Grounded Segmentation Ali Athar, Xueqing Deng, Liang-Chieh Chen
PDF
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, Chen Cao
PDF
Vid2Sim: Generalizable, Video-Based Reconstruction of Appearance, Geometry and Physics for Mesh-Free Simulation Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu
PDF
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation Ziyang Xie, Zhizheng Liu, Zhenghao Peng, Wayne Wu, Bolei Zhou
PDF
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger
PDF
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu
PDF
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Sili Chen, Hengkai Guo, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang
PDF
Video Depth Without Video Models Bingxin Ke, Dominik Narnhofer, Shengyu Huang, Lei Ke, Torben Peters, Katerina Fragkiadaki, Anton Obukhov, Konrad Schindler
PDF
Video Language Model Pretraining with Spatio-Temporal Masking Yue Wu, Zhaobo Qi, Junshu Sun, Yaowei Wang, Qingming Huang, Shuhui Wang
PDF
Video Motion Transfer with Diffusion Transformers Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov, Philip Torr, Fabio Pizzati
PDF
Video Summarization with Large Language Models Min Jung Lee, Dayoung Gong, Minsu Cho
PDF
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Duo Zheng, Shijia Huang, Liwei Wang
PDF
Video-Bench: Human-Aligned Video Generation Benchmark Hui Han, Siyuan Li, Jiaqi Chen, Yiwen Yuan, Yuling Wu, Yufan Deng, Chak Tou Leong, Hanwen Du, Junchen Fu, Youhua Li, Jie Zhang, Chi Zhang, Li-jia Li, Yongxin Ni
PDF
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval Arun Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa
PDF
Video-Guided Foley Sound Generation with Multimodal Controls Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon
PDF
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-Modal LLMs in Video Analysis Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Caifeng Shan, Ran He, Xing Sun
PDF
Video-Panda: Parameter-Efficient Alignment for Encoder-Free Video-Language Models Jinhui Yi, Syed Talal Wasim, Yanan Luo, Muzammal Naseer, Juergen Gall
PDF
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Yan Shu, Zheng Liu, Peitian Zhang, Minghao Qin, Junjie Zhou, Zhengyang Liang, Tiejun Huang, Bo Zhao
PDF
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis Through User Simulation Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li
PDF
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models Dahun Kim, Aj Piergiovanni, Ganesh Mallya, Anelia Angelova
PDF
VideoDirector: Precise Video Editing via Text-to-Video Models Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo
PDF
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation Runtao Liu, Haoyu Wu, Ziqiang Zheng, Chen Wei, Yingqing He, Renjie Pi, Qifeng Chen
PDF
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu
PDF
VideoGEM: Training-Free Action Grounding in Videos Felix Vogel, Walid Bousselham, Anna Kukleva, Nina Shvetsova, Hilde Kuehne
PDF
VideoGigaGAN: Towards Detail-Rich Video Super-Resolution Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu
PDF
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad Shahbaz Khan, Salman Khan
PDF
VideoGuide: Improving Video Diffusion Models Without Training Through a Teacher's Guide Dohun Lee, Bryan Sangwoo Kim, Geon Yeong Park, Jong Chul Ye
PDF
VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo, Paul Guerrero, Chun-Hao P. Huang, Duygu Ceylan, Minhyuk Sung
PDF
VideoICL: Confidence-Based Iterative In-Context Learning for Out-of-Distribution Video Understanding Kangsan Kim, Geon Park, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang
PDF
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung, Kai-Po Chang, Fu-En Yang, Yu-Chiang Frank Wang
PDF
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing
PDF
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan
PDF
VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez, Xu Yao, Alex Whelan, Kyle Olszewski, Hyeongwoo Kim, Pablo Garrido
PDF
VideoTree: Adaptive Tree-Based Video Representation for LLM Reasoning on Long Videos Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal
PDF
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin
PDF
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding Chaoyu Li, Eun Woo Im, Pooyan Fazli
PDF
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo
PDF
VidSeg: Training-Free Video Semantic Segmentation Based on Diffusion Models Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta, Fangneng Zhan, Adam Kortylewski, Christian Theobalt, Peter Wonka
PDF
VidTwin: Video VAE with Decoupled Structure and Dynamics Yuchi Wang, Junliang Guo, Xinyi Xie, Tianyu He, Xu Sun, Jiang Bian
PDF
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-Invariant Representation Learning Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman
PDF
ViiNeuS: Volumetric Initialization for Implicit Neural Surface Reconstruction of Urban Scenes with Limited Image Overlap Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldao, Dzmitry Tsishkou, Désiré Sidibé
PDF
ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network Zhuochen Yu, Bijie Qiu, Andy W. H. Khong
PDF
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge Vishwesh Nath, Wenqi Li, Dong Yang, Andriy Myronenko, Mingxin Zheng, Yao Lu, Zhijian Liu, Hongxu Yin, Yee Man Law, Yucheng Tang, Pengfei Guo, Can Zhao, Ziyue Xu, Yufan He, Stephanie Harmon, Benjamin Simon, Greg Heinrich, Stephen Aylward, Marc Edgar, Michael Zephyr, Pavlo Molchanov, Baris Turkbey, Holger Roth, Daguang Xu
PDF
VinaBench: Benchmark for Faithful and Consistent Visual Narratives Silin Gao, Sheryl Mathew, Li Mi, Sepideh Mamooler, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Syrielle Montariol, Antoine Bosselut
PDF
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation Saksham Singh Kushwaha, Yapeng Tian
PDF
VIRES: Video Instance Repainting via Sketch and Text Guided Generation Shuchen Weng, Haojie Zheng, Peixuan Zhang, Yuchen Hong, Han Jiang, Si Li, Boxin Shi
PDF
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, Nanyun Peng
PDF
Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-Informed Affordance in 3D Scenes Ting Yu, Yi Lin, Jun Yu, Zhenyu Lou, Qiongjie Cui
PDF
Vision-Language Embodiment for Monocular Depth Estimation Jinchang Zhang, Guoyu Lu
PDF
Vision-Language Gradient Descent-Driven All-in-One Deep Unfolding Networks Haijin Zeng, Xiangming Wang, Yongyong Chen, Jingyong Su, Jie Liu
PDF
Vision-Language Model IP Protection via Prompt-Based Learning Lianyu Wang, Meng Wang, Huazhu Fu, Daoqiang Zhang
PDF
Vision-Language Models Do Not Understand Negation Kumail Alhamoud, Shaden Alshammari, Yonglong Tian, Guohao Li, Philip H.S. Torr, Yoon Kim, Marzyeh Ghassemi
PDF
VisionArena: 230k Real World User-VLM Conversations with Preference Labels Christopher Chou, Lisa Dunlap, Koki Mashita, Krishna Mandal, Trevor Darrell, Ion Stoica, Joseph E. Gonzalez, Wei-Lin Chiang
PDF
VisionPAD: A Vision-Centric Pre-Training Paradigm for Autonomous Driving Haiming Zhang, Wending Zhou, Yiyao Zhu, Xu Yan, Jiantao Gao, Dongfeng Bai, Yingjie Cai, Bingbing Liu, Shuguang Cui, Zhen Li
PDF
VisionZip: Longer Is Better but Not Necessary in Vision Language Models Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, Jiaya Jia
PDF
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen
PDF
VISTA3D: A Unified Segmentation Foundation Model for 3D Medical Imaging Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li
PDF
VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
PDF
Visual Agentic AI for Spatial Reasoning with a Dynamic API Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari
PDF
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning Huajie Jiang, Zhengxian Li, Xiaohan Yu, Yongli Hu, Baocai Yin, Jian Yang, Yuankai Qi
PDF
Visual Consensus Prompting for Co-Salient Object Detection Jie Wang, Nana Yu, Zihao Zhang, Yahong Han
PDF
Visual Lexicon: Rich Image Features in Language Space XuDong Wang, Xingyi Zhou, Alireza Fathi, Trevor Darrell, Cordelia Schmid
PDF
Visual Persona: Foundation Model for Full-Body Human Customization Jisu Nam, Soowon Son, Zhan Xu, Jing Shi, Difan Liu, Feng Liu, Seungryong Kim, Yang Zhou
PDF
Visual Prompting for One-Shot Controllable Video Editing Without Inversion Zhengbo Zhang, Yuxi Zhou, Duo Peng, Joo-Hwee Lim, Zhigang Tu, De Wen Soh, Lin Geng Foo
PDF
Visual Representation Learning Through Causal Intervention for Controllable Image Editing Shanshan Huang, Haoxuan Li, Chunyuan Zheng, Lei Wang, Guorui Liao, Zhili Gong, Huayi Yang, Li Liu
PDF
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration Wenyang Luo, Haina Qin, Zewen Chen, Libin Wang, Dandan Zheng, Yuming Li, Yufan Liu, Bing Li, Weiming Hu
PDF
VITED: Video Temporal Evidence Distillation Yujie Lu, Yale Song, William Wang, Lorenzo Torresani, Tushar Nagarajan
PDF
ViUniT: Visual Unit Tests for More Robust Visual Programming Artemis Panagopoulou, Honglu Zhou, Silvio Savarese, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles
PDF
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models Lei Li, Yuancheng Wei, Zhihui Xie, Xuqing Yang, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu
PDF
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks Jinseong Jang, Chunfei Ma, Byeongwon Lee
PDF
VladVA: Discriminative Fine-Tuning of LVLMs Yassine Ouali, Adrian Bulat, Alexandros Xenos, Anestis Zaganidis, Ioannis Maniadis Metaxas, Brais Martinez, Georgios Tzimiropoulos
PDF
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning Haoran Xu, Peixi Peng, Guang Tan, Yiqian Chang, Luntong Li, Yonghong Tian
PDF
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary Kevin Qinghong Lin, Mike Zheng Shou
PDF
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan, Nikos Kolotouros, Thiemo Alldieck, Cristian Sminchisescu
PDF
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang, Yong Man Ro, Yueh-Hua Wu
PDF
VoCo-Llama: Towards Vision Compression with Large Language Models Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, Yansong Tang
PDF
VODiff: Controlling Object Visibility Order in Text-to-Image Generation Dong Liang, Jinyuan Jia, Yuhao Liu, Zhanghan Ke, Hongbo Fu, Rynson W. H. Lau
PDF
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond Dabing Yu, Zheng Gao
PDF
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-Noising and Super-Resolution Zelin Li, Chenwei Wang, Zhaoke Huang, Yiming Ma, Cunming Zhao, Zhongying Zhao, Hong Yan
PDF
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes Stefano Esposito, Anpei Chen, Christian Reiser, Samuel Rota Bulò, Lorenzo Porzi, Katja Schwarz, Christian Richardt, Michael Zollhöfer, Peter Kontschieder, Andreas Geiger
PDF
Volumetrically Consistent 3D Gaussian Rasterization Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi, Nicholas Antipa
PDF
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow Yancong Lin, Shiming Wang, Liangliang Nan, Julian Kooij, Holger Caesar
PDF
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction Ziyue Zhu, Shenlong Wang, Jin Xie, Jiang-jiang Liu, Jingdong Wang, Jian Yang
PDF
VSNet: Focusing on the Linguistic Characteristics of Sign Language Yuhao Li, Xinyue Chen, Hongkai Li, Xiaorong Pu, Peng Jin, Yazhou Ren
PDF
VTON 360: High-Fidelity Virtual Try-on from Any Viewing Direction Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, Guanbin Li
PDF
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Xu Peng, Kai Wu, Chengming Xu, Wenhui Han, Taisong Jin, Chengjie Wang, Rongrong Ji
PDF
Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft Gaozhi Liu, Silu Cao, Zhenxing Qian, Xinpeng Zhang, Sheng Li, Wanli Peng
PDF
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation Hao Li, Ju Dai, Xin Zhao, Feng Zhou, Junjun Pan, Lei Li
PDF
WAVE: Weight Templates for Adaptive Initialization of Variable-Sized Models Fu Feng, Yucheng Xie, Jing Wang, Xin Geng
PDF
Wavelet and Prototype Augmented Query-Based Transformer for Pixel-Level Surface Defect Detection Feng Yan, Xiaoheng Jiang, Yang Lu, Jiale Cao, Dong Chen, Mingliang Xu
PDF
Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-Supervised Data Lilin Zhang, Chengpei Wu, Ning Yang
PDF
Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion Xiangfeng Xu, Pinyi Zhang, Wenxuan Huang, Yunhang Shen, Haosheng Chen, Jingzhong Lin, Wei Li, Gaoqi He, Jiao Xie, Shaohui Lin
PDF
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models Quan Zhang, Jinwei Fang, Rui Yuan, Xi Tang, Yuxin Qi, Ke Zhang, Chun Yuan
PDF
WeakMCN: Multi-Task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation Silin Cheng, Yang Liu, Xinwei He, Sebastien Ourselin, Lei Tan, Gen Luo
PDF
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion Yang Wu, Yun Zhu, Kaihua Zhang, Jianjun Qian, Jin Xie, Jian Yang
PDF
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat Zhipeng Huang, Shaobin Zhuang, Canmiao Fu, Binxin Yang, Ying Zhang, Chong Sun, Zhizheng Zhang, Yali Wang, Chen Li, Zheng-Jun Zha
PDF
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
PDF
What Makes a Good Dataset for Knowledge Distillation? Logan Frank, Jim Davis
PDF
What's in the Image? A Deep-Dive into the Vision of Vision Language Models Omri Kaduri, Shai Bagon, Tali Dekel
PDF
When Domain Generalization Meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach Vaibhav Rathore, Shubhranil B, Saikat Dutta, Sarthak Mehrotra, Zsolt Kira, Biplab Banerjee
PDF
When the Future Becomes the Past: Taming Temporal Correspondence for Self-Supervised Video Representation Learning Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang
PDF
Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted Shuaiwei Yuan, Junyu Dong, Yuezun Li
PDF
Where's the Liability in the Generative Era? Recovery-Based Black-Box Detection of AI-Generated Content Haoyue Bai, Yiyou Sun, Wei Cheng, Haifeng Chen
PDF
Which Viewpoint Shows It Best? Language for Weakly Supervising View Selection in Multi-View Instructional Videos Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah, Reina Pradhan, Kristen Grauman
PDF
WildAvatar: Learning In-the-Wild 3D Avatars from the Web Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
PDF
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni
PDF
WiLoR: End-to-End 3D Hand Localization and Reconstruction In-the-Wild Rolandos Alexandros Potamias, Jinglei Zhang, Jiankang Deng, Stefanos Zafeiriou
PDF
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression Yu Mao, Jun Wang, Nan Guan, Chun Jason Xue
PDF
WISH: Weakly Supervised Instance Segmentation Using Heterogeneous Labels Hyeokjun Kweon, Kuk-Jin Yoon
PDF
WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images Shifan Zhang, Hongzi Zhu, Yinan He, Minyi Guo, Ziyang Lou, Shan Chang
PDF
Wonderland: Navigating 3D Scenes from a Single Image Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
PDF
WonderWorld: Interactive 3D Scene Generation from a Single Image Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu
PDF
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? Ailin Deng, Tri Cao, Zhirui Chen, Bryan Hooi
PDF
World-Consistent Video Diffusion with Explicit 3D Modeling Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista Martin, Kevin Miao, Alexander Toshev, Joshua Susskind, Jiatao Gu
PDF
X-Dyna: Expressive Dynamic Human Image Animation Di Chang, Hongyi Xu, You Xie, Yipeng Gao, Zhengfei Kuang, Shengqu Cai, Chenxu Zhang, Guoxian Song, Chao Wang, Yichun Shi, Zeyuan Chen, Shijie Zhou, Linjie Luo, Gordon Wetzstein, Mohammad Soleymani
PDF
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? Fengxiang Wang, Hongzhen Wang, Zonghao Guo, Di Wang, Yulin Wang, Mingshuo Chen, Qiang Ma, Long Lan, Wenjing Yang, Jing Zhang, Zhiyuan Liu, Maosong Sun
PDF
Yo'Chameleon: Personalized Vision and Language Generation Thao Nguyen, Krishna Kumar Singh, Jing Shi, Trung Bui, Yong Jae Lee, Yuheng Li
PDF
You See It, You Got It: Learning 3D Creation on Pose-Free Videos at Scale Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang
PDF
Your Large Vision-Language Model Only Needs a Few Attention Heads for Visual Grounding Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang
PDF
Your Scale Factors Are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation Jialai Wang, Yuxiao Wu, Weiye Xu, Yating Huang, Chao Zhang, Zongpeng Li, Mingwei Xu, Zhenkai Liang
PDF
Your ViT Is Secretly an Image Segmentation Model Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, Daan de Geus
PDF
Z-Magic: Zero-Shot Multiple Attributes Guided Image Creator Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong
PDF
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion Zhenglin Zhou, Fan Ma, Hehe Fan, Tat-Seng Chua
PDF
Zero-Shot 3D Question Answering via Voxel-Based Dynamic Token Compression Hsiang-Wei Huang, Fu-Chen Chen, Wenhao Chai, Che-Chun Su, Lu Xia, Sanghun Jung, Cheng-Yen Yang, Jenq-Neng Hwang, Min Sun, Cheng-Hao Kuo
PDF
Zero-Shot 4D LiDAR Panoptic Segmentation Yushan Zhang, Aljoša Ošep, Laura Leal-Taixé, Tim Meinhardt
PDF
Zero-Shot Blind-Spot Image Denoising via Implicit Neural Sampling Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma, Hui Ji
PDF
Zero-Shot Head Swapping in Real-World Scenarios Taewoong Kang, Sohyun Jeong, Hyojin Jang, Jaegul Choo
PDF
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) Tomer Garber, Tom Tirer
PDF
Zero-Shot Monocular Scene Flow Estimation in the Wild Yiqing Liang, Abhishek Badki, Hang Su, James Tompkin, Orazio Gallo
PDF
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich, Rares Ambrus
PDF
Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, Jianmin Zheng
PDF
Zero-Shot Styled Text Image Generation, but Make It Autoregressive Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli, Alessio Tonioni, Rita Cucchiara
PDF
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping Shun Iwase, Muhammad Zubair Irshad, Katherine Liu, Vitor Guizilini, Robert Lee, Takuya Ikeda, Ayako Amma, Koichi Nishiwaki, Kris Kitani, Rares Ambrus, Sergey Zakharov
PDF
ZeroVO: Visual Odometry with Minimal Assumptions Lei Lai, Zekai Yin, Eshed Ohn-Bar
PDF
ZoomLDM: Latent Diffusion Model for Multi-Scale Image Generation Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis, Prateek Prasanna, Rajarsi Gupta, Joel Saltz, Dimitris Samaras
PDF