ECCV 2024
2387 papers
∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras 3D Congealing: 3D-Aware Image Alignment in the Wild
Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani 3D Gaussian Parametric Head Model
Yuelang Xu, Lizhen Wang, Zerong Zheng, Zhaoqi Su, Yebin Liu 3D Hand Pose Estimation in Everyday Egocentric Images
Aditya Prakash, Ruisen Tu, Matthew Chang, Saurabh Gupta 3D Hand Sequence Recovery from Real Blurry Images and Event Stream
JoonKyu Park, Gyeongsik Moon, Weipeng Xu, Evan Kaseman, Takaaki Shiratori, Kyoung Mu Lee 3D Human Pose Estimation via Non-Causal Retentive Networks
Kaili Zheng, Feixiang Lu, Yihao Lv, Liangjun Zhang, Chenyi Guo, Ji Wu 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng 3D Single-Object Tracking in Point Clouds with High Temporal Variation
Qiao Wu, Kun Sun, Pei An, Mathieu Salzmann, Yanning Zhang, Jiaqi Yang 3D Small Object Detection with Dynamic Spatial Pruning
Zhihao Sun, Ziwei Wang, Hongmin Liu, Jie Zhou, Jiwen Lu, Xiuwei Xu 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-Object Editing
Haoran Li, Long Ma, Haolin Shi, Yanbin Hao, Yong Liao, Lechao Cheng, Peng Yuan Zhou 3DEgo: 3D Editing on the Go!
Umar Khalid, Hasan Iqbal, Azib Farooq, Jing Hua, Chen Chen 3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views
Kennard Yanting Chan, Fayao Liu, Guosheng Lin, Chuan Sheng Foo, Weisi Lin 3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
Evangelos Ververas, Polydefkis Gkagkos, Jiankang Deng, Michail C Doukas, Jia Guo, Stefanos Zafeiriou 3R-INN: How to Be Climate Friendly While Consuming/delivering Videos?
Zoubida Ameur, Claire-Helene Demarty, Olivier Le Meur, Daniel Menard 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, James M Rehg, Matt Feiszli 4D Contrastive Superflows Are Dense 3D Representation Learners
Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng, Mi Luo, Huiyu Wang, Alex Dimakis, Lorenzo Torresani, Gedas Bertasius, Kristen Grauman 6DGS: 6d Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-Skeletal Control
Karim Kadry, Shreya Gupta, Jonas Sogbadji, Michiel Schaap, Kersten Petersen, Takuya Mizukami, Carlos Collet, Farhad R. Nezami, Elazer R Edelman A Direct Approach to Viewing Graph Solvability
Federica Arrigoni, Andrea Fusiello, Tomas Pajdla A Fair Ranking and New Model for Panoptic Scene Graph Generation
Julian Lorenz, Alexander Pest, Daniel Kienzle, Katja Ludwig, Rainer Lienhart A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu, Zhaoxiang Liu, Huan Hu, Zezhou Chen, Kohou Wang, Kai Wang, Shiguo Lian A Probability-Guided Sampler for Neural Implicit Surface Rendering
Gonçalo José Dias Pais, Valter André Piedade, Moitreya Chatterjee, Marcus Greiff, Pedro Miraldo A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-Shaped Structures
Tahmina Khanam, Mohammed Bennamoun, Guan Wang, Guanjin Wang, Ferdous Sohel, Farid Boussaid, Anuj Srivastava, Hamid Laga A Rotation-Invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images
Tianyi Liu, Shuaishuai S Zhuang, Jiacheng Nie, Geng Chen, Yusheng Guo, Guangquan Zhou, Jean-Louis Coatrieux, Yang Chen A Semantic Space Is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Bingchen Zhao, Alan Yuille, Yuyin Zhou, Cihang Xie Accelerating Image Generation with Sub-Path Linear Approximation Model
Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang, Ruicong Liu, Yifei Huang, Ryosuke Furuta, Yoichi Sato AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
Yunkang Cao, Jiangning Zhang, Luca Frittoli, Yuqi Cheng, Weiming Shen, Giacomo Boracchi AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Adam Pardyl, Michał Wronka, Maciej Wołczyk, Kamil Adamczewski, Tomasz Trzcinski, Bartosz Zieliński AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan Yao, Gao Huang Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang, Minghao Chen, Qibo Qiu, Jiahao Wu, Wenxiao Wang, Binbin Lin, Ziyu Guan, Xiaofei He Adaptive Annealing for Robust Averaging
Sidhartha Chitturi, Venu Madhav Govindu Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
Alexander Timans, Christoph-Nikolas Straehle, Kaspar Sakmann, Eric Nalisnick Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
Xiaoran Zhang, John C. Stendahl, Lawrence H. Staib, Albert J. Sinusas, Alex Wong, James S. Duncan Adaptive Human Trajectory Prediction via Latent Corridors
Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik Adaptive Multi-Head Contrastive Learning
Lei Wang, Piotr Koniusz, Tom Gedeon, Liang Zheng Adaptive Multi-Modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haoran Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang Adaptive Parametric Activation
Konstantinos P Alexandridis, Jiankang Deng, Anh Nguyen, Shan Luo AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
Keenon Werling, Janelle M Kaneda, Tian Tan, Rishi Agarwal, Six Skov, Tom Van Wouwe, Scott Uhlrich, Scott Delp, Karen Liu, Nicholas A Bianco, Carmichael Ong, Antoine Falisse, Shardul Sapkota, Aidan Jai Chandra, Joshua A Carter, Ezio Preatoni, Benjamin J Fregly, Jennifer Hicks AddMe: Zero-Shot Group-Photo Synthesis by Inserting People into Scenes
Dongxu Yue, Maomao Li, Yunfei Liu, Ailing Zeng, Tianyu Yang, Qin Guo, Yu Li ADMap: Anti-Disturbance Framework for Vectorized HD mAP Construction
Haotian Hu, Fanyi Wang, Yaonong Wang, Laifeng Hu, Jingwei Xu, Zhiwang Zhang Adversarial Diffusion Distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
Roye Katzav, Amit Giloni, Edita Grolman, Hiroo Saito, Tomoyuki Shibata, Tsukasa Omino, Misaki Komatsu, Yoshikazu Hanatani, Yuval Elovici, Asaf Shabtai AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion
Zhiheng Fu, Longguang Wang, Lian Xu, Zhiyong Wang, Hamid Laga, Yulan Guo, Farid Boussaid, Mohammed Bennamoun AFF-Ttention! Affordances and Attention Models for Short-Term Object Interaction Anticipation
Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Jose J Guerrero, Giovanni Maria Farinella, Antonino Furnari Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin F Elsayed, Mohamed Elhoseiny Affine Steerers for Structured Keypoint Description
Georg Bökman, Johan Edstedt, Michael Felsberg, Fredrik Kahl AFreeCA: Annotation-Free Counting for All
Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh Agent Attention: On the Integration of SoftMax and Linear Attention
Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, Gao Huang Agent3D-Zero: An Agent for Zero-Shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang Agglomerative Token Clustering
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund AlignZeg: Mitigating Objective Misalignment for Zero-Shot Semantic Segmentation
Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Xiaopeng Zhang, Yongdong Zhang, Qi Tian Alternate Diverse Teaching for Semi-Supervised Medical Image Segmentation
Zhen Zhao, Zicheng Wang, Dian Yu, Longyue Wang, Yixuan Yuan, Luping Zhou AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models
Cheng Han, Qifan Wang, Sohail A Dianat, Majid Rabbani, Raghuveer Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu AMEGO: Active Memory from Long EGOcentric Videos
Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta, Dima Damen An Accurate Detection Is Not All You Need to Combat Label Noise in Web-Noisy Datasets
Paul Albert, Kevin McGuinness, Eric Arazo, Tarun Krishna, Noel O Connor, Jack Valmadre An Economic Framework for 6-DoF Grasp Detection
Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang, Dian Zheng, Yi-Lin Wei, Wei-Shi Zheng An Incremental Unified Framework for Small Defect Inspection
Jiaqi Tang, Hao Lu, Xiaogang Xu, Ruizheng Wu, Sixing Hu, Tong Zhang, Tsz Wa Cheng, Ming Ge, Ying-Cong Chen, Fugee Tsung An Information Theoretical View for Out-of-Distribution Detection
Hu Jinjing, Wenrui Liu, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes
Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Zilong Dong, Liefeng Bo, Qixing Huang AnimatableDreamer: Text-Guided Non-Rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
Xinzhou Wang, Yikai Wang, Junliang Ye, Fuchun Sun, Zhengyi Wang, Ling Wang, Pengkun Liu, Kai Sun, Xintong Wang, Xie Wende, Fangfu Liu, Bin He AnimateMe: 4D Facial Expressions via Diffusion Models
Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias, Alexandros Lattas, Stylianos Moschoglou, Stylianos Ploumpis, Stefanos Zafeiriou Any2Point: Empowering Any-Modality Transformers for Efficient 3D Understanding
Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Bin Zhao, Zhigang Wang, Dong Wang, Peng Gao, Hongsheng Li, Xuelong Li Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
Wei Shang, Dongwei Ren, Wanying Zhang, Yuming Fang, Wangmeng Zuo, Kede Ma Arc2Face: A Foundation Model for ID-Consistent Human Faces
Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou, Jiankang Deng, Bernhard Kainz, Stefanos Zafeiriou Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition
Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei, Nasser Nasrabadi ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Yicheng Zhu, Keren Ye, Junjie Ke, Jiahui Yu, Leonidas Guibas, Peyman Milanfar, Feng Yang Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
Xiangyu Liao, Tianheng Zheng, Jiayu Zhong, Pingping Zhang, Chao Ren Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision
Hussain Sajwani, Dimitrios Makris, Yahya Prof. Zweiri, Fariborz Baghaei Naeini, Sanket Mr Kachole Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo Audio-Driven Talking Face Generation with Stabilized Synchronization Loss
Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Hazim Kemal Ekenel, Alexander Waibel Audio-Synchronized Visual Animation
Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado Augmented Neural Fine-Tuning for Efficient Backdoor Purification
Nazmul Karim, Abdullah Al Arafat, Umar Khalid, Zhishan Guo, Nazanin Rahnavard AugUndo: Scaling up Augmentations for Monocular Depth Completion and Estimation
Yangchao Wu, Tian Yu Liu, Hyoungseob Park, Stefano Soatto, Dong Lao, Alex Wong Auto-GAS: Automated Proxy Discovery for Training-Free Generative Architecture Search
Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
Ekta Prashnani, Koki Nagano, Shalini De Mello, David P Luebke, Orazio Gallo Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Richard Schwarz, Ryutaro Tanno, Olivier Henaff BAFFLE: A Baseline of Backpropagation-Free Federated Learning
Haozhe Feng, Tianyu Pang, Chao Du, Wei Chen, Shuicheng Yan, Min Lin BAGS: Blur Agnostic Gaussian Splatting Through Multi-Scale Kernel Modeling
Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen Be-Your-Outpainter: Mastering Video Outpainting Through Input-Specific Adaptation
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He BenchLMM: Benchmarking Cross-Style Visual Capability of Large Multimodal Models
Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Yaohang Li, Xing Luo, Chenyu Yi, Alex Kot Benchmarking Object Detectors with COCO: A New Path Forward
Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao Beta-Tuned Timestep Diffusion Model
Tianyi Zheng, Peng-Tao Jiang, Ben Wan, Hao Zhang, Jinwei Chen, Jia Wang, Bo Li Better Call SAL: Towards Learning to Segment Anything in LiDAR
Aljosa Osep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé Better Regression Makes Better Test-Time Adaptive 3D Object Detection
Jiakang Yuan, Bo Zhang, Kaixiong Gong, Xiangyu Yue, Botian Shi, Yu Qiao, Tao Chen Beyond MOT: Semantic Multi-Object Tracking
Yunhao Li, Qin Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang Bi-Directional Contextual Attention for 3D Dense Captioning
Minjung Kim, Hyung Suk Lim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Kang Zhang, Yu-Jung Heo, Du-Seong Chang, Chang D. Yoo Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation
Chen-Chen Zong, Ye-Wen Wang, Kun-Peng Ning, Hai-Bo Ye, Sheng-Jun Huang BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie Blind Image Deblurring with Noise-Robust Kernel Estimation
Chanseok Lee, Jeongsol Kim, Seungmin Lee, Jaehwang Jung, Yunje Cho, Taejoong Kim, Taeyong Jo, Myungjun Lee, Mooseok Jang BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A Smith, Wei-Chiu Ma, Ranjay Krishna BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation Using RGB Frames and Events
Yijin Li, Yichen Shen, Zhaoyang Huang, Shuo Chen, Weikang Bian, Xiaoyu Shi, Fu-Yun Wang, Keqiang Sun, Hujun Bao, Zhaopeng Cui, Guofeng Zhang, Hongsheng Li Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
Camilo L Fosco, Benjamin Lahner, Bowen Pan, Alex Andonian, Emilie L Josephs, Alex Lascelles, Aude Oliva BRAVE: Broadening the Visual Encoding of Vision-Language Models
Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
ShahRukh Athar, Shunsuke Saito, Stanislav Pidhorskyi, Zhengyu Yang, Chen Cao Bucketed Ranking-Based Losses for Efficient Training of Object Detectors
Feyza Yavuz, Baris Can Cam, Adnan Harun Dogan, Kemal Oksuz, Emre Akbas, Sinan Kalkan BugNIST - A Large Volumetric Dataset for Detection Under Domain Shift
Patrick M Jensen, Vedrana A Dahl, Rebecca Engberg, Carsten Gundlach, Hans Martin Kjer, Anders B Dahl ByteEdit: Boost, Comply and Accelerate Generative Image Editing
Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu, Amir Hosein Khasahmadi, Mor Katz, Pradeep Kumar Jayaraman, Yewen Pu, Karl D.D. Willis, Bang Liu Caltech Aerial RGB-Thermal Dataset in the Wild
Connor Lee, Matthew Anderson, Nikhil Ranganathan, Xingxing Zuo, Kevin T Do, Georgia Gkioxari, Soon-Jo Chung Camera Calibration Using a Collimator System
Shunkun Liang, Banglei Guan, Zhenbao Yu, Pengju Sun, Yang Shang Camera-LiDAR Cross-Modality Gait Recognition
Wenxuan Guo, Yingping Liang, Zhiyu Pan, Ziheng Xi, Jianjiang Feng, Jie Zhou CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
Xunfa Lai, Zhiyu Yang, Jie Hu, ShengChuan Zhang, Liujuan Cao, Guannan Jiang, Songan Zhang, Zhiyu Wang, Rongrong Ji Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu, Xin Wen, Shizhen Zhao, Yingxian Chen, Xiaojuan Qi Canonical Shape Projection Is All You Need for 3D Few-Shot Class Incremental Learning
Ali Cheraghian, Zeeshan Hayder, Sameeea Ramasinghe, Shafin Rahman, Javad Jafaryahya, Lars Petersson, Mehrtash Harandi CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
Jisu Shin, Junmyeong Lee, Seongmin Lee, Min-Gyu Park, Jumi Kang, Ju Hong Yoon, Hae-Gon Jeon CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection
Wei-Yu Lee, Martin Dimitrievski, David Van Hamme, Jan Aelterman, Ljubomir Jovanov, Wilfried Philips CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting
Jiezhi Yang, Khushi P Desai, Charles Packer, Harshil Bhatia, Nicholas Rhinehart, Rowan McAllister, Joseph E Gonzalez Cascade Prompt Learning for Visual-Language Model Adaptation
Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
Yabo Chen, Jiemin Fang, Yuyang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
Haibo Jin, Ruoxi Chen, Jinyin Chen, Haibin Zheng, Yang Zhang, Haohan Wang Certifiably Robust Image Watermark
Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-Aware 3D Gaussian Field
Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui Chains of Diffusion Models
Yanheng Wei, Lianghua Huang, Zhi-Fan Wu, Wei Wang, Yu Liu, Mingda Jia, Shuailei Ma Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Zilong Dong, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, Siyu Zhu Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang CIC-BART-SSA: : Controllable Image Captioning with Structured Semantic Augmentation
Kalliopi Basioti, Mohamed A Abdelsalam, Federico Fancellu, Vladimir Pavlovic, Afsaneh Fazly CipherDM: Secure Three-Party Inference for Diffusion Model Sampling
Xin Zhao, Xiaojun Chen, Xudong Chen, He Li, Tingyu Fan, Zhendong Zhao Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness
Huy Phan, Jinqi Xiao, Yang Sui, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang CLEO: Continual Learning of Evolving Ontologies
Shishir Muralidhara, Saqib Bukhari, Georg Dr. Schneider, Didier Stricker, René Schuster Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do CliffPhys: Camera-Based Respiratory Measurement Using Clifford Neural Networks
Omar Ghezzi, Giuseppe Boccignone, Giuliano Grossi, Raffaella Lanzarotti, Alessandro D'Amelio CLIP-DINOiser: Teaching CLIP a Few DINO Tricks for Open-Vocabulary Semantic Segmentation
Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
Jinhao Deng, Wei Ye, Hai Wu, Qiming Xia, Xun Huang, Xin Li, Jin Fang, Wei Li, Chenglu Wen, Cheng Wang Co-Speech Gesture Video Generation with 3D Human Meshes
Aniruddha Mahapatra, Richa Mishra, Ziyi Chen, Boyang Ding, Renda Li, Shoulei Wang, Jun-Yan Zhu, Peng Chang, Mei Han, Jing Xiao Cocktail Universal Adversarial Attack on Deep Neural Networks
Shaoxin Li, Xiaofeng Liao, Xin Che, Xintong Li, Yong Zhang, Lingyang Chu CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
ZiYang Gong, FuHao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma, Xiangwei Zhu, Zhenming Ji CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari COIN-Matting: Confounder Intervention for Image Matting
Zhaohe Liao, Jiangtong Li, Jun Lan, Huijia Zhu, Weiqiang Wang, Li Niu, Liqing Zhang COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
Jiefeng Li, Ye Yuan, Davis Rempe, Haotian Zhang, Pavlo Molchanov, Cewu Lu, Jan Kautz, Umar Iqbal Collaborative Control for Geometry-Conditioned PBR Image Generation
Shimon Vainer, Mark Boss, Mathias Parger, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Nicolas Perony, Simon Donné COM Kitchens: An Unedited Overhead-View Procedural Videos Dataset a Vision-Language Benchmark
Atsushi Hashimoto, Koki Maeda, Tosho Hirasawa, Jun Harashima, Leszek Rybicki, Yusuke Fukasawa, Yoshitaka Ushiku ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion
Yan Hong, Yuxuan Duan, Bo Zhang, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang Common Sense Reasoning for Deep Fake Detection
Yue Zhang, Ben Colman, Xiao Guo, Ali Shahriyari, Gaurav Bharaj Commonly Interesting Images
Fitim Abdullahu, Helmut Grabner CoMo: Controllable Motion Generation Through Language Guided Pose Code Editing
Yiming Huang, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Wieland Morgenstern, Florian Barthel, Anna Hilsmann, Peter Eisert CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash COMPOSE: Comprehensive Portrait Shadow Editing
Andrew Z Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yannick Hold-Geoffroy, Jae Shin Yoon, Xiaoming Liu Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau CONDA: Condensed Deep Association Learning for Co-Salient Object Detection.
Long Li, Nian Liu, Dingwen Zhang, Zhongyu Li, Salman Khan, Rao Anwer, Hisham Cholakkal, Junwei Han, Fahad Shahbaz Khan ConDense: Consistent 2D-3D Pre-Training for Dense and Sparse Features from Multi-View Images
Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen L Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas ConGeo: Robust Cross-View Geo-Localization Across Ground View Variations
Li Mi, Chang Xu, Javiera Castillo Navarro, Syrielle Montariol, Wen Yang, Antoine Bosselut, Devis Tuia Consistent 3D Line Mapping
Xulong Bai, Hainan Cui, Shuhan Shen Context Diffusion: In-Context Aware Image Generation
Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
Mohamed El Amine Boudjoghra, Jean Lahoud, Salman Khan, Hisham Cholakkal, Rao M Anwer, Fahad Shahbaz Khan Continuous Memory Representation for Anomaly Detection
Joo Chan Lee, Taejune Kim, Eunbyung Park, Simon S Woo, Jong Hwan Ko Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
Jaein Kim, Hee Bin Yoo, Dong-Sig Han, Yeon-Ji Song, Byoung-Tak Zhang Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara Contrastive Learning with Synthetic Positives
Dewen Zeng, Xinrong Hu, Yawen Wu, Xiaowei Xu, Yiyu Shi ControlCap: Controllable Region-Level Captioning
Yuzhong Zhao, Liu Yue, Zonghao Guo, Weijia Wu, Chen Gong, Qixiang Ye, Fang Wan Controllable Human-Object Interaction Synthesis
Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu Controllable Navigation Instruction Generation with Chain of Thought Prompting
Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu Controlling the World by Sleight of Hand
Sruthi Sudhakar, Ruoshi Liu, Basile Van Hoorick, Carl Vondrick, Richard Zemel ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, Xiao Bai CoReS: Orchestrating the Dance of Reasoning and Segmentation
Xiaoyi Bao, Siyang Sun, Shuailei Ma, Kecheng Zheng, Yuxin Guo, Guosheng Zhao, Yun Zheng, Xingang Wang Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning
Ray Zhang, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Ryan M. Eustice, Maani Ghaffari Jadidi, Arnie Sen CoTracker: It Is Better to Track Together
Nikita Karaev, Ignacio Rocco, Ben Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht CountFormer: Multi-View Crowd Counting Transformer
Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance
Zhipeng Hu, Yongqiang Zhang, Chen Liu, Lincheng Li, Sida Peng, Xiaowei Zhou, Changjie Fan, Xin Yu CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning
Erum Mushtaq, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Jie Ding, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Yuqian Fu, Yu Wang, Yixuan Pan, Xingyu Qiu, Lian Huai, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang Cross-View Image Geo-Localization with Panorama-BEV Co-Retrieval Network
Junyan Ye, Zhutao Lv, Weijia Li, Jinhua Yu, Haote Yang, Huaping Zhong, Conghui He CrossGLG: LLM Guides One-Shot Skeleton-Based 3D Action Recognition in a Cross-Level Manner
Tingbing Yan, Wenzheng Zeng, Yang Xiao, Xingyu Tong, Bo Tan, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
Nick Stracke, Stefan Andreas Baumann, Joshua Susskind, Miguel Angel Bautista, Bjorn Ommer Curved Diffusion: A Generative Model with Optical Geometry Control
Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or Customize-a-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava Cut Out the Middleman: Revisiting Pose-Based Gait Recognition
Yang Fu, Saihui Hou, Shibei Meng, Xuecai Hu, Chunshui Cao, Xu Liu, Yongzhen Huang D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction
Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion Based Virtual Try-on
Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
Kai Jiang, Jiaxing Huang, Weiying Xie, Jie Lei, Yunsong Li, Ling Shao, Shijian Lu DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang Data Augmentation via Latent Diffusion for Saliency Prediction
Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk Data Collection-Free Masked Video Modeling
Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki Data Poisoning Quantization Backdoor Attack
Tran Huynh, Anh Tran, Khoa Doan, Tung Pham Data-to-Model Distillation: Data-Efficient Learning Framework
Ahmad Sajedi, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis DataDream: Few-Shot Guided Dataset Generation
Jae Myung Kim, Jessica Bader, Stephan Alaniz, Cordelia Schmid, Zeynep Akata Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao, Carsten Trinitis, Martin Schulz Dataset Growth
Ziheng Qin, Zhaopan Xu, YuKun Zhou, Kai Wang, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Radu Timofte, Xiaojiang Peng, Hongxun Yao, Yang You DATENeRF: Depth-Aware Text-Based Editing of NeRFs
Sara Rojas Martinez, Julien Philip, Kai Zhang, Sai Bi, Fujun Luan, Bernard Ghanem, Kalyan Sunkavalli De-Confounded Gaze Estimation
Ziyang Liang, Yiwei Bao, Feng Lu De-Confusing Pseudo-Labels in Source-Free Domain Adaptation
Idit Diamant, Amir Rosenfeld, Idan Achituve, Jacob Goldberger, Arnon Netzer Debiasing Surgeon: Fantastic Weights and How to Find Them
Remi Nahon, Ivan Luiz De Moura Matos, Van-Tam Nguyen, Enzo Tartaglione Deblurring 3D Gaussian Splatting
Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
Zaid Tasneem, Akshat Dave, Abhishek Singh, Kushagra Tiwary, Praneeth Vepakomma, Ashok Veeraraghavan, Ramesh Raskar DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
Rakshith Subramanyam, Kowshik Thopalli, Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri Decoupling Common and Unique Representations for Multimodal Self-Supervised Learning
Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu Deep Cost Ray Fusion for Sparse Depth Video Completion
Jungeon Kim, Soongjin Kim, Jaesik Park, Seungyong Lee Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
Cheng Gong, Yao Chen, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun, Le Zhang Deep Patch Visual SLAM
Lahav Lipson, Zachary Teed, Jia Deng Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics
Shuai Yang, ZhiFei Chen, Pengguang Chen, Xi Fang, Yixun Liang, Shu Liu, Yingcong Chen Delving Deep into Engagement Prediction of Short Videos
Dasong Li, Wenjie Li, Baili Lu, Hongsheng Li, Sizhuo Ma, Gurunandan Krishnan, Jian Wang Delving into Adversarial Robustness on Document Tampering Localization
Huiru Shao, Zhuang Qian, Kaizhu Huang, Wei Wang, Xiaowei Huang, Qiufeng Wang Denoising Vision Transformers
Jiawei Yang, Katie Z Luo, Jiefeng Li, Congyue Deng, Leonidas Guibas, Dilip Krishnan, Kilian Weinberger, Yonglong Tian, Yue Wang Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics
Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je-Hwan Ryu, Woontack Woo, Tae-Kyun Kim Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang Zhang, Chenhang He, Zhiyuan Ma, Vishal Patel, Lei Zhang DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
Sarah Jabbour, Gregory Kondas, Ella Kazerooni, Michael Sjoding, David Fouhey, Jenna Wiens Depth-Guided NeRF Training via Earth Mover’s Distance
Anita Rau, Josiah Aklilu, Floyd C Holsinger, Serena Yeung-Levy DetailSemNet: Elevating Signature Verification Through Detail-Semantic Integration
Meng-Cheng Shih, Tsai-Ling Huang, Yu-Heng Shih, Hong-Han Shuai, Hsuan-Tung Liu, Yi-Ren Yeh, Ching-Chun Huang DeTra: A Unified Model for Object Detection and Trajectory Forecasting
Sergio Casas, Ben T Agro, Jiageng Mao, Thomas Gilles, Alexander Y Cui, Enxu Li, Raquel Urtasun DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Philip Torr, Jian Wu DG-PIC: Domain Generalized Point-in-Context Learning for Point Cloud Understanding
Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang DGD: Dynamic 3D Gaussians Distillation
Isaac Labe, Noam Issachar, Itai Lang, Sagie Benaim DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, Chao Dong DiffClass: Diffusion-Based Class Incremental Learning
Zichong Meng, Jie Zhang, Changdi Yang, Zheng Zhan, Pu Zhao, Yanzhi Wang DIFFender: Diffusion-Based Adversarial Defense Against Patch Attacks
Caixin Kang, Yinpeng Dong, Zhengyi Wang, Shouwei Ruan, Yubo Chen, Hang Su, Xingxing Wei Differentiable Convex Polyhedra Optimization from Multi-View Images
Daxuan Ren, Haiyi Mei, Hezi Shi, Jianmin Zheng, Jianfei Cai, Lei Yang DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Xinxu Ge, Xin Liu, Zitong Yu, Jingang Shi, Chun Qi, Jie Li, Heikki Kälviäinen DiffiT: Diffusion Vision Transformers for Image Generation
Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat DiffuMatting: Synthesizing Arbitrary Objects with Matting-Level Annotation
Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, ZhengKai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji Diffusion Bridges for 3D Point Cloud Denoising
Mathias Vogel Hüni, Keisuke Tateno, Marc Pollefeys, Federico Tombari, Marie-Julie Rakotosaona, Francis Engelmann Diffusion for Natural Image Matting
Yihan Hu, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei, Humphrey Shi Diffusion Models as Data Mining Tools
Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar Diffusion Models as Optimizers for Efficient Planning in Offline RL
Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, Heng Tao Shen Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
Benjamin J Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang, Lixin Fan, Qiang Yang Diffusion-Guided Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon, Hoyong Kwon, Jaeseok Jeong, Daehee Park, Kuk-Jin Yoon Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao, Alexandros Graikos, Jingwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays
Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Xiantong Zhen, Zhen Qian, Juan Zhang, Baochang Zhang Direct Distillation Between Different Domains
Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama Disentangled Clothed Avatar Generation from Text Descriptions
Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Cheng Lin, Rong Xie, Li Song, Xin Li, Wenping Wang Disentangled Generation and Aggregation for Robust Radiance Fields
Shihe Shen, Huachen Gao, Wangze Xu, Rui Peng, Luyang Tang, Kaiqiang Xiong, Jianbo Jiao, Ronggang Wang Distilling Diffusion Models into Conditional GANs
MinGuk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park Distilling Knowledge from Large-Scale Image Models for Object Detection
Gang Li, Wenhai Wang, Xiang Li, Ziheng Li, Jian Yang, Jifeng Dai, Yu Qiao, Shanshan Zhang Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization
Yukun Wang, Kunhong Li, Minglin Chen, Longguang Wang, Shunbo Zhou, Kaiwen Xue, Yulan Guo Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos N Plataniotis, Yang Wang Diverse Text-to-3D Synthesis with Augmented Text Embedding
Uy Dieu Tran, Minh N. Hoang Luu, Phong Ha Nguyen, Khoi Nguyen, Binh-Son Hua Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes
Jing-Wen Yang, Jia-Mu Sun, Yong-Liang Yang, Jie Yang, Ying Shan, Yan-Pei Cao, Lin Gao Do Generalised Classifiers Really Work on Human Drawn Sketches?
Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song Do Text-Free Diffusion Models Learn Discriminative Visual Representations?
Soumik Mukhopadhyay, Matthew A Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Jun Ohya, Abhinav Shrivastava DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane, Zachary E Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason M Baldridge Dolfin: Diffusion Layout Transformers Without Autoencoder
Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhuowen Tu Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, Chaowei Xiao Domain-Adaptive Video Deblurring via Test-Time Blurring
Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin DomainFusion: Generalizing to Unseen Domains with Latent Diffusion Models
Yuyang Huang, Yabo Chen, Yuchen Liu, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian DoubleTake: Geometry Guided Depth Estimation
Mohamed Sayed, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Guillermo Garcia-Hernando, Gabriel Brostow, Sara Vicente, Michael Firman DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang Drag Anything: Motion Control for Anything Using Entity Representation
Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Zhang Di DragVideo: Interactive Drag-Style Video Editing
Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
Zizheng Yan, Jiapeng Zhou, Fanpeng Meng, Yushuang Wu, Lingteng Qiu, Zisheng Ye, Shuguang Cui, Guanying Chen, Xiaoguang Han DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei DreamReward: Aligning Human Preference in Text-to-3D Generation
Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu DreamScene: 3D Gaussian-Based Text-to-3D Scene Generation via Formation Pattern Sampling
Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-Hang Lee, Peng Yuan Zhou DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas K Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Xin Yue Li, Jeffrey Bigham, Amy Pavel DreamView: Injecting View-Specific Text Guidance into Text-to-3D Generation
Junkai Yan, Yipeng Gao, Qize Yang, Xihan Wei, Xuansong Xie, Ancong Wu, Wei-Shi Zheng DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Zhang Hanxue, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, Hongyang Li DSA: Discriminative Scatter Analysis for Early Smoke Segmentation
Lujian Yao, Haitao Zhao, Jingchao Peng, Zhongze Wang, Kaijie Zhao Dual-Camera Smooth Zoom on Mobile Phones
Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo Dual-Rain: Video Rain Removal Using Assertive and Gentle Teachers
Tingting Chen, Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan DualDn: Dual-Domain Denoising via Differentiable ISP
Ruikang Li, Yujin Wang, Shiqi Chen, Fan Zhang, Jinwei Gu, Tianfan Xue DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Le Yang, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
Trinh Le Ba Khanh, Huy-Hung Nguyen, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Jae Wook Jeon DynamiCrafter: Animating Open-Domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Gongye Liu, Xintao Wang, Ying Shan, Tien-Tsin Wong DεpS: Delayed Ε-Shrinking for Faster Once-for-All Training
Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo M Latapie, Myungjin Lee, Alexey Tumanov EA-VTR: Event-Aware Video-Text Retrieval
Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu Early Anticipation of Driving Maneuvers
Abdul Wasi Lone, Shankar Gangisetty, Shyam Nandan Rai, C. V. Jawahar EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam EcoMatcher: Efficient Clustering Oriented Matcher for Detector-Free Image Matching
Peiqi Chen, Lei Yu, Yi Wan, Yongjun Zhang, Jian Wang, Liheng Zhong, Jingdong Chen, Ming Yang Editable Image Elements for Controllable Synthesis
Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Na Shen, Qifeng Wang, Xiaowei Ding, Le Lu, Xianghua Ye, Dakai Jin Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer Efficient Bias Mitigation Without Privileged Information
Mateo Espinosa Zarlenga, Swami Sankaranarayanan, Jerone T. A. Andrews, Zohreh Shams, Mateja Jamnik, Alice Xiang Efficient Depth-Guided Urban View Synthesis
Sheng Miao, Jiaxin Huang, Dongfeng Bai, Weichao Qiu, Liu Bingbing, Andreas Geiger, Yiyi Liao Efficient Diffusion Transformer with Step-Wise Dynamic Attention Mediators
Yifan Pu, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang, Xiu Li Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation
Yeongtak Oh, Jonghyun Lee, Jooyoung Choi, Dahuin Jung, Uiwon Hwang, Sungroh Yoon Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning
Cong Wu, Xiao-Jun Wu, Linze Li, Tianyang Xu, Zhenhua Feng, Josef Kittler Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël, Renaud Vandeghen, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu Efficient NeRF Optimization - Not All Samples Remain Equally Hard
Juuso Korhonen, Goutham Rangu, Hamed Rezazadegan Tavakoli, Juho Kannala Efficient Pre-Training for Localized Instruction Generation of Procedural Videos
Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller Efficient Training with Denoised Neural Weights
Yifan Gong, Zheng Zhan, Yanyu Li, Yerlan Idelbayev, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren Efficient Vision Transformers with Partial Attention
Xuan-Thuy Vo, Duy-Linh Nguyen, Adri Priadana, Kang-Hyun Jo EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation
Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller EgoBody3M: Egocentric Body Tracking on a VR Headset Using a Diverse Dataset
Amy Zhao, Chengcheng Tang, Lezi Wang, Yijing Li, Mihika Dave, Lingling Tao, Christopher D. Twigg, Robert Y. Wang EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
Thomas Hummel, Shyamgopal Karthik, Mariana-Iuliana Georgescu, Zeynep Akata EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, Wei-Shi Zheng EgoLifter: Open-World 3D Segmentation for Egocentric Perception
Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar, Arya Bakhtiar, Danny L Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann Lecun, Amir Globerson, Trevor Darrell EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation
Chenhongyi Yang, Anastasia Tkach, Shreyas Hampali, Linguang Zhang, Elliot J Crowley, Cem Keskin EINet: Point Cloud Completion via Extrapolation and Interpolation
Pingping Cai, Canyu Zhang, Lingjia Shi, Lili Wang, Nasrin Imanpour, Song Wang Eliminating Feature Ambiguity for Few-Shot Segmentation
Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao Eliminating Warping Shakes for Unsupervised Online Video Stitching
Lang Nie, Chunyu Lin, Kang Liao, Yun Zhang, Shuaicheng Liu, Rui Ai, Yao Zhao ELSE: Efficient Deep Neural Network Inference Through Line-Based Sparsity Exploration
Zeqi Zhu, Alberto Garcia-Ortiz, Luc Waeijen, Egor Bondarev, Arash Pourtaherian, Orlando Moreira Embodied Understanding of Driving Scenarios
Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
Wenyang Zhou, Zhiyang Dou, Zeyu Cao, Zhouyingcheng Liao, Jingbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wenping Wang, Lingjie Liu Emerging Property of Masked Token for Effective Pre-Training
Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
Wenhua Wu, Qi Wang, Guangming Wang, Junping Wang, Tiankun Zhao, Yang Liu, Dongchao Gao, Zhe Liu, Hesheng Wang EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen Energy-Clibrated VAE with Test Time Free Lunch
Yihong Luo, Siya Qiu, Xingjian Tao, Yujun Cai, Jing Tang Energy-Induced Explicit Quantification for Multi-Modality MRI Fusion
Xiaoming Qi, Yuan Zhang, Tong Wang, Guanyu Yang, Yueming Jin, Shuo Li Enhanced Motion Forecasting with Visual Relation Reasoning
Sungjune Kim, Hadam Baek, Seunggwan Lee, Hyung-gun Chi, Hyerin Lim, Jinkyu Kim, Sangpil Kim Enhanced Sparsification via Stimulative Training
Shengji Tang, Weihao Lin, Hancheng Ye, Peng Ye, Chong Yu, Baopu Li, Tao Chen Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment
Chong Li, Xuelin Qian, Yun Wang, Jingyang Huo, Xiangyang Xue, Yanwei Fu, Jianfeng Feng Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Chaofeng Chen, Annan Wang, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin Enhancing Tampered Text Detection Through Frequency Feature Fusion and Decomposition
Zhongxi Chen, Shen Chen, Taiping Yao, Ke Sun, Shouhong Ding, Xianming Lin, Liujuan Cao, Rongrong Ji Enhancing Vectorized mAP Perception with Historical Rasterized Maps
Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, Ji Zhao Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel EraseDraw : Learning to Insert Objects by Erasing Them from Images
Alper Canberk, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, Carl Vondrick Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan Event-Adapted Video Super-Resolution
Zeyu Xiao, Dachun Kai, Yueyi Zhang, Zheng-Jun Zha, Xiaoyan Sun, Zhiwei Xiong Event-Aided Time-to-Collision Estimation for Autonomous Driving
Jinghang Li, Bangyan Liao, Xiuyuan Lu, Peidong Liu, Shaojie Shen, Yi Zhou Event-Based Head Pose Estimation: Benchmark and Method
Jiahui Yuan, Hebei Li, Yansong Peng, Jin Wang, Yuheng Jiang, Yueyi Zhang, Xiaoyan Sun Event-Based Motion Magnification
Yutian Chen, Shi Guo, Yu Fangzheng, Feng Zhang, Jinwei Gu, Tianfan Xue EvSign: Sign Language Recognition and Translation with Streaming Events
Pengyu Zhang, Hao Yin, Zeren Wang, Wenyue Chen, Sheng Ming Li, Dong Wang, Huchuan Lu, Xu Jia Exemplar-Free Continual Representation Learning via Learnable Drift Compensation
Alex Gomez-Villa, Dipam Goswami, Kai Wang, Andy Bagdanov, Bartlomiej Twardowski, Joost van de Weijer Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Fernandez Abrevaya, Michael J. Black, Xuaner Zhang Exploring Guided Sampling of Conditional GANs
Yifei Zhang, Mengfei Xia, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Lianghua Huang, Yu Liu, Fan Cheng Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji Expressive Whole-Body 3D Gaussian Avatar
Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito External Knowledge Enhanced 3D Scene Generation from Sketch
Zijie Wu, Mingtao Feng, Yaonan Wang, He Xie, Weisheng Dong, Bo Miao, Ajmal Mian Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu Face Reconstruction Transfer Attack as Out-of-Distribution Generalization
Yoon Gyo Jung, Jaewoo Park, Xingbo Dong, Hojin Park, Andrew Beng Jin Teoh, Octavia Camps Faceptor: A Generalist Model for Face Perception
Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Mian Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang, Jiaqi Hu, Lianrui Mu, Rui Hu, Xiaoyu Liang, Jiangnan Ye, Haoji Hu FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis
Vishnu Mani Hema, Shubhra Aich, Christian Haene, Jean-Charles Bazin, Fernando de la Torre Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
Nina Weng, Paraskevas Pegios, Eike Petersen, Aasa Feragen, Siavash Arjomand Bigdeli Fast Registration of Photorealistic Avatars for VR Facial Animation
Chaitanya Patel, Shaojie Bai, Te-Li Wang, Jason Saragih, Shih-En Wei Fast Sprite Decomposition from Animated Graphics
Tomoyuki Suzuki, Kotaro Kikuchi, Kota Yamaguchi Fast View Synthesis of Casual Videos with Soup-of-Planes
Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, Feng Liu FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
Florian Maximilian Langer, Jihong Ju, Georgi Dikov, Gerhard Reitmayr, Mohsen Ghafoorian Feature Diversification and Adaptation for Federated Domain Generalization
Seunghan Yang, Seokeon Choi, Hyunsin Park, Sungha Choi, Simyung Chang, Sungrack Yun FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
Anestis Kastellos, Athanasios Psaltis, Charalampos Z Patrikakis, Petros Daras Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeff Nichols, Yinfei Yang, Zhe Gan Few-Shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
Chenxi Liu, Zhenyi Wang, Tianyi Xiong, Ruibo Chen, Yihan Wu, Junfeng Guo, Heng Huang Few-Shot NeRF by Adaptive Rendering Loss Regularization
Qingshan Xu, Xuanyu Yi, Jianyao Xu, Wenbing Tao, Yew Soon Ong, Hanwang Zhang Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
Yansheng Li, Tingzhu Wang, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang FineMatch: Aspect-Based Fine-Grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
Benjamin Attal, Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T Barron, Matthew O'Toole, Pratul Srinivasan Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats
Mingyang Xie, Haoming Cai, Sachin Shah, Yiran Xu, Brandon Y. Feng, Jia-Bin Huang, Christopher A. Metzler FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Kangle Deng, Timothy Omernick, Alexander B Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala FLAT: Flux-Aware Imperceptible Adversarial Attacks on 3D Point Clouds
Keke Tang, Lujie Huang, Weilong Peng, Daizong Liu, Xiaofei Wang, Yang Ma, Ligang Liu, Zhihong Tian FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition
Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim Flowed Time of Flight Radiance Fields
Mikhail Okunev, Marc Mapeke, Benjamin Attal, Christian Richardt, Matthew O'Toole, James Tompkin Flying with Photons: Rendering Novel Views of Propagating Light
Anagh Malik, Noah Juravsky, Ryan Po, Gordon Wetzstein, Kiriakos N. Kutulakos, David B. Lindell FMBoost: Boosting Latent Diffusion with Flow Matching
Johannes S Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Zhicheng Jiao, Hong Cheng Formula-Supervised Visual-Geometric Pre-Training
Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh Foster Adaptivity and Balance in Learning with Noisy Labels
Mengmeng Sheng, Zeren Sun, Tao Chen, Shuchao Pang, Yucheng Wang, Yazhou Yao FoundPose: Unseen Object Pose Estimation with Foundation Features
Evin Pınar Örnek, Yann Labbé, Bugra Tekin, Lingni Ma, Cem Keskin, Christian Forster, Tomas Hodan FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li Free Lunch for Gait Recognition: A Novel Relation Descriptor
Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Tianzhu Zhang, Liang Wang Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
David Junhao Zhang, Mutian Xu, Jay Zhangjie Wu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou Free-Editor: Zero-Shot Text-Driven 3D Scene Editing
Nazmul Karim, Hasan Iqbal, Umar Khalid, Chen Chen, Jing Hua FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen FreeMotion: A Unified Framework for Number-Free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao, Ran Yi, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma FreestyleRet: Retrieving Images from Style-Diversified Queries
Hao Li, Yanhao Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval
Aneeshan Sain, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation
Chenliang Zhou, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Thomas Fogarty, Alejandro Sztrajman, Hongyun Gao, A. Cengiz Oztireli FSD-BEV: Foreground Self-Distillation for Multi-View 3D Object Detection
Zheng Jiang, Jinqing Zhang, Yanan Zhang, Qingjie Liu, Zhenghui Hu, Baohui Wang, Yunhong Wang Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen, Mengchen Liu, Noel C Codella, Yunsheng Li, Lu Yuan, Danna Gurari Fully Sparse 3D Occupancy Prediction
Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang Fundamental Matrix Estimation Using Relative Depths
Yaqing Ding, Václav Vávra, Snehal Bhayani, Qianliang Wu, Jian Yang, Zuzana Kukelova FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu FuseTeacher: Modality-Fused Encoders Are Strong Vision Supervisors
Chen-Wei Xie, Siyang Sun, Liming Zhao, Pandeng Li, Shuailei Ma, Yun Zheng FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli FYI: Flip Your Images for Dataset Distillation
Byunggwan Son, Youngmin Oh, Donghyeon Baek, Bumsub Ham G2fR: Frequency Regularization in Grid-Based Feature Encoding Neural Radiance Fields
Shuxiang Xie, Shuyi Zhou, Ken Sakurada, Ryoichi Ishikawa, Masaki Onishi, Takeshi Oishi G3R: Gradient Guided Generalizable Reconstruction
Yun Chen, Jingkang Wang, Ze Yang, Sivabalan Manivasagam, Raquel Urtasun GalLop: Learning Global and Local Prompts for Vision-Language Models
Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome GarmentAligner: Text-to-Garment Generation via Retrieval-Augmented Multi-Level Corrections
Shiyue Zhang, Zheng Chong, Xujie Zhang, Hanhui Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang GarmentCodeData: A Dataset of 3D Made-to-Measure Garments with Sewing Patterns
Maria Korosteleva, Timur Levent Kesdogan, Fabian Kemper, Stephan Wenninger, Jasmin Koller, Yuhan Zhang, Mario Botsch, Olga Sorkine-Hornung Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation
Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Jürgen Gall GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
Vinayak Gupta, Rongali Simhachala Venkata Girish, Mukund Varma T, Ayush Tewari, Kaushik Mitra GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang GaussReg: Fast 3D Registration with Gaussian Splatting
Jiahao Chang, Yinglin Xu, Yihao Li, Yuantao Chen, Wensen Feng, Xiaoguang Han General and Task-Oriented Video Segmentation
Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang General Geometry-Aware Weakly Supervised 3D Object Detection
Guowen Zhang, Junsong Fan, Liyi Chen, Zhaoxiang Zhang, Zhen Lei, Lei Zhang GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc P.J. Sträter, Mohammadreza Salehi, Efstratios Gavves, Cees G.M. Snoek, Yuki M. Asano Generalizable Facial Expression Recognition
Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng Generalizable Human Gaussians for Sparse View Synthesis
YoungJoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo J Takagi, Daeil Kim, Aayush Prakash, Fernando de la Torre Generalizable Symbolic Optimizer Learning
Xiaotian Song, Peng Zeng, Yanan Sun, Andy Song GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina, Enis Simsar, Alperen Tezcan, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Furkan Almas, Irem Dogan, Muhammed Furkan Dasdelen, Chinmay Prabhakar, Hadrien Reynaud, Sarthak Pati, Christian Bluethgen, Mehmet Kemal Ozdemir, Bjoern Menze Generating 3D House Wireframes with Semantics
Xueqi Ma, Yilin Liu, Wenjun Zhou, Ruowei Wang, Hui Huang Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick Generative End-to-End Autonomous Driving
Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, Long Chen GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
Yuhang Li, Youngeun Kim, Donghyun Lee, Souvik Kundu, Priyadarshini Panda GenRC: Generative 3D Room Completion from Sparse Image Collections
Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y Chen, Cheng-Hao Kuo, Min Sun GeoCalib: Learning Single-Image Calibration with Geometric Optimization
Alexander Veicht, Paul-Edouard Sarlin, Philipp Lindenberger, Marc Pollefeys GeoGaussian: Geometry-Aware Gaussian Splatting for Scene Rendering
Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari Geometry Fidelity for Spherical Images
Anders Christensen, Nooshin Mojab, Khushman Patel, Karan Ahuja, Zeynep Akata, Ole Winther, Mar Gonzalez Franco, Andrea Colaco GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long Getting It Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hanna Hajishirzi, Vasudev Lal, Chitta R Baral, Yezhou Yang GGRt: Towards Generalizable 3D Gaussians Without Pose Priors in Real-Time
Hao Li, Yuanyuan Gao, Dingwen Zhang, Chenming Wu, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han GiT: Towards Generalist Vision Transformer Through Universal Language Interface
Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen, Cian Eastwood, Fabian Mentzer Global Counterfactual Directions
Bartłomiej Sobieski, Przemyslaw Biecek Global Structure-from-Motion Revisited
Linfei Pan, Daniel Barath, Marc Pollefeys, Johannes L Schönberger Global-Local Collaborative Inference with LLM for LiDAR-Based Open-Vocabulary Detection
Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation
Bangyan Liao, Zhenjun Zhao, Lu Chen, Haoang Li, Daniel Cremers, Peidong Liu Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring
Emanuele Santellani, Martin Zach, Christian Sormann, Mattia Rossi, Andreas Kuhn, Friedrich Fraundorfer GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
Animesh Karnewar, Roman Shapovalov, Tom Monnier, Andrea Vedaldi, Niloy J. Mitra, David Novotny Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah, Xiaoqian Shen, Eslam mohamed Abdelrahman, Essam Sleiman, Mingchen Zhuge, Jian Ding, Deyao Zhu, Jürgen Schmidhuber, Mohamed Elhoseiny GPSFormer: A Global Perception and Local Structure Fitting-Based Transformer for Point Cloud Understanding
Changshuo Wang, Meiqing Wu, Siew-Kei Lam, Xin Ning, Shangshu Yu, Ruiping Wang, Weijun Li, Thambipillai Srikanthan GRA: Detecting Oriented Objects Through Group-Wise Rotating and Attention
Jiangshan Wang, Yifan Pu, Yizeng Han, Jiayi Guo, Yiru Wang, Xiu Li, Gao Huang Gradient-Based Out-of-Distribution Detection
Taha Entesari, Sina Sharifi, Bardia Safaei, Vishal Patel, Mahyar Fazlyab GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu, Lin Liu, Dongyang Xu, Caiyan Jia, Feiyang Jia, Li Wang GRiT: A Generative Region-to-Text Transformer for Object Understanding
Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
Aurélien Cecille, Stefan Duffner, Franck Davoine, Thibault Neveu, Rémi Agier Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang Grounding Image Matching in 3D with MASt3R
Vincent Leroy, Yohann Cabon, Jerome Revaud Grounding Language Models for Visual Entity Recognition
Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez GroundUp: Rapid Sketch-Based 3D City Massing
Gizem Esra Unlu, Mohamed Sayed, Yulia Gryaditskaya, Gabriel Brostow GroupDiff: Diffusion-Based Group Portrait Editing
Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofei Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang, Garrett Bingham, Adams Wei Yu, Quoc V. Le, Thang Luong, Golnaz Ghiasi HARIVO: Harnessing Text-to-Image Models for Video Generation
Mingi Kwon, Seoung Wug Oh, Yang Zhou, Joon-Young Lee, Difan Liu, Haoran Cai, Baqiao Liu, Feng Liu, Youngjung Uh Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°
Yuxiao He, Yiyu Zhuang, Yanwen Wang, Yao Yao, Siyu Zhu, Xiaoyu Li, Qi Zhang, Xun Cao, Hao Zhu HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Helisa Dhamo, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero HENet: Hybrid Encoding for End-to-End Multi-Task 3D Perception from Multi-View Cameras
Zhongyu Xia, ZhiWei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception
Congzhang Shao, Guiyang Luo, Quan Yuan, Yifu Chen, Yilin Liu, Gong Kexin, Jinglin Li Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
Mridul Khurana, Arka Daw, M. Maruf, Josef C. Uyeda, Wasila Dahdul, Caleb Charpentier, Yasin Bakış, Henry L. Bart Jr., Paula M. Mabee, Hilmar Lapp, James P. Balhoff, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Anuj Karpatne Hierarchical Temporal Context Learning for Camera-Based Semantic Scene Completion
Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng HiFi-123: Towards High-Fidelity One Image to 3D Content Generation
Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Wenbo Hu, Long Quan, Ying Shan, Yonghong Tian High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding
Qi Zuo, Xiaodong Gu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Qiu Lingteng, Liefeng Bo, Zilong Dong High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang High-Fidelity Modeling of Generalizable Wrinkle Deformation
Jingfan Guo, Jae Shin Yoon, Shunsuke Saito, Takaaki Shiratori, Hyun Soo Park HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Wu Shuwen, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang HoloADMM: High-Quality Holographic Complex Field Recovery
Mazen Mel, Paul Springer, Pietro Zanuttigh, Haitao Zhou, Alexander Gatto How Many Unicorns Are in This Image? a Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan, Linfeng Ye, Ahmed Hussein Salamah How Video Meetings Change Your Expression
Sumit Sarin, Utkarsh Mall, Purva Tendulkar, Carl Vondrick HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
Junhao Su, Chenghao He, Feiyu Zhu, Xiaojie Xu, Dongzhi Guan, Chenyang Si HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
Lixin Xue, Chen Guo, Chengwei Zheng, Fangjinhua Wang, Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, Jie Song, Otmar Hilliges Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Egor Zakharov, Vanessa Sklyarova, Michael J. Black, Giljoo Nam, Justus Thies, Otmar Hilliges Human-in-the-Loop Visual Re-ID for Population Size Estimation
Gustavo Perez, Daniel Sheldon, Grant Van Horn, Subhransu Maji HUMOS: Human Motion Model Conditioned on Body Shape
Shashank Tripathi, Omid Taheri, Christoph Lassner, Michael J. Black, Daniel Holden, Carsten Stoll Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwang Hee Lee, Seungryong Kim, Jaejun Yoo HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi Hypernetworks for Generalizable BRDF Representation
Fazilet Gokbudak, Alejandro Sztrajman, Chenliang Zhou, Fangcheng Zhong, Rafal Mantiuk, A. Cengiz Oztireli HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
Chiranjeev Chiranjeev, Muskan Dosi, Kartik Thakral, Mayank Vatsa, Richa Singh I Can't Believe It's Not Scene Flow!
Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, James Hays I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang IG Captioner: Information Gain Captioners Are Strong Zero-Shot Classifiers
Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu iHuman: Instant Animatable Digital Humans from Monocular Videos
Pramish Paudel, Anubhav Khanal, Danda Pani Paudel, Jyoti Tandukar, Ajad Chhatkuli Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong Image Demoireing in RAW and sRGB Domains
Shuning Xu, Binbin Song, Xiangyu Chen, Xina Liu, Jiantao Zhou Imaging with Confidence: Uncertainty Quantification for High-Dimensional Undersampled MR Images
Frederik Hoppe, Claudio Mayrink Verdun, Hannah Sophie Laus, Sebastian Endt, Marion Irene Menzel, Felix Krahmer, Holger Rauhut iMatching: Imperative Correspondence Learning
Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang Implicit Concept Removal of Diffusion Models
Zhili Liu, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok Implicit Neural Models to Extract Heart Rate from Video
Pradyumna Chari, Anirudh Bindiganavale Harish, Adnan Armouti, Alexander Vilesov, Sanjit Sarda, Laleh Jalilian, Achuta Kadambi Implicit Style-Content Separation Using B-LoRA
Yarden Frenkel, Yael Vinker, Ariel Shamir, Danny Cohen-Or Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann, Siyu Tang, Jan Eric Lenssen Improving Adversarial Transferability via Model Alignment
Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu Improving Agent Behaviors with RL Fine-Tuning for Autonomous Driving
Zhenghao Peng, Wenjie Luo, Yiren Lu, Tianyi Shen, Cole Gulino, Ari Seff, Justin Fu Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Yisol Choi, Sangkyung Kwak, Kyungmin Lee, Hyungwon Choi, Jinwoo Shin Improving Geo-Diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat, Melissa Hall, Alicia Yi Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano Improving Text-Guided Object Inpainting with Semantic Pre-Inpainting
Yifu Chen, Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Zhineng Chen, Tao Mei Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach
Aveen Dayal, Rishabh Lalla, Linga Reddy Cenkeramaddi, C. Krishna Mohan, Abhinav Kumar, Vineeth N Balasubramanian Improving Video Segmentation via Dynamic Anchor Queries
Yikang Zhou, Tao Zhang, Xiangtai Li, Shunping Ji, Shuicheng Yan Improving Virtual Try-on with Garment-Focused Diffusion Models
Siqi Wan, Yehao Li, Jingwen Chen, Yingwei Pan, Ting Yao, Yang Cao, Tao Mei iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
Tom Fischer, Yaoyao Liu, Artur Jesslen, Noor Ahmed, Prakhar Kaushik, Angtian Wang, Alan Yuille, Adam Kortylewski, Eddy Ilg Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer.
Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang InfMAE: A Foundation Model in the Infrared Modality
Fangcen Liu, Chenqiang Gao, Yaming Zhang, Junjie Guo, Jinghao Wang, Deyu Meng Insect Identification in the Wild: The AMI Dataset
Aditya Jain, Fagner Cunha, Michael J Bunsen, Juan Sebastián Cañas, Léonard Pasi, Nathan Pinoy, Flemming Helsing, JoAnne Russo, Marc S Botham, Michael Sabourin, Jonathan Fréchette, Alexandre Anctil, Yacksecari Lopez, Eduardo Navarro, Filonila Pérez, Ana C Zamora, Jose Alejandro Ramirez-Silva, Jonathan Gagnon, Tom A August, Kim Bjerge, Alba Gomez Segura, Marc Belisle, Yves Basset, Kent P McFarland, David B Roy, Toke T Høye, Maxim Larrivee, David Rolnick Instant 3D Human Avatar Generation Using Image Diffusion Models
Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu InstructGIE: Towards Generalizable Image Editing
Zichong Meng, Changdi Yang, Jun Liu, Hao Tang, Pu Zhao, Yanzhi Wang Instruction Tuning-Free Visual Token Complement for Multimodal LLMs
Dongsheng Wang, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang Interactive 3D Object Detection with Prompts
Ruifei Zhang, Xiangru Lin, Wei Zhang, Jincheng Lu, Xuekuan Wang, Xiao Tan, Yingying Li, Errui Ding, Jingdong Wang, Guanbin Li InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu, Ruizhen Hu InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, SongZe Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang Intrinsic Single-Image HDR Reconstruction
Sebastian Dille, Chris Careaga, Yagiz Aksoy Invertible Neural Warp for NeRF
Shin-Fang Chng, Ravi Garg, Hemanth Saratchandran, Simon Lucey Investigating Style Similarity in Diffusion Models
Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas A. Geiping, Abhinav Shrivastava, Tom Goldstein IRGen: Generative Modeling for Image Retrieval
Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Jingdong Wang, Baining Guo Isomorphic Pruning for Vision Models
Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
Xianwei Zhuang, Hongxiang Li, Xuxin Cheng, Zhihong Zhu, Yuxin Xie, Yuexian Zou Keypoint Promptable Re-Identification
Vladimir Somers, Alexandre Alahi, Christophe De Vleeschouwer KeypointDETR: An End-to-End 3D Keypoint Detector
Hairong Jin, Yuefan Shen, Jianwen Lou, Kun Zhou, Youyi Zheng KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
Yifan Zhan, Zhuoxiao Li, Muyao Niu, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng Kinetic Typography Diffusion Model
Seonmi Park, Inhwan Bae, Seunghyun Shin, Hae-Gon Jeon KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
Zhihao Xu, Shengjie Gong, Jiapeng Tang, Lingyu Liang, Yining Huang, Haojie Li, Shuangping Huang Label-Anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang Label-Free Neural Semantic Image Synthesis
Jiayi Wang, Kevin A Laube, Yumeng Li, Jan Hendrik Metzen, Shin-I Cheng, Julio Borges, Anna Khoreva Labeled Data Selection for Category Discovery
Bingchen Zhao, Nico Lang, Serge Belongie, Oisin Mac Aodha Lagrangian Hashing for Compressed Neural Field Representations
Shrisudhan Govindarajan, Zeno Sambugaro, Akhmedkhan Shabanov, Towaki Takikawa, Weiwei Sun, Daniel Rebain, Nicola Conci, Kwang Moo Yi, Andrea Tagliasacchi LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu Lane Graph as Path: Continuity-Preserving Path-Wise Modeling for Online Lane Graph Construction
Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
Toan Nguyen, Minh Nhat Nhat Vu, Baoru Huang, An Dinh Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen Language-Image Pre-Training with Long Captions
Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
Ruida Zhang, Ziqin Huang, Gu Wang, Chenyangguang Zhang, Yan Di, Xingxing Zuo, Jiwen Tang, Xiangyang Ji LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen, Haofei Xu, Stefano Esposito, Siyu Tang, Andreas Geiger Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu Latent Guard: A Safety Framework for Text-to-Image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati LatentEditor: Text Driven Local Editing of 3D Scenes
Umar Khalid, Hasan Iqbal, Muhammad Tayyab, Md Nazmul Karim, Jing Hua, Chen Chen LATTE3D: Large-Scale Amortized Text-to-Enhanced3D Synthesis
Kevin Xie, Tianshi Cao, Jonathan P Lorraine, Jun Gao, James R Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng LaWa: Using Latent Space for In-Generation Image Watermarking
Ahmad Rezaei, Mohammad Akbari, Saeed Ranjbar Alvar, Arezou Fatemi, Yong Zhang Layer-Wise Relevance Propagation with Conservation Property for ResNet
Seitaro Otsuki, Tsumugi Iida, Félix Doublet, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu, Chia-chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul W Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu LayoutFlow: Flow Matching for Layout Generation
Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama Lazy Diffusion Transformer for Interactive Image Editing
Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Danny Cohen-Or, Taesung Park, Michaël Gharbi LCM-Lookahead for Encoder-Based Text-to-Image Personalization
Rinon Gal, Or Lichter, Elad Richardson, Or Patashnik, Amit Bermano, Gal Chechik, Danny Cohen-Or Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu, Guosheng Lin Learned HDR Image Compression for Perceptually Optimal Storage and Display
Peibei Cao, Haoyu Chen, Jingzhe Ma, Yu-Chieh Yuan, Zhiyong Xie, Xin Xie, Haiqing Bai, Kede Ma Learned Image Enhancement via Color Naming
David Serrano-Lozano, Luis Herranz, Michael S Brown, Javier Vazquez-Corral Learning 3D-Aware GANs from Unposed Images with Template Feature Field
Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yujun Shen, Yue Wang, Yiyi Liao Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion
Quoc-Huy Tran, Muhammad Ahmed, Murad Popattia, Muhammad Hassan Ahmed, Andrey Konin, Zeeshan Zia Learning Camouflaged Object Detection from Noisy Pseudo Label
Jin Zhang, Ruiheng Zhang, Yanjiao Shi, Zhe Cao, Nian Liu, Fahad Shahbaz Khan Learning Diffusion Models for Multi-View Anomaly Detection
Chieh Liu, Yu-Min Chu, Ting-I Hsieh, Hwann-Tzong Chen, Tyng-Luh Liu Learning Neural Volumetric Pose Features for Camera Localization
Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, Jieping Ye Learning Pseudo 3D Guidance for View-Consistent Texturing with 2D Diffusion
Kehan Li, Yanbo Fan, Yang Wu, Zhongqian Sun, Wei Yang, Xiangyang Ji, Li Yuan, Jie Chen Learning Quantized Adaptive Conditions for Diffusion Models
Yuchen Liang, Yuchuan Tian, Lei Yu, Huaao Tang, Jie Hu, Xiangzhong Fang, Hanting Chen Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds
Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Yujing Sun, Tai Wang, Xinge Zhu, Yuexin Ma Learning to Build by Building Your Own Instructions
Aaron T Walsman, Muru Zhang, Adam Fishman, Ali Farhadi, Dieter Fox Learning to Complement and to Defer to Multiple Users
Zheng Zhang, Wenjie Ai, Kevin Wells, David M Rosewarne, Thanh-Toan Do, Gustavo Carneiro Learning to Distinguish Samples for Generalized Category Discovery
Fengxiang Yang, Nan Pu, Wenjing Li, Zhiming Luo, Shaozi Li, Nicu Sebe, Zhun Zhong Learning to Drive via Asymmetric Self-Play
Chris Zhang, Sourav Biswas, Kelvin Wong, Kion Fallah, Lunjun Zhang, Dian Chen, Sergio Casas, Raquel Urtasun Learning to Make Keypoints Sub-Pixel Accurate
Shinjeong Kim, Marc Pollefeys, Daniel Barath Learning Unified Reference Representation for Unsupervised Multi-Class Anomaly Detection
Liren He, Zhengkai Jiang, Jinlong Peng, Wenbing Zhu, Liang Liu, Qiangang Du, Xiaobin Hu, Mingmin Chi, Yabiao Wang, Chengjie Wang Learning Video Context as Interleaved Multimodal Sequences
Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou Learning with Counterfactual Explanations for Radiology Report Generation
Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, Xiaojun Chang Learning-Based Axial Video Motion Magnification
Kwon Byung-Ki, Oh Hyun-Bin, Kim Jun-Seong, Hyunwoo Ha, Tae-Hyun Oh LEIA: Latent View-Invariant Embeddings for Implicit 3D Articulation
Archana Swaminathan, Anubhav Gupta, Kamal Gupta, Shishira R Maiya, Vatsal Agarwal, Abhinav Shrivastava Length-Aware Motion Synthesis via Latent Diffusion
Alessio Sampieri, Alessio Palma, Indro Spinelli, Fabio Galasso LEROjD: LiDAR Extended Radar-Only Object Detection
Patrick Palmer, Martin Krüger, Stefan Schütte, Richard Altendorfer, Ganesh Adam, Torsten Bertram Let the Avatar Talk Using Texts Without Paired Training Data
Xiuzhe Wu, Yang-Tian Sun, Handi Chen, Hang Zhou, Jingdong Wang, Zhengzhe Liu, Xiaojuan Qi LetsMap: Unsupervised Representation Learning for Label-Efficient Semantic BEV Mapping
Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo L. J. Drews-Jr, Wolfram Burgard, Abhinav Valada Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
Haizhong Zheng, Jiachen Sun, Shutong Wu, Bhavya Kailkhura, Zhuoqing Morley Mao, Chaowei Xiao, Atul Prakash Leveraging Imperfect Restoration for Data Availability Attack
Yi Huang, Jeremy Styborski, Mingzhi Lyu, Fan Wang, Wai-Kin Adams Kong Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen Pizer, Marc Niethammer, Roni Sengupta LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu LiDAR-Event Stereo Fusion with Hallucinations
Luca Bartolomei, Matteo Poggi, Andrea Conti, Stefano Mattoccia Light-in-Flight for a World-in-Motion
Jongho Lee, Ryan J Suess, Mohit Gupta LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, Elahe Arani, Oleg Sinavski LISO: LiDAR-Only Self-Supervised 3D Object Detection
Stefan Andreas Baur, Frank Moosmann, Andreas Geiger LITA: Language Instructed Temporal-Localization Assistant
De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz LiteSAM Is Actually What You Need for Segment Everything
Jianhai Fu, Yuanjie Yu, Ningchuan Li, Yi Zhang, Qichao Chen, Jianping Xiong, Jun Yin, Zhiyu Xiang LivePhoto: Real Image Animation with Text-Guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images
Zonghao Guo, Ruyi Xu, Yuan Yao, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Gao Huang LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
Yushi Lan, Feitong Tan, Qiangeng Xu, Di Qiu, Kyle Genova, Zeng Huang, Rohit Pandey, Sean Fanello, Thomas Funkhouser, Chen Change Loy, Yinda Zhang Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen Local All-Pair Correspondence for Point Tracking
Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim, Joon-Young Lee Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Jiaqi Wang Long-Term Temporal Context Gathering for Neural Video Compression
Linfeng Qi, Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu Look Around and Learn: Self-Training Object Detection by Exploration
Gianluca Scarpellini, Stefano Rosa, Pietro Morerio, Lorenzo Natale, Alessio Del Bue Look Hear: Gaze Prediction for Speech-Directed Human Attention
Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai Lossy Image Compression with Foundation Diffusion Models
Lucas Relic, Roberto Azevedo, Markus Gross, Christopher Schroers Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
Lorenzo Vaquero, Yihong Xu, Xavier Alameda-Pineda, Victor M. Brea, Manuel Mucientes LPViT: Low-Power Semi-Structured Pruning for Vision Transformers
Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, Kwonjoon Lee M3DBench: Towards Omni 3D Assistant with Interleaved Multi-Modal Instructions
Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Zhuoyuan Li, Gang Yu, Tao Chen MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Qingping Zheng, Zuxuan Wu, Hang Xu, Yu-Gang Jiang MagicEraser: Erasing Any Objects via Semantics-Aware Control
Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space
Armand Comas, Di Qiu, Menglei Chai, Marcel C. Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark J Matthews, Paulo Gotardo, Sergio Orts-Escolano, Thabo Beeler MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski, Sebastian Cygert MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen Make Your ViT-Based Multi-View 3D Detectors Faster via Token Compression
Dingyuan Zhang, Dingkang Liang, Zichang Tan, Xiaoqing Ye, Cheng Zhang, Jingdong Wang, Xiang Bai Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps
Jianhao Zheng, Daniel Barath, Marc Pollefeys, Iro Armeni MapDistill: Boosting Efficient Camera-Based HD mAP Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao, Ruikai Li, Hui Zhang, Rong Yin, Dingzhe Li, Sangil Jung, Seung-In Park, ByungIn Yoo, Haimei Zhao, Jing Zhang Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li, Biao Hou, Siteng Ma, Zitong Wu, Xianpeng Guo, Bo Ren, Licheng Jiao MathVerse: Does Your Multi-Modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha MegaScenes: Scene-Level View Synthesis at Scale
Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely Merlin: Empowering Multimodal LLMs with Foresight Minds
En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Niessner MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis
Ziming Zhong, Yanyu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao MeshVPR: Citywide Visual Place Recognition Using 3D Meshes
Gabriele Berton, Lorenz Junglas, Riccardo Zaccone, Thomas Pollok, Barbara Caputo, Carlo Masone Meta-Optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thong Thanh Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi M Le, Cong-Duy Nguyen, See Kiong Ng, Anh Tuan Luu Meta-Prompting for Automating Zero-Shot Visual Recognition with LLMs
Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
Sanbao Su, Xin Li, Thang Doan, Sima Behpour, Wenbin He, Liang Gou, Fei Miao, Liu Ren MetaAug: Meta-Data Augmentation for Post-Training Quantization
Cuong Van Pham, Hoang Anh Dung, Cuong Cao Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do MetaWeather: Few-Shot Weather-Degraded Image Restoration
Youngrae Kim, Younggeol Cho, Thanh-Tung Nguyen, Seunghoon Hong, Dongman Lee MEVG : Multi-Event Video Generation with Text-to-Video Models
Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang, Lukas Hoyer, Mark Weber, Tobias Fischer, Dengxin Dai, Laura Leal-Taixé, Daniel Cremers, Marc Pollefeys, Luc Van Gool MinD-3D: Reconstruct High-Quality 3D Objects in Human Brain
Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
Jiayue Liu, Xiao Tang, Freeman Cheng, Zihao Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor MM1: Methods, Analysis & Insights from Multimodal LLM Pre-Training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Samuel Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Futang Peng, Anton Belyi, Max A Schwarzer, Hongyu Hè, Xianzhi Du, Haotian Zhang, Karanjeet Singh, Doug Kang, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang MMBENCH: Is Your Multi-Modal Model an All-Around Player?
Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin MMEarth: Exploring Multi-Modal Pretext Tasks for Geospatial Representation Learning
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang MMVR: Millimeter-Wave Multi-View Radar Dataset and Benchmark for Indoor Perception
Mohammad Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Wang, Peizhao Li, Adriano Cardace, Petros Boufounos MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas H Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard Möbius Transform for Mitigating Perspective Distortions in Representation Learning
Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
Heitor Rapela Medeiros, Masih Aminbeidokhti, Fidel A Guerrero Pena, David Latortue, Eric Granger, Marco Pedersoli MoE-DiffIR: Task-Customized Diffusion Priors for Universal Compressed Image Restoration
Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi China Guo, Shijie Zhao, Li Zhang, Zhibo Chen MoEAD: A Parameter-Efficient Model for Multi-Class Anomaly Detection
Shiyuan Meng, Wenchao Meng, Qihang Zhou, Shizhong Li, Weiye Hou, Shibo He MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang Momentum Auxiliary Network for Supervised Local Learning
Junhao Su, Changpeng Cai, Feiyu Zhu, Chenghao He, Xiaojie Xu, Dongzhi Guan, Chenyang Si MONTAGE: Monitoring Training for Attribution of Generative Diffusion Models
Jonathan Brokman, Omer Hofman, Roman Vainshtein, Amit Giloni, Toshiya Shimizu, Inderjeet Singh, Oren Rachmil, Alon Zolfi, Asaf Shabtai, Yuki Unno, Hisashi Kojima Motion and Structure from Event-Based Normal Flow
Zhongyang Ren, Bangyan Liao, Delei Kong, Jinghang Li, Peidong Liu, Laurent Kneip, Guillermo Gallego, Yi Zhou Motion Aware Event Representation-Driven Image Deblurring
Zhijing Sun, Xueyang Fu, Longzhuo Huang, Aiping Liu, Zheng-Jun Zha Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang Motion-Prior Contrast Maximization for Dense Continuous-Time Motion Estimation
Friedhelm Hamann, Ziyun Wang, Ioannis Asmanis, Kenneth Chaney, Guillermo Gallego, Kostas Daniilidis MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jia-Wei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou MoVideo: Motion-Aware Video Generation with Diffusion Models
Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
Casper van Engelenburg, Fatemeh Mostafavi, Emanuel Kuhn, Yuntae Jeon, Michael Franzen, Matthias Standfest, Jan van Gemert, Seyran Khademi MTaDCS: Moving Trace and Feature Density-Based Confidence Sample Selection Under Label Noise
Qingzheng Huang, Xilin He, Xiaole Xian, Qinliang Lin, Weicheng Xie, Siyang Song, Linlin Shen, Zitong Yu Multi-Branch Collaborative Learning Network for 3D Visual Grounding
Zhipeng Qian, Yiwei Ma, Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel, Thomas Lucas, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Gregory Rogez Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, Yanyun Qu Multi-Modal Crowd Counting via a Broker Modality
Haoliang Meng, Xiaopeng Hong, Chenhao Wang, Miao Shang, Wangmeng Zuo Multi-Modal Relation Distillation for Unified 3D Representation Learning
Huiqun Wang, Yiping Bao, Panwang Pan, Zeming Li, Xiao Liu, Ruijie Yang, Di Huang Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
Yongwei Nie, Changzhen Liu, Chengjiang Long, Qing Zhang, Guiqing Li, Hongmin Cai Multi-Sentence Grounding for Long-Term Instructional Video
Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yan-Feng Wang, Weidi Xie Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo, Taolin Zhang, Haoqian Wu, Hanjun Li, Ruizhi Qiao, Xing Sun Multiscale Graph Texture Network
Ravishankar Evani, Deepu Rajan, Shangbo Mao Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma Multistain Pretraining for Slide Representation Learning in Pathology
Guillaume Jaume, Anurag J Vaidya, Andrew Zhang, Andrew Song, Richard J Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long P Le, Faisal Mahmood MUSES: The Multi-Sensor Semantic Perception Dataset for Driving Under Uncertainty
Tim Broedermann, David Brüggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool MVDD: Multi-View Depth Diffusion Models
Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang MVDiffHD: A Dense High-Resolution Multi-View Diffusion Model for Single or Sparse-View 3D Object Reconstruction
Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Danny Cohen-Or N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
Yash Bhalgat, Iro Laina, Joao F Henriques, Andrew Zisserman, Andrea Vedaldi NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
Chenyu Liu, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du, Qingfeng Liu Navigating Text-to-Image Generative Bias Across Indic Languages
Surbhi Mittal, Arnav Sudan, Mayank Vatsa, Richa Singh, Tamar Glaser, Tal Hassner NeRF-XL: NeRF at Any Scale with Multi-GPU
Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams Neural Graphics Texture Compression Supporting Random Access
Farzad Farhadzadeh, Qiqi Hou, Hoang Le, Amir Said, Randall R Rauwendaal, Alex Bourd, Fatih Porikli Neural Metamorphosis
Xingyi Yang, Xinchao Wang Neural Spectral Decomposition for Dataset Distillation
Shaolei Yang, Shen Cheng, Mingbo Hong, Haoqiang Fan, Xing Wei, Shuaicheng Liu Neural Surface Detection for Unsigned Distance Fields
Federico Stella, Nicolas Talabot, Hieu Le, Pascal Fua NeuroNCAP: Photorealistic Closed-Loop Safety Testing for Autonomous Driving
William Ljungbergh, Adam Tonderski, Joakim Johnander, Holger Caesar, Kalle Åström, Michael Felsberg, Christoffer Petersson NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji Noise-Assisted Prompt Learning for Image Forgery Detection and Localization
Dong Li, Jiaying Zhu, Xueyang Fu, Xun Guo, Yidi Liu, Gang Yang, Jiawei Liu, Zheng-Jun Zha Non-Parametric Sensor Noise Modeling and Synthesis
Ali Mosleh, Luxi Zhao, Atin Vikram Singh, Jaeduk Han, Abhijith Punnappurath, Marcus A Brubaker, Jihwan Choe, Michael S Brown Non-Transferable Pruning
Ruyi Ding, Lili Su, A. Adam Ding, Yunsi Fei Nonverbal Interaction Detection
Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang Norface: Improving Facial Expression Analysis by Identity Normalization
Hanwei Liu, Rudong An, Zhimeng Zhang, Bowen Ma, Wei Zhang, Yan Song, Yujing Hu, Chen Wei, Yu Ding Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification
Yu Bai, Bo Zhang, Zheng Zhang, Shuo Yan, Zibo Ma, Wu Liu, Xiuzhuang Zhou, Xiangyang Gong, Wendong Wang NOVUM: Neural Object Volumes for Robust Object Classification
Artur Jesslen, Guofeng Zhang, Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski Nuvo: Neural UV Mapping for Unruly 3D Representations
Pratul Srinivasan, Stephan J Garbin, Dor Verbin, Jonathan T Barron, Ben Mildenhall Nymeria: A Massive Collection of Egocentric Multi-Modal Human Motion in the Wild
Lingni Ma, Yuting Ye, Rowan Postyeni, Alexander J Gamino, Vijay Baiyya, Luis Pesqueira, Kevin M Bailey, David Soriano Fosas, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Hyo Jin Kim, Jakob Engel, Karen Liu, Ziwei Liu, Renzo De Nardi, Richard Newcombe O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie, Julong Wei, Zhengjun Wang, Ke Wu, Shanshuai Yuan, Kaizhao Zhang, Jie Jia, Jieru Zhao, Zhongxue Gan, Wenchao Ding OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
Qiao Mo, Yukang Ding, Jinhua Hao, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction
Yini Fang, Jingling Yu, Haozheng Zhang, Ralf van der Lans, Bertram E Shi Object-Aware NIR-to-Visible Translation
Yunyi Gao, Lin Gu, Qiankun Liu, Ying Fu Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M Asano, Amirhossein Habibian Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
Shibin Mei, Bingbing Ni, Hang Wang, Chenglong Zhao, Fengfa Hu, Zhiming Pi, BiLian Ke OccGen: Generative Multi-Modal 3D Occupancy Prediction for Autonomous Driving
Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective
Panjian Huang, Yunjie Peng, Saihui Hou, Chunshui Cao, Xu Liu, Zhiqiang He, Yongzhen Huang Occlusion-Aware Seamless Segmentation
Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang Occupancy as Set of Points
Yiang Shi, Tianheng Cheng, Qian Zhang, Wenyu Liu, Xinggang Wang OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, ChenCheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu OMG: Occlusion-Friendly Personalized Multi-Concept Generation in Diffusion Models
Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo Omni6DPose: A Benchmark and Model for Universal 6d Object Pose Estimation and Tracking
Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor, Yash Parag Butala, Melisa A Russak, Jing Yu Koh, Kiran Kamble, Waseem AlShikh, Ruslan Salakhutdinov OmniNOCS: A Unified NOCS Dataset and Model for 3D Lifting of 2D Objects
Akshay Krishnan, Abhijit Kundu, Kevis-Kokitsi Maninis, James Hays, Matthew Brown OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu On Pretraining Data Diversity for Self-Supervised Learning
Hasan Abed Al Kader Hammoud, Tuhin Das, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem On the Evaluation Consistency of Attribution-Based Explanations
Jiarui Duan, Haoling Li, Haofei Zhang, Hao Jiang, Mengqi Xue, Li Sun, Mingli Song, Jie Song On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao On the Viability of Monocular Depth Pre-Training for Semantic Segmentation
Dong Lao, Fengyu Yang, Daniel Wang, Hyoungseob Park, Samuel Lu, Alex Wong, Stefano Soatto One-Shot Diffusion Mimicker for Handwritten Text Generation
Gang Dai, Yifan Zhang, Quhui Ke, Qiangya Guo, Shuangping Huang One-Stage Prompt-Based Continual Learning
Youngeun Kim, Yuhang Li, Priyadarshini Panda OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
Wanyun Li, Pinxue Guo, Xinyu Zhou, Lingyi Hong, Yangji He, Xiangyu Zheng, Wei Zhang, Wenqiang Zhang Online Continuous Generalized Category Discovery
Keon-Hee Park, Hakyung Lee, Kyungwoo Song, Gyeong-Moon Park Online Vectorized HD mAP Construction Using Geometry
Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, Fusheng Jin, Xiangyu Yue Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen Open Vocabulary Multi-Label Video Classification
Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul A Chilimbi Open-Set Biometrics: Beyond Good Closed-Set Models
Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu Open-Set Recognition in the Age of Vision-Language Models
Dimity Miller, Niko Suenderhauf, Alex Kenna, Keita Mason Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander G. Hauptmann, Ting Liu, Andrew Gallagher Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang, Xiaoqi Zhao, JiaMing Zuo, Lihe Zhang, Huchuan Lu Open-Vocabulary RGB-Thermal Semantic Segmentation
GuoQiang Zhao, JunJie Huang, Xiaoyun Yan, Zhaojing Wang, Junwei Tang, Yangjun Ou, Xinrong Hu, Tao Peng Open-World Dynamic Prompt and Continual Visual Representation Learning
Youngeun Kim, Jun Fang, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer OPEN: Object-Wise Position Embedding for Multi-View 3D Object Detection
Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai OpenIns3D: Snap and Lookup for 3D Open-Vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Hu Zhang, Xu Jianhua, Tao Tang, Haiyang Sun, Xin Yu, Zi Helen Huang, Kaicheng Yu Operational Open-Set Recognition and PostMax Refinement
Steve Cruz, Ryan Rabinowitz, Manuel Günther, Terrance E. Boult OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
Yixiao Wang, Chen Tang, Lingfeng Sun, Simone Rossi, Yichen Xie, Chensheng Peng, Thomas Hannagan, Stefano Sabatini, Nicola Poerio, Masayoshi Tomizuka, Wei Zhan PACE: Pose Annotations in Cluttered Environments
Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam Harley, Leonidas Guibas, Cewu Lu PALM: Predicting Actions Through Language Models
Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc Van Gool, Xi Wang PanoVOS: Bridging Non-Panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan, Xiaohao Xu, Renrui Zhang, Lingyi Hong, Wenchao Chen, Wenqiang Zhang, Wei Zhang PapMOT: Exploring Adversarial Patch Attack Against Multiple Object Tracking
Jiahuan Long, Tingsong Jiang, Wen Yao, Shuai Jia, Weijia Zhang, Weien Zhou, Chao Ma, Xiaoqian Chen ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji Parrot Captions Teach CLIP to Spot Text
Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou Parrot: Pareto-Optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, Sibei Yang PartCraft: Crafting Creative Objects by Parts
Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
Runsong Zhu, Shi Qiu, Qianyi Wu, Ka-Hei Hui, Pheng-Ann Heng, Chi-Wing Fu Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores
Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu Han PFedEdit: Personalized Federated Learning via Automated Model Editing
Haolin Yuan, William Paul, John Aucott, Philippe Burlina, Yinzhi Cao Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
Ruofan Liang, Zan Gojcic, Merlin Nimier-David, David Acuna, Nandita Vijaykumar, Sanja Fidler, Zian Wang Photorealistic Video Generation with Diffusion Models
Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, Jose Lezama PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein Physical-Based Event Camera Simulator
Haiqian Han, Jiacheng Lyu, Jianing Li, Henglu Wei, Cheng Li, Yajing Wei, Shu Chen, Xiangyang Ji Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu, Pengxiang Ding, Siteng Huang, Min Zhang, Han Zhao, Donglin Wang PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4k Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li Placing Objects in Context via Inpainting for Out-of-Distribution Segmentation
Pau de Jorge Aranda, Riccardo Volpi, Puneet Dokania, Philip Torr, Gregory Rogez Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation
Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong Platypus: A Generalized Specialist Model for Reading Text in Various Forms
Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao POA: Pre-Training Once for Models of All Sizes
Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang POCA: Post-Training Quantization with Temporal Alignment for Codec Avatars
Jian Meng, Yuecheng Li, Leo Li, Syed Shakib Sarwar, Dilin Wang, Jae-sun Seo POET: Prompt Offset Tuning for Continual Human Action Adaptation
Prachi Garg, K J Joseph, Vineeth N Balasubramanian, Necati Cihan Camgoz, Chengde Wan, Kenrick Kin, Weiguang Si, Shugao Ma, Fernando de la Torre PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin PointNeRF++: A Multi-Scale, Point-Based Neural Radiance Field
Weiwei Sun, Eduard Trulls, Yang-Che Tseng, Sneha Sambandam, Gopal Sharma, Andrea Tagliasacchi, Kwang Moo Yi PolyOculus: Simultaneous Multi-View Image-Based Novel View Synthesis
Jason J. Yu, Tristan Aumentado-Armstrong, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker PolyRoom: Room-Aware Transformer for Floorplan Reconstruction
Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen Pose Guided Fine-Grained Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang, Jichao Feng, Liu Peidong, Wei Feng PQ-SAM: Post-Training Quantization for Segment Anything Model
Xiaoyu Liu, Xin Ding, Lei Yu, Yuanyuan Xi, Wei Li, Zhijun Tu, Jie Hu, Hanting Chen, Baoqun Yin, Zhiwei Xiong PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar, Sachidanand Vs, Sabariswaran Mani, Tejan Karmali, Venkatesh Babu Radhakrishnan Privacy-Preserving Adaptive Re-Identification Without Image Transfer
Hamza Rami, Jhony H. Giraldo, Nicolas Winckler, Stéphane Lathuilière Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding Pyramid Diffusion for Fine 3D Large Scene Generation
Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang Quality Assured: Rethinking Annotation Strategies in Imaging AI
Tim Rädsch, Annika Reinke, Vivienn Weru, Minu D. Tizabi, Nicholas Heller, Fabian Isensee, Annette Kopp-Schneider, Lena Maier-Hein Quanta Video Restoration
Prateek Chennuri, Yiheng Chi, Enze Jiang, GM Dilshan Godaliyadda, Abhiram Gnanasambandam, Hamid R Sheikh, Istvan Gyongy, Stanley H Chan Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Pengxiang Ding, Han Zhao, Wenjie Zhang, Wenxuan Song, Min Zhang, Siteng Huang, Ningxi Yang, Donglin Wang R^2-Bench: Benchmarking the Robustness of Referring Perception Models Under Perturbations
Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Kashu Yamazaki, Hao Chen, Rita Singh, Xiaonan Huang, Bhiksha Raj R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Zheyuan Zhou, Le Wang, Naiyu Fang, Zili Wang, Lemiao Qiu, Shuyou Zhang R3DS: Reality-Linked 3D Scenes for Panoramic Scene Understanding
Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang RadEdit: Stress-Testing Biomedical Vision Models via Diffusion Image Editing
Fernando Pérez-García, Sam Bond-Taylor, Pedro Sanchez, Boris van Breugel, Daniel Coelho de Castro, Harshita Sharma, Valentina Salvatelli, Maria Teodora A Wetscherek, Hannah CM Richardson, Lungren Matthew, Aditya Nori, Javier Alvarez-Valle, Ozan Oktay, Maximilian Ilse Radiative Gaussian Splatting for Efficient X-Ray Novel View Synthesis
Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille RaFE: Generative Radiance Fields Restoration
Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu RANRAC: Robust Neural Scene Representations via Random Ray Consensus
Benno Buschmann, Andreea Dogaru, Elmar Eisemann, Michael Weinmann, Bernhard Egger Rasterized Edge Gradients: Handling Discontinuities Differentially
Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin Ray Denoising: Depth-Aware Hard Negative Sampling for Multi-View 3D Object Detection
Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou Real Appearance Modeling for More General Deepfake Detection
Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jiao Dai, Yesheng Chai, Jizhong Han Real-Data-Driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
Siqi Yang, Zhaojun Huang, Yakun Chang, Bin Fan, Zhaofei Yu, Boxin Shi Real-Time 3D-Aware Portrait Editing from a Single Image
Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen Real-Time Holistic Robot Pose Estimation with Unknown States
Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
Chen-Yi Lu, Shubham Agarwal, Md Mehrab Tanjim, Kanak Mahadik, Anup Rao, Subrata Mitra, Shiv K Saini, Saurabh Bagchi, Somali Chaterji Recursive Visual Programming
Jiaxin Ge, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu Referring Atomic Video Action Recognition
Kunyu Peng, Jia Fu, Kailun Yang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiaming Zhang, Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg Region-Native Visual Tokenization
Mengyu Wang, Yuyao Huang, Henghui Ding, Xinlong Wang, Tiejun Huang, Yao Zhao, Yunchao Wei, Shuicheng Yan Regularizing Dynamic Radiance Fields with Kinematic Fields
Woobin Im, Geonho Cha, Sebin Lee, Jumin Lee, Juhyeong Seon, Dongyoon Wee, Sungeui Yoon Reinforcement Learning Meets Visual Odometry
Nico Messikommer, Giovanni Cioffi, Mathias Gehrig, Davide Scaramuzza Reinforcement Learning via Auxillary Task Distillation
Abhinav N Harish, Larry Heck, Josiah P Hanna, Zsolt Kira, Andrew Szot Reliability in Semantic Segmentation: Can We Use Synthetic Data?
Thibaut Loiseau, Tuan-Hung Vu, Mickael Chen, Patrick Pérez, Matthieu Cord Reliable Spatial-Temporal Voxels for Multi-Modal Test-Time Adaptation
Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie Relightable Neural Actor with Intrinsic Decomposition and Pose Control
Diogo Carbonera Luvizon, Vladislav Golyanik, Adam Kortylewski, Marc Habermann, Christian Theobalt ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
Mu Cai, Haotian Liu, Yuheng Li, Yijun Li, Eli Shechtman, Zhe Lin, Yong Jae Lee, Krishna Kumar Singh ReNoise: Real Image Inversion Through Iterative Noising
Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Danny Cohen-Or Repaint123: Fast and High-Quality One Image to 3D Generation with Progressive Controllable Repainting
Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Xing Zhou, Munan Ning, Li Yuan RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency
Ziming Sun, Yuan Liang, Zejun Ma, Tianle Zhang, Linchao Bao, Guiqing Li, Shengfeng He Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
Ting-Ru Liu, Hsuan-Kung Yang, Jou-Min Liu, Chun-Wei Huang, Tsung-Chih Chiang, Quan Kong, Norimasa Kobori, Chun-Yi Lee Resilience of Entropy Model in Distributed Neural Networks
Milin Zhang, Mohammad Abdi, Shahriar Rifat, Francesco Restuccia Responsible Visual Editing
Minheng Ni, Yeli Shen, Lei Zhang, Wangmeng Zuo ReSyncer: Rewiring Style-Based Generator for Unified Audio-Visually Synced Facial Performer
Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu Retargeting Visual Data with Deformation Fields
Tim Elsner, Julia Berger, Tong Wu, Victor Czech, Lin Gao, Leif Kobbelt Rethinking Image Super Resolution from Training Data Perspectives
Go Ohtani, Ryu Tadokoro, Ryosuke Yamada, Yuki M Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka, Yoshimitsu Aoki Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing, Lei Zhao, Huaizhong Lin, Jianfeng Dong, Dalong Zhang Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Haene, Alan Yuille Rethinking Weakly-Supervised Video Temporal Grounding from a Game Perspective
Xiang Fang, Zeyu Xiong, Wanlong Fang, Xiaoye Qu, Chen Chen, Jianfeng Dong, Keke Tang, Pan Zhou, Yu Cheng, Daizong Liu Retrieval Robust to Object Motion Blur
Rong Zou, Marc Pollefeys, Denys Rozumnyi Revising Densification in Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder Revisit Anything: Visual Place Recognition via Image Segment Retrieval
Kartik Garg, Sai Shubodh, Shishir N Y Kolathaya, Madhava Krishna, Sourav Garg Revisit Human-Scene Interaction via Space Occupancy
Xinpeng Liu, Haowen Hou, Yanchao Yang, Yong-Lu Li, Cewu Lu Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
Jianan Fan, Dongnan Liu, Canran Li, Hang Chang, Heng Huang, Filip Braet, Mei Chen, Weidong Cai Revisiting Calibration of Wide-Angle Radially Symmetric Cameras
Andrea Porfiri Dal Cin, Francesco Azzoni, Giacomo Boracchi, Luca Magri Revisiting Supervision for Continual Representation Learning
Daniel Marczak, Sebastian Cygert, Tomasz Trzcinski, Bartlomiej Twardowski Rgbd Gs-Icp SLAM
Seongbo Ha, Jiung Yeon, Hyeonwoo Yu RICA^2: Rubric-Informed, Calibrated Assessment of Actions
Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi Gnvv, Yin Li RING-NeRF : Rethinking Inductive Biases for Versatile and Efficient Neural Fields
Doriand Petit, Steve Bourgeois, Dumitru Pavel, Vincent Gay-Bellile, Florian Chabot, Loïc Barthe Robust Calibration of Large Vision-Language Adapters
Balamurali Murugesan, Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz Robust Fitting on a Gate Quantum Computer
Frances F Yang, Michele Sasdelli, Tat-Jun Chin Robust Incremental Structure-from-Motion with Hybrid Features
Shaohui Liu, Yidan Gao, Tianyi Zhang, Rémi Pautrat, Johannes L Schönberger, Viktor Larsson, Marc Pollefeys Robustness Preserving Fine-Tuning Using Neuron Importance
Guangrui Li, Rahul Duggal, Aaditya Singh, Kaustav Kundu, Bing Shuai, Jonathan Wu RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
Thang-Anh-Quan Nguyen, Luis G Roldao Jimenez, Nathan Piasco, Moussab Bennehar, Dzmitry Tsishkou RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
Sibi Catley-Chandar, Richard Shaw, Gregory Slabaugh, Eduardo Pérez Pellitero RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu RoScenes: A Large-Scale Multi-View 3D Dataset for Roadside Perception
Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun RPBG: Towards Robust Neural Point-Based Graphics in the Wild
Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng RSL-BA: Rolling Shutter Line Bundle Adjustment
Yongcong Zhang, Bangyan Liao, Yifei Xue, Lu Chen, Peidong Liu, Yizhen Lao RT-Pose: A 4D Radar-Tensor Based 3D Human Pose Estimation and Localization Benchmark
Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville SAGS: Structure-Aware 3D Gaussian Splatting
Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, Stefanos Zafeiriou SAM-Guided Graph Cut for 3D Instance Segmentation
Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou Sapiens: Foundation for Human Vision Models
Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Zhaoen Su, Austin T James, Peter Selednik, Stuart Anderson, Shunsuke Saito SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai Scalable Group Choreography via Variational Phase Manifold Learning
Nhat Le, Khoa Do, Xuan Bui, Tuong Do, Erman Tjiputra, Quang D.Tran, Anh Nguyen Scalar Function Topology Divergence: Comparing Topology of 3D Objects
Ilya Trofimov, Daria Voronkova, Eduard Tulchinskii, Evgeny Burnaev, Serguei Barannikov Scaling Backwards: Minimal Synthetic Pre-Training?
Ryo Nakamura, Ryu Tadokoro, Ryosuke Yamada, Yuki M Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka ScanTalk: 3D Talking Heads from Unregistered Scans
Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Stefano Berretti, Mohamed Daoudi Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, Aron Monszpart, Daniyar Turmukhambetov, Victor Adrian Prisacariu Scene-Conditional 3D Object Stylization and Composition
Jinghao Zhou, Tomas Jakab, Philip Torr, Christian Rupprecht SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Daniel Barath SceneScript: Reconstructing Scenes with an Autoregressive Structured Language Model
Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Luke Holland, Duncan Frost, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas SceneTeller: Language-to-3D Scene Generation
Basak Melis Ocal, Maxim Tatarchenko, Sezer Karaoglu, Theo Gevers SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, Siyuan Huang SCOD: From Heuristics to Theory
Vojtech Franc, Jakub Paplham, Daniel Prusa SCOMatch: Alleviating Overtrusting in Open-Set Semi-Supervised Learning
Zerun Wang, Liuyu Xiang, Lang Huang, Jiafeng Mao, Ling Xiao, Toshihiko Yamasaki SCPNet: Unsupervised Cross-Modal Homography Estimation via Intra-Modal Self-Supervised Learning
Runmin Zhang, Jun Ma, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen, Si-Yuan Cao See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao, Xuan Wang, Wenhao Chai, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang SEED: A Simple and Effective 3D DETR in Point Clouds
Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai Seeing Faces in Things: A Model and Dataset for Pareidolia
Mark T Hamilton, Simon Stent, Vasha G DuTell, Anne Harrington, Jennifer E Corbett, Ruth Rosenholtz, William T. Freeman SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
Qingwen Zhang, Yi Yang, Peizheng Li, Olov Andersson, Patric Jensfelt SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M Alvarez, Zuxuan Wu, Yu-Gang Jiang Segment and Recognize Anything at Any Granularity
Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation Without Manual Labels
Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann SeiT++: Masked Token Modeling Improves Storage-Efficient Training
Minhyun Lee, Song Park, Byeongho Heo, Dongyoon Han, Hyunjung Shim SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
Sarah Rastegar, Mohammadreza Salehi, Yuki M Asano, Hazel Doughty, Cees Snoek Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Jungwoo Kim, Wooseok Jang, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li, Renhao Wang, Po-Yao Huang, Andrew Owens, Gopala Krishna Anumanchipalli Self-Supervised Feature Adaptation for 3D Industrial Anomaly Detection
Yuanpeng Tu, Boshen Zhang, Liang Liu, Yuxi Li, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cairong Zhao Self-Supervised Video Desmoking for Laparoscopic Surgery
Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, Wangmeng Zuo Self-Training Room Layout via Geometry-Aware Ray-Casting
Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Jonathan Lee, Yi-Hsuan Tsai, Min Sun Semantic Residual Prompts for Continual Learning
Martin Menabue, Emanuele Frascaroli, Matteo Boschini, Enver Sangineto, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara Semantically Guided Representation Learning for Action Anticipation
Anxhelo Diko, Danilo Avola, Bardh Prenkaj, Federico Fontana, Luigi Cinque SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari SemReg: Semantics Constrained Point Cloud Registration
Sheldon Fung, Xuequan Lu, Dasith de Silva Edirimuni, Wei Pan, Xiao Liu, Hongdong Li SemTrack: A Large-Scale Dataset for Semantic Tracking in the Wild
Pengfei Wang, Xiaofei Hui, Jing Wu, Zile Yang, Kian Eng Ong, Xinge Zhao, Beijia Lu, Dezhao Huang, Evan Ling, Weiling Chen, Keng Teck Ma, Minhoe Hur, Jun Liu SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
Yiyang Chen, Siyan Dong, Xulong Wang, Lulu Cai, Youyi Zheng, Yanchao Yang SGS-SLAM: Semantic Gaussian Splatting for Neural Dense SLAM
Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, Hongyu Wang Shape from Heat Conduction
Sriram Narayanan, Mani Ramanagopal, Mark Sheinin, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan Shapefusion: 3D Localized Human Diffusion Models
Rolandos Alexandros Potamias, Michael Tarasiou, Stylianos Ploumpis, Stefanos Zafeiriou ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi, Michael Dorkenwald, Fida Mohammad Thoker, Efstratios Gavves, Cees Snoek, Yuki M Asano SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool, Federico Tombari SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks
Abhishek Singh, Vivek Sharma, Rohan Sukumaran, John J Mose, Jeffrey K Chiu, Justin Yu, Ramesh Raskar Similarity of Neural Architectures Using Adversarial Attack Transferability
Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun, Jong-Seok Lee Situated Instruction Following
So Yeon Min, Xavier Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Priyam Parashar, Akshara Rai, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
Yannick Kirchhoff, Maximilian R Rokuss, Saikat Roy, Balint Kovacs, Constantin Ulrich, Tassilo Wald, Maximilian Zenk, Philipp Vollmuth, Jens Kleesiek, Fabian Isensee, Klaus H. Maier-Hein Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Ying Nian Wu, Yonatan Bisk, Feng Gao SkyMask: Attack-Agnostic Robust Federated Learning with Fine-Grained Learnable Masks
Peishen Yan, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Ningxin Hu, Mohammad Reza Haghighat, Haibing Guan SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil S Khose, Anisha Pal, Aayushi Agarwal, Deepanshi, Judy Hoffman, Prithvijit Chattopadhyay SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
Siyuan Li, Lei Ke, Yung-Hsu Yang, Luigi Piccinelli, Mattia Segù, Martin Danelljan, Luc Van Gool SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
Xiaoyu Liu, Yuxiang Wei, Ming Liu, Xianhui Lin, Peiran Ren, Xuansong Xie, Wangmeng Zuo SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang Snuffy: Efficient Whole Slide Image Classifier
Hossein Jafarinia, Alireza Alipanah, Saeed Razavi, Nahal Mirzaie, Mohammad Hossein Rohban Soft Prompt Generation for Domain Generalization
Shuanghao Bai, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, Qiang Liu Source-Free Domain-Invariant Performance Prediction
Ekaterina Khramtsova, Mahsa Baktashmotlagh, Guido Zuccon, Xi Wang, Mathieu Salzmann Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Huadong Li, Minhao Jing, Jin Wang, Shichao Dong, Jiajun Liang, Haoqiang Fan, Renhe Ji Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu, Zhuoyang Zhang, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai SphereHead: Stable 3D Full-Head Synthesis with Spherical Tri-Plane Representation
Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, Calvin Murdock SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-Modal Large Language Models
Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Yu Qiao, Hongsheng Li Spiking Wavelet Transformer
Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
Josh David Myers-Dean, Jarek T Reynolds, Brian Price, Yifei Fan, Danna Gurari SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, Edmond Boyer Spline-Based Transformers
Prashanth Chandran, Agon Serifi, Markus Gross, Moritz Bächer SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun, Can Qin, Jiaminan Wang, Zeyuan Chen, Ran Xu, Zhiqiang Tao SRPose: Two-View Relative Pose Estimation with Sparse Keypoints
Rui Yin, Yulun Zhang, Zherong Pan, Jianjun Zhu, Cheng Wang, Biao Jia SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
Mengxin Zheng, Jiaqi Xue, Zihao Wang, Xun Chen, Qian Lou, Lei Jiang, Xiaofeng Wang ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li Stable Video Portraits
Mirela Ostrek, Justus Thies StableDrag: Stable Dragging for Point-Based Image Editing
Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao Statewide Visual Geolocalization in the Wild
Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen StereoGlue: Joint Feature Matching and Robust Estimation
Daniel Barath, Dmytro Mishkin, Luca Cavalli, Paul-Edouard Sarlin, Petr Hruby, Marc Pollefeys Stitched ViTs Are Flexible Vision Backbones
Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang Stream Query Denoising for Vectorized HD-mAP Construction
Shuo Wang, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng Strike a Balance in Continual Panoptic Segmentation
Jinpeng Chen, Runmin Cong, Yuxuan Luo, Horace Ho Shing Ip, Sam Kwong Stripe Observation Guided Inference Cost-Free Attention Mechanism
Zhongzhan Huang, Shanshan Zhong, Wushao Wen, Jinghui Qin, Liang Lin Structured-NeRF: Hierarchical Scene Graph with Neural Representation
Zhide Zhong, Jiakai Cao, Songen Gu, Sirui Xie, Liyi Luo, Hao Zhao, Guyue Zhou, Haoang Li, Zike Yan Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation
Mathias Öttl, Frauke Wilm, Jana Steenpass, Jingna Qiu, Matthias Rübner, Prof Arndt Hartmann, Matthias W. Beckmann, Peter Fasching, Andreas K Maier, Ramona Erber, Bernhard Kainz, Katharina Breininger StyleCity: Large-Scale 3D Urban Scenes Stylization
Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
Wen Li, Muyuan Fang, Cheng Zou, Biao Gong, Ruobing Zheng, Meng Wang, Jingdong Chen, Ming Yang SUMix: Mixup with Semantic and Uncertain Information
Huafeng Qin, Xin Jin, Hongyu Zhu, Hongchao Liao, Mounim A. El Yacoubi, Xinbo Gao SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
Alind Khare, Animesh Agrawal, Aditya Annavajjala, Payman Behnam, Myungjin Lee, Hugo M Latapie, Alexey Tumanov SuperGaussian: Repurposing Video Models for 3D Super Resolution
Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Fruehstueck Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
Zhengming Yu, Zhiyang Dou, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wenping Wang Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction
Rui Peng, Shihe Shen, Kaiqiang Xiong, Huachen Gao, Jianbo Jiao, Xiaodong Gu, Ronggang Wang SV3D: Novel Multi-View Synthesis and 3D Generation from a Single Image Using Latent Video Diffusion
Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitrii Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani SWAG: Splatting in the Wild Images with Appearance-Conditioned Gaussians
Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis G Roldao Jimenez, Dzmitry Tsishkou SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
Jing Gu, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Yilin Wang, Xin Eric Wang SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers
Mingrui Zhao, Yizhi Wang, Fenggen Yu, Changqing Zou, Ali Mahdavi-Amiri SwiftBrush V2: Make Your One-Step Diffusion Model Better than Its Teacher
Trung Tuan Dao, Thuan Hoang Nguyen, Thanh Van Le, Duc H Vu, Khoi Nguyen, Cuong Pham, Anh T Tran SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Pérez Pellitero Synchronization of Projective Transformations
Rakshith Madhavan, Andrea Fusiello, Federica Arrigoni Synthesizing Environment-Specific People in Photographs
Mirela Ostrek, Carol O'Sullivan, Michael J. Black, Justus Thies T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang, Feng Li, Zhaoyang Zeng, Shilong Liu, Tianhe Ren, Lei Zhang Tackling Structural Hallucination in Image Translation with Local Diffusion
Seunghoi Kim, Chen Jin, Tom Diethe, Matteo Figini, Henry FJ Tregidgo, Asher Mullokandov, Philip A Teare, Daniel Alexander Take a Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang, Jiting Cai, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li Taming Latent Diffusion Model for Neural Radiance Field Inpainting
Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng Taming Lookup Tables for Efficient Image Retouching
Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani, Xian Liu, Wang Yifan, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B Lindell TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren Temporal Residual Jacobians for Rig-Free Motion Transfer
Sanjeev Muralikrishnan, Niladri Shekhar Dutt, Siddhartha Chaudhuri, Noam Aigerman, Vladimir Kim, Matthew Fisher, Niloy Mitra Temporally Consistent Stereo Matching
Jiaxi Zeng, Chengtang Yao, Yuwei Wu, Yunde Jia TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
Nikolai Kalischek, Torben Peters, Jan Dirk Wegner, Konrad Schindler TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Yufei Liu, Junwei Zhu, Junshu Tang, Shijie Zhang, Jiangning Zhang, Weijian Cao, Chengjie Wang, Yunsheng Wu, Dongjin Huang TexGen: Text-Guided 3D Texture Generation with Multi-View Sampling and Resampling
Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang Text to Layer-Wise 3D Clothed Human Generation
Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai Text-Conditioned Resampler for Long Form Video Understanding
Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari Text-Guided Video Masked Autoencoder
David Fan, Jue Wang, Shuai Liao, Zhikang Zhang, Vimal Bhat, Xinyu Li Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy L Bearman, Dhruv Mahajan Text2Place: Affordance-Aware Text Guided Human Placement
Rishubh Parihar, Harsh Gupta, Sachidanand Vs, Venkatesh Babu Radhakrishnan Textual Grounding for Open-Vocabulary Visual Information Extraction in Layout-Diversified Documents
Mengjun Cheng, Chengquan Zhang, Chang Liu, Yuke Li, Bohan Li, Kun Yao, Xiawu Zheng, Rongrong Ji, Jie Chen TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
Xudong Wang, Ke-Yue Zhang, Taiping Yao, Qianyu Zhou, Shouhong Ding, Pingyang Dai, Rongrong Ji The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
Yi Yao, Chan-Feng Hsu, Jhe-Hao Lin, Hongxia Xie, Terence Lin, Yi-Ning Huang, Hong-Han Shuai, Wen-Huang Cheng Thinking Outside the BBox: Unconstrained Generative Object Compositing
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim This Probably Looks Exactly like That: An Invertible Prototypical Network
Zachariah Carmichael, Timothy P Redgrave, Daniel Gonzalez Cedre, Walter Scheirer TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang Tiny Models Are the Computational Saver for Large Models
Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John TLControl: Trajectory and Language Control for Human Motion Synthesis
Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao Tokenize Anything via Prompting
Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan Toward Tiny and High-Quality Facial Makeup with Data Amplify Learning
Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni Towards Certifiably Robust Face Recognition
Seunghun Paik, Dongsoo Kim, Chanwoo Hwang, Sunpill Kim, Jae Hong Seo Towards Compact Reversible Image Representations for Neural Style Transfer
Xiyao Liu, Siyu Yang, Jian Zhang, Gerald Schaefer, Jiya Li, Xunli Fan, Songtao Wu, Hui Fang Towards Image Ambient Lighting Normalization
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu, Rakesh Ranjan, Radu Timofte Towards Multi-Modal Transformers in Federated Learning
Guangyu Sun, Matias Mendieta, Aritra Dutta, Xin Li, Chen Chen Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang Towards Neuro-Symbolic Video Understanding
Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitchell K Hill Towards Open-Ended Visual Quality Comparison
Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin Towards Open-World Object-Based Anomaly Detection via Self-Supervised Outlier Synthesis
Brian Kostadinov Shalon Isaac-Medina, Yona Falinie Abdul Gaus, Neelanjan Bhowmik, Toby P Breckon Towards Reliable Advertising Image Generation Using Human Feedback
Zhenbang Du, Wei Feng, Haohan Wang, Yaoyu Li, Jingsen Wang, Jian Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junsheng Jin, Junjie Shen, Zhangang Lin, Jingping Shao Towards Robust Full Low-Bit Quantization of Super Resolution Networks
Denis S. Makhov, Irina Zhelavskaya, Ruslan Ostapets, Dehua Song, Kirill Solodskikh Towards Scene Graph Anticipation
Rohith Peddi, Saksham Singh, Saurabh, Parag Singla, Vibhav Gogate Towards Stable 3D Object Detection
Jiabao Wang, Qiang Meng, Guochao Liu, Liujiang Yan, Ke Wang, Ming-Ming Cheng, Qibin Hou TPA3D: Triplane Attention for Fast Text-to-3D Generation
Bin-Shih Wu, Hong-En Chen, Sheng-Yu Huang, Yu-Chiang Frank Wang Track Everything Everywhere Fast and Robustly
Yunzhou Song, Jiahui Lei, Ziyun Wang, Lingjie Liu, Kostas Daniilidis TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
Jinjie Mai, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem Trainable Highly-Expressive Activation Functions
Irit Chelly, Shahaf E. Finder, Shira Ifergane, Oren Freifeld Training a Secure Model Against Data-Free Model Extraction
Zhenyi Wang, Li Shen, Junfeng Guo, Tiehang Duan, Siyu Luan, Tongliang Liu, Mingchen Gao Training-Free Model Merging for Multi-Target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao, Beiwen Tian, Rong Zhi, Hao Zhao Trajectory-Aligned Space-Time Tokens for Few-Shot Action Recognition
Pulkit Kumar, Namitha Padmanabhan, Luke Luo, Sai Saketh Rambhatla, Abhinav Shrivastava TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
Li-Wu Tsao, Hao-Tang Tsui, Yu-Rou Tuan, Pei-Chi Chen, Kuan-Lin Wang, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
Elona Dupont, Kseniya Cherenkova, Dimitrios Mallis, Gleb A Gusev, Anis Kacem, Djamila Aouada Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
Jae Joong Lee, Bosheng Li, Sara M Beery, Jonathan Huang, Songlin Fei, Raymond A. Yeh, Bedrich Benes Tri^2-Plane: Thinking Head Avatar via Feature Pyramid
Luchuan Song, Pinxin Liu, Lele Chen, Guojun Yin, Chenliang Xu TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen Tuning-Free Image Customization with Image and Text Guidance
Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng Turbo: Informativity-Driven Acceleration Plug-in for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng TurboEdit: Real-Time Text-Based Disentangled Real Image Editing
Zongze Wu, Nicholas I Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
Xin Duan, Yu Cao, Lei Zhu, Gang Fu, Xin Wang, Renjie Zhang, Ping Li U-COPE: Taking a Further Step to Universal 9d Category-Level Object Pose Estimation
Li Zhang, Weiqing Meng, Yan Zhong, Bin Kong, Mingliang Xu, Jianming Du, Xue Wang, Rujing Wang, Liu Liu UAV First-Person Viewers Are Radiance Field Learners
Liqi Yan, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu uCAP: An Unsupervised Prompting Method for Vision-Language Models
A. Tuan Nguyen, Kai Sheng Tai, Bor-Chun Chen, Satya Narayan Shukla, Hanchao Yu, Philip Torr, Tai-Peng Tian, Ser-Nam Lim UGG: Unified Generative Grasping
Jiaxin Lu, Hao Kang, Haoxiang Li, Bo Liu, Yiding Yang, Qixing Huang, Gang Hua UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia, Raoul de Charette, A. Cengiz Oztireli, Jing-Hao Xue UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian, Xiaopeng Zhang Uncertainty-Aware Sign Language Video Retrieval with Probability Distribution Modeling
Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
Zijun Long, Lipeng Zhuang, George W Killick, Richard Mccreadie, Gerardo Aragon-Camarasa, Paul Henderson Understanding Physical Dynamics with Counterfactual World Modeling
Rahul Venkatesh, Honglin Chen, Kevin Feigelis, Daniel M Bear, Khaled Jedoui, Klemen Kotar, Felix J Binder, Wanhee Lee, Sherry Liu, Kevin Smith, Judith E. Fan, Daniel Yamins UNIC: Universal Classification Models via Multi-Teacher Distillation
Yannis Kalantidis, Diane Larlus, Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas UniCal: Unified Neural Sensor Calibration
Ze Yang, George G Chen, Haowei Zhang, Kevin Ta, Ioan Andrei Bârsan, Daniel Murphy, Sivabalan Manivasagam, Raquel Urtasun UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang Unified Local-Cloud Decision-Making via Reinforcement Learning
Kathakoli Sengupta, Zhongkai Shangguan, Sandesh Bharadwaj, Sanjay Arora, Eshed Ohn-Bar, Renato Mancuso Unified Medical Image Pre-Training in Language-Guided Common Semantic Space
Xiaoxuan He, Yifan Yang, Xinyang Jiang, Xufang Luo, Haoji Hu, Siyun Zhao, Dongsheng Li, Yuqing Yang, Lili Qiu UniFS: Universal Few-Shot Instance Perception with Point Representations
Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, Qing Li UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud, Eloi Zablocki, Matthieu Cord, Alexandre Alahi Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo, Jingwen Chen, Yehao Li, Yingwei Pan, Jianlin Feng, Hongyang Chao, Ting Yao Unleashing the Power of Prompt-Driven Nucleus Instance Segmentation
Zhongyi Shui, Yunlong Zhang, Kai Yao, Chenglu Zhu, Sunyi Zheng, Jingxiong Li, Honglin Li, Yuxuan Sun, Ruizhe Guo, Lin Yang Unmasking Bias in Diffusion Model Training
Hu Yu, Li Shen, Jie Huang, Hongsheng Li, Feng Zhao Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Zhangkai Ni, Qi Mao, Shiqi Wang Unsupervised Exposure Correction
Ruodai Cui, Li Niu, Guosheng Hu Unsupervised Moving Object Segmentation with Atmospheric Turbulence
Dehao Qin, Ripon k Saha, Woojeh Chung, Suren Jayasuriya, Jinwei Ye, Nianyi Li Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
Kun Zhou, Xinyu Lin, Wenbo Li, Xiaogang Xu, Yuanhao Cai, Zhonghang Liu, Xiaoguang Han, Jiangbo Lu UpFusion: Novel View Diffusion from Unposed Sparse View Observations
Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, Ali Etemad V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang, Runyu Ding, Ellis L Brown, Xiaojuan Qi, Saining Xie V2X-Real: A Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception
Hao Xiang, Xin Xia, Zhaoliang Zheng, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
Zhen Qu, Xian Tao, Mukesh Prasad, Fei Shen, Zhengtao Zhang, Xinyi Gong, Guiguang Ding VeCLIP: Improving CLIP Training via Visual-Enriched Captions
Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao VEON: Vocabulary-Enhanced Occupancy Prediction
Jilai Zheng, Pin Tang, Zhongdao Wang, Guoqing Wang, Xiangxuan Ren, Bailan Feng, Chao Ma Video Editing via Factorized Diffusion Distillation
Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani, Laszlo A Jeni VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park, Hee-Seon Kim, Kangwook Ko, Minbeom Kim, Changick Kim VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani, Mohamed Hanini, Nihitha Malayarukil, Stergios Christodoulidis, Maria Vakalopoulou, Enzo Ferrante ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming C Lin, Shan Yang VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
Hanjung Kim, Jaehyun Kang, Miran Heo, Sukjun Hwang, Seoung Wug Oh, Seon Joo Kim VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich, Niv Nayman, Sharon Fogel, Inbal Lavi, Ron Litman, Shahar Tsiper, Royee Tichauer, Srikar Appalaraju, Shai Mazor, R. Manmatha VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park, Haeji Jung, Reza Mahjourian, Hyung-gun Chi, Hyerin Lim, Sangpil Kim, Jinkyu Kim Vista3D: Unravel the 3D Darkside of a Single Image
Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang Visual Prompting via Partial Optimal Transport
Mengyu Zheng, Zhiwei Hao, Yehui Tang, Chang Xu Visual Relationship Transformation
Xiaoyu Xu, Jiayan Qiu, Baosheng Yu, Zhou Wang Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang Volumetric Rendering with Baked Quadrature Fields
Gopal Sharma, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno VSViG: Real-Time Video-Based Seizure Detection via Skeleton-Based Spatiotemporal ViG
Yankun Xu, Junzhe Wang, Yun-Hsuan Chen, Jie Yang, Wenjie Ming, Shuang Wang, Mohamad Sawan WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie, Yuzhe Li, Yang Liu, Zhifei Zhang, Zhaowen Wang, Wei Xiong, Xiang Bai WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians
Dmytro Kotovenko, Olga Grebenkova, Nikolaos Sarafianos, Avinash Paliwal, Pingchuan Ma, Omid Poursaeed, Sreyas Mohan, Yuchen Fan, Yilei Li, Rakesh Ranjan, Bjorn Ommer Watch Your Steps: Local Image and Scene Editing by Text Instructions
Ashkan Mirzaei, Tristan T Aumentado-Armstrong, Marcus A Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G Derpanis, Igor Gilitschenski WAVE: Warping DDIM Inversion Features for Zero-Shot Text-to-Video Editing
Yutang Feng, Sicheng Gao, Yuxiang Bao, Xiaodi Wang, Shumin Han, Juan Zhang, Baochang Zhang, Angela Yao Wavelet Convolutions for Large Receptive Fields
Shahaf E Finder, Roy Amoyal, Eran Treister, Oren Freifeld Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu, Jiajun Bu, Qi Zheng, Cong Yao Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione, Stéphane Lathuilière WHAC: World-Grounded Humans and Cameras
Wanqi Yin, Zhongang Cai, Chen Wei, Fanzhou Wang, Ruisi Wang, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang When and How Do Negative Prompts Take Effect?
Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell Where Am I? Scene Retrieval with Language
Jiaqi Chen, Daniel Barath, Iro Armeni, Marc Pollefeys, Hermann Blum WiMANS: A Benchmark Dataset for WiFi-Based Multi-User Activity Sensing
Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann Within the Dynamic Context: Inertia-Aware 3D Human Modeling with Pose Sequence
Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng WordRobe: Text-Guided Generation of Textured 3D Garments
Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
Tianjian Jiang, Johsan Billingham, Sebastian Müksch, Juan J Zarate, Nicolas Evans, Martin R. Oswald, Marc Pollefeys, Otmar Hilliges, Manuel Kaufmann, Jie Song WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-Grained Spatial-Temporal Understanding
Quan Kong, Yuki Kawana, Rajat Saini, Ashutosh Kumar, Jingjing Pan, Ta Gu, Yohei Ozao, Balazs Opra, Yoichi Sato, Norimasa Kobori X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Sirnam Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul A Chilimbi, Mubarak Shah X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and Its Emergent Cross-Modal Reasoning
Artemis Panagopoulou, Le Xue, Ning Yu, Li Junnan, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles X-Pose: Detecting Any Keypoints
Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang XPSR: Cross-Modal Priors for Diffusion-Based Image Super-Resolution
Qu Yunpeng, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou Zero-Shot Detection of AI-Generated Images
Davide Cozzolino, GIovanni Poggi, Matthias Niessner, Luisa Verdoliva Zero-Shot Image Feature Consensus with Deep Functional Maps
Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas Zero-Shot Multi-Object Scene Completion
Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rareș A Ambruș, Sergey Zakharov Zero-Shot Object Counting with Good Exemplars
Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Xian Zhong, Zheng Wang, Shengfeng He ZeST: Zero-Shot Material Transfer from a Single Image
Ta-Ying Cheng, Prafull Sharma, Andrew Markham, Niki Trigoni, Varun Jampani ZigMa: A DiT-Style Zigzag Mamba Diffusion Model
Vincent Tao Hu, Stefan A Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes S Fischer, Bjorn Ommer ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
Fu-Yun Wang, Zhaoyang Huang, Qiang Ma, Guanglu Song, Xudong Lu, Weikang Bian, Yijin Li, Yu Liu, Hongsheng Li