CVPR 2024
2716 papers
360+x: A Panoptic Multi-Modal Scene Understanding Dataset
Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao 3D Face Tracking from 2D Video Through Iterative Dense UV to Image Flow
Felix Taubner, Prashant Raina, Mathieu Tuli, Eu Wern Teh, Chul Lee, Jinmiao Huang 3D Facial Expressions Through Analysis-by-Neural-Synthesis
George Retsinas, Panagiotis P. Filntisis, Radek Danecek, Victoria F. Abrevaya, Anastasios Roussos, Timo Bolkart, Petros Maragos 3D Feature Tracking via Event Camera
Siqi Li, Zhikuan Zhou, Zhou Xue, Yipeng Li, Shaoyi Du, Yue Gao 3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis
Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai 3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt 3D Multi-Frame Fusion for Video Stabilization
Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao 3D Neural Edge Reconstruction
Lei Li, Songyou Peng, Zehao Yu, Shaohui Liu, Rémi Pautrat, Xiaochuan Yin, Marc Pollefeys 3D-Aware Face Editing via Warping-Guided Latent Direction Learning
Yuhao Cheng, Zhuo Chen, Xingyu Ren, Wenhan Zhu, Zhengqin Xu, Di Xu, Changpeng Yang, Yichao Yan 3D-LFM: Lifting Foundation Model
Mosam Dabhi, László A. Jeni, Simon Lucey 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang 3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-Labelling
Chaokang Jiang, Guangming Wang, Jiuming Liu, Hesheng Wang, Zhuang Ma, Zhenqiang Liu, Zhujin Liang, Yi Shan, Dalong Du 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang 4D-DRESS: A 4D Dataset of Real-World Human Clothing with Semantic Annotations
Wenbo Wang, Hsuan-I Ho, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, Otmar Hilliges 4D-Fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell 4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu, Sida Peng, Haotong Lin, Guangzhao He, Jiaming Sun, Yujun Shen, Hujun Bao, Xiaowei Zhou A Category Agnostic Model for Visual Rearrangment
Yuyi Liu, Xinhang Song, Weijie Li, Xiaohan Wang, Shuqiang Jiang A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Zhixiong Yang, Jingyuan Xia, Shengxi Li, Xinghua Huang, Shuanghui Zhang, Zhen Liu, Yaowen Fu, Yongxiang Liu A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin A Picture Is Worth More than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek, Florian Bordes, Pietro Astolfi, Mary Williamson, Vasu Sharma, Adriana Romero-Soriano A Recipe for Scaling up Text-to-Video Generation with Text-Free Videos
Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang A Simple Baseline for Efficient Hand Mesh Reconstruction
Zhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang A Simple Recipe for Contrastively Pre-Training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak, Justin Chiu, Joe Heyward, Viorica Patraucean, Jiajun Shen, Antoine Miech, Andrew Zisserman, Aida Nematzadeh A Simple Recipe for Language-Guided Domain Generalized Segmentation
Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack Against Split Learning
Xiaoyang Xu, Mengda Yang, Wenzhe Yi, Ziang Li, Juan Wang, Hongxin Hu, Yong Zhuang, Yaxin Liu A Theory of Joint Light and Heat Transport for Lambertian Scenes
Mani Ramanagopal, Sriram Narayanan, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva, Mykyta Holubakha, Andela Ilic, Saman Motamed, Luc Van Gool, Danda Paudel A Unified Approach for Text- and Image-Guided 4D Scene Generation
Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Otmar Hilliges, Shalini De Mello A Vision Check-up for Language Models
Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, Antonio Torralba A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu, Liwei Meng, Ning Ning, Yang Liu, Shaogang Hu AAMDM: Accelerated Auto-Regressive Motion Diffusion Model
Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura, Zuzana Kukelova, Torsten Sattler, Dániel Baráth Accelerating Diffusion Sampling with Optimized Time Steps
Shuchen Xue, Zhaoqiang Liu, Fei Chen, Shifeng Zhang, Tianyang Hu, Enze Xie, Zhenguo Li Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham, Ajanthan Thalaiyasingam ACT-Diffusion: Efficient Adversarial Consistency Training for One-Step Diffusion Models
Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu Action Detection via an Image Diffusion Process
Lin Geng Foo, Tianjiao Li, Hossein Rahmani, Jun Liu Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella Active Generalized Category Discovery
Shijie Ma, Fei Zhu, Zhun Zhong, Xu-Yao Zhang, Cheng-Lin Liu ActiveDC: Distribution Calibration for Active Finetuning
Wenshuai Xu, Zhenghui Hu, Yu Lu, Jinzhou Meng, Qingjie Liu, Yunhong Wang Adapters Strike Back
Jan-Martin O. Steitz, Stefan Roth Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
Bowen Deng, Siyang Song, Andrew P. French, Denis Schluppeck, Michael P. Pound Adversarial Text to Continuous Image Generation
Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen, Jovana Lazarevic, Ivan Skorokhodov, Chamuditha Jayanga Galappaththige, Mohamed Elhoseiny Affine Equivariant Networks Based on Differential Invariants
Yikang Li, Yeqing Qiu, Yuxuan Chen, Lingshen He, Zhouchen Lin AHIVE: Anatomy-Aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan, William K. Cheung, Ivor W. Tsang, Keith Chiu, Terence M. Tong, Ka Chun Cheung, Simon See AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi-Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
Jamie Watson, Filippo Aleotti, Mohamed Sayed, Zawar Qureshi, Oisin Mac Aodha, Gabriel Brostow, Michael Firman, Sara Vicente Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li, Xuhui Jia, Dmitry Lagun, Fredo Durand, Bill Freeman, Mark Matthews Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tang Tao, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin, Kaicheng Yu, Xiaodan Liang Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang, Huazhu Fu, Renuga Kanagavelu, Qingsong Wei, Yong Liu, Rick Siow Mong Goh An Interactive Navigation Method with Effect-Oriented Affordance
Xiaohan Wang, Yuehu Liu, Xinhang Song, Yuyi Liu, Sixian Zhang, Shuqiang Jiang Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine Anchor-Based Robust Finetuning of Vision-Language Models
Jinwei Han, Zhiwen Lin, Zhongyisun Sun, Yingguo Gao, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia ANIM: Accurate Neural Implicit Model for Human Reconstruction from a Single RGB-D Image
Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung Any-Shift Prompting for Generalization over Distributions
Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani, Shengcai Liao, Cees G. M. Snoek AnyDoor: Zero-Shot Object-Level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen, Lanjun Wang, Weizhi Nie, Yongdong Zhang, An-An Liu AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun Arbitrary Motion Style Transfer with Multi-Condition Motion Latent Diffusion Model
Wenfeng Song, Xingliang Jin, Shuai Li, Chenglizhao Chen, Aimin Hao, Xia Hou, Ning Li, Hong Qin Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi, Chenglei Wu, Takaaki Shiratori Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Xidong Wu, Shangqian Gao, Zeyu Zhang, Zhenzhen Li, Runxue Bao, Yanfu Zhang, Xiaoqian Wang, Heng Huang AutoAD III: The Prequel - Back to the Pixels
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, Rongrong Ji AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar, Purva Chiniya, Dinesh Manocha AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff, Surya Koppisetti, Nicolò Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, Licheng Yu BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma Backpropagation-Free Network for 3D Test-Time Adaptation
Yanshuo Wang, Ali Cheraghian, Zeeshan Hayder, Jie Hong, Sameera Ramasinghe, Shafin Rahman, David Ahmedt-Aristizabal, Xuesong Li, Lars Petersson, Mehrtash Harandi BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai, Kuofeng Gao, Shaobo Min, Shu-Tao Xia, Zhifeng Li, Wei Liu Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, R. Venkatesh Babu BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
Akhmedkhan Shabanov, Shrisudhan Govindarajan, Cody Reading, Lily Goli, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Lily Goli, Cody Reading, Silvia Sellán, Alec Jacobson, Andrea Tagliasacchi Bayesian Diffusion Models for 3D Shape Reconstruction
Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu, Peike Patrick Li, Qingtao Yu, Hongwei Sheng, Dadong Wang, Lincheng Li, Xin Yu BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-Based Roadside 3D Object Detection
Wenjie Wang, Yehao Lu, Guangcong Zheng, Shuigen Zhan, Xiaoqing Ye, Zichang Tan, Jingdong Wang, Gaoang Wang, Xi Li Beyond First-Order Tweedie: Solving Inverse Problems Using Latent Diffusion
Litu Rout, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs
Hugh Blayney, Hanlin Tian, Hamish Scott, Nils Goldbeck, Chess Stetson, Panagiotis Angeloudis Bidirectional Autoregessive Diffusion Model for Dance Generation
Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang Bilateral Propagation Network for Depth Completion
Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, Ping Tan Binarized Low-Light Raw Video Enhancement
Gengchen Zhang, Yulun Zhang, Xin Yuan, Ying Fu Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su BiPer: Binary Neural Networks Using a Periodic Function
Edwin Vargas, Claudia V. Correa, Carlos Hinojosa, Henry Arguello Boosting Flow-Based Generative Super-Resolution Models via Learned Prior
Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang, Hao-Wei Chen, Roy Tseng, Chien Feng, Chun-Yi Lee Boosting Neural Representations for Videos with a Conditional Decoder
Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao, Sohrab Madani, Junfeng Guan, Mohammed Alloulah, Saurabh Gupta, Haitham Hassanieh Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-Ray Expert Models
Weiwei Cao, Jianpeng Zhang, Yingda Xia, Tony C. W. Mok, Zi Li, Xianghua Ye, Le Lu, Jian Zheng, Yuxing Tang, Ling Zhang Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou BoQ: A Place Is Worth a Bag of Learnable Queries
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang, Molin Huang, Yuxuan Zhou, Juze Zhang, Jingyi Yu, Jingya Wang, Lan Xu Brain Decodes Deep Nets
Huzheng Yang, James Gee, Jianbo Shi BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash, Soheil Kolouri Breathing Life into Sketches Using Text-to-Video Priors
Rinon Gal, Yael Vinker, Yuval Alaluf, Amit Bermano, Daniel Cohen-Or, Ariel Shamir, Gal Chechik Bridging the Gap Between End-to-End and Two-Step Text Spotting
Mingxin Huang, Hongliang Li, Yuliang Liu, Xiang Bai, Lianwen Jin Brush2Prompt: Contextual Prompt Generator for Object Inpainting
Mang Tik Chiu, Yuqian Zhou, Lingzhi Zhang, Zhe Lin, Connelly Barnes, Sohrab Amirghodsi, Eli Shechtman, Humphrey Shi Building Optimal Neural Architectures Using Interpretable Knowledge
Keith G. Mills, Fred X. Han, Mohammad Salameh, Shengyao Lu, Chunhua Zhou, Jiao He, Fengyu Sun, Di Niu Cache Me if You Can: Accelerating Diffusion Models Through Block Caching
Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan, Jing Xu, Hao Pan, Adrien Bousseau, Niloy J. Mitra, Changjian Li CAGE: Controllable Articulation GEneration
Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi-Amiri, Manolis Savva Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma, Xieyuanli Chen, Jiawei Huang, Jingyi Xu, Zhen Luo, Jintao Xu, Weihao Gu, Rui Ai, Hesheng Wang CAMixerSR: Only Details Need More "Attention"
Yan Wang, Yi Liu, Shijie Zhao, Junlin Li, Li Zhang CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Faisal Chowdhury, Kewen Liao, Vu Minh Hieu Phan, Minh-Son To, Yutong Xie, Kevin Hung, David Ross, Anton van den Hengel, Johan W. Verjans, Zhibin Liao CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu Capturing Closely Interacted Two-Person Motions with Reaction Priors
Qi Fang, Yinghui Fan, Yanjun Li, Junting Dong, Dingwei Wu, Weidong Zhang, Kang Chen Carve3D: Improving Multi-View Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, Arie E. Kaufman CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, Zhiyang He, Xiaodong Tao, S. Kevin Zhou CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, An-An Liu CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li, Kaichun Mo, Yueqi Duan, He Wang, Jiequan Zhang, Lin Shao CCEdit: Creative and Controllable Video Editing via Diffusion Models
Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-Spoofing
Ajian Liu, Shuai Xue, Jianwen Gan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Zhen Lei ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez, Ihab Bendidi, Ethan Cohen, Gabriel Watkinson, Maxime Sanchez, Guillaume Bollot, Auguste Genovesio ChatPose: Chatting About 3D Human Pose
Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen, Lili Pan, Yu Dai, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Hongliang Li CLiC: Concept Learning in Context
Mehdi Safaee, Aryan Mikaeili, Or Patashnik, Daniel Cohen-Or, Ali Mahdavi-Amiri CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang, Zhulin An, Libo Huang, Junyu Bi, Xinqiang Yu, Han Yang, Boyu Diao, Yongjun Xu CLOAF: CoLlisiOn-Aware Human Flow
Andrey Davydov, Martin Engilberge, Mathieu Salzmann, Pascal Fua Clockwork Diffusion: Efficient Generation with Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li, Yuan Zhang, Junpeng Ma, Xinyu Wei, Kevin Zhang, Maurice Chong, Renrui Zhang, Yijiang Liu, Shanghang Zhang CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
Zhi Gao, Yuntao Du, Xintong Zhang, Xiaojian Ma, Wenjuan Han, Song-Chun Zhu, Qing Li Clustering for Protein Representation Learning
Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang CNC-Net: Self-Supervised Learning for CNC Machining Operations
Mohsen Yavartanoo, Sangmin Hong, Reyhaneh Neshatavar, Kyoung Mu Lee Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu COCONut: Modernizing COCO Segmentation
Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang, Huaibin Wang, Chuyao Luo, Xutao Li, Guotao Liang, Yunming Ye, Xiaochen Qi, Yao He CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah, Matthew A. Chan, Haoming Cai, Jingxi Chen, Sakshum Kulshrestha, Chahat Deep Singh, Yiannis Aloimonos, Christopher A. Metzler CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
Zineng Tang, Ziyi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltán A. Milacski, Koichiro Niinuma, László A. Jeni Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, Alois Knoll COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images
Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov Compact 3D Gaussian Representation for Radiance Field
Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Jianping Jiang, Xinyu Zhou, Bingxuan Wang, Xiaoming Deng, Chao Xu, Boxin Shi Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco, Stefano Berti, Giulia Pasquale, Damiano Malafronte, Shogo Sato, Hiroyuki Segawa, Tetsugo Inada, Lorenzo Natale ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
Jun-Kun Chen, Samuel Rota Bulò, Norman Müller, Lorenzo Porzi, Peter Kontschieder, Yu-Xiong Wang Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng, Guoji Fu, Yong Liang Goh, Wei Lu, Wee Sun Lee Context-Guided Spatio-Temporal Video Grounding
Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Changki Sung, Wanhee Kim, Jungho An, Wooju Lee, Hyungtae Lim, Hyun Myung Continual Forgetting for Pre-Trained Vision Models
Hongbo Zhao, Bolin Ni, Junsong Fan, Yuxi Wang, Yuntao Chen, Gaofeng Meng, Zhaoxiang Zhang Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, Shan Liu Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu ControlRoom3D: Room Generation Using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt Convolutional Prompting Meets Language Models for Continual Learning
Anurag Roy, Riddhiman Moulick, Vinay K. Verma, Saptarshi Ghosh, Abir Das CoralSCOP: Segment Any COral Image on This Planet
Ziqiang Zheng, Haixin Liang, Binh-Son Hua, Yue Him Wong, Put Ang Jr, Apple Pui Yi Chui, Sai-Kit Yeung CORES: Convolutional Response-Based Score for Out-of-Distribution Detection
Keke Tang, Chao Hou, Weilong Peng, Runnan Chen, Peican Zhu, Wenping Wang, Zhihong Tian Correcting Diffusion Generation Through Resampling
Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu, Wentao Wang, Kwan-Yee Lin, Wayne Wu Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching
Matteo Bastico, Etienne Decencière, Laurent Corté, Yannick Tillier, David Ryckelynck CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi, Fayaz Ali Dharejo, Naoufel Werghi, Mohammed Bennamoun CPR-Coach: Recognizing Composite Error Actions Based on Single-Class Training
Shunli Wang, Shuaibing Wang, Dingkang Yang, Mingcheng Li, Haopeng Kuang, Xiao Zhao, Liuzhen Su, Peng Zhai, Lihua Zhang CPR: Retrieval Augmented Generation for Copyright Protection
Aditya Golatkar, Alessandro Achille, Luca Zancato, Yu-Xiang Wang, Ashwin Swaminathan, Stefano Soatto Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu, Miaomiao Cai, Yinda Chen, Yueyi Zhang, Te Shi, Ruobing Zhang, Xuejin Chen, Zhiwei Xiong Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu Cross-Spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker, Stefanie Walz, Mario Bijelic, Felix Heide Cross-View and Cross-Pose Completion for 3D Human Understanding
Matthieu Armando, Salma Galaaoui, Fabien Baradel, Thomas Lucas, Vincent Leroy, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez CrossKD: Cross-Head Knowledge Distillation for Object Detection
Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Yuxin Guo, Siyang Sun, Shuailei Ma, Kecheng Zheng, Xiaoyi Bao, Shijie Ma, Wei Zou, Yun Zheng CrowdDiff: Multi-Hypothesis Crowd Density Estimation Using Diffusion Models
Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns, Alex Fu, Jiateng Liu, Jeong Joon Park, Davis Rempe, Despoina Paschalidou, Leonidas J. Guibas Customize Your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
Yi Xie, Yihong Lin, Wenjie Cai, Xuemiao Xu, Huaidong Zhang, Yong Du, Shengfeng He DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif, Ihsen Alouani, Muhammad Shafique DaReNeRF: Direction-Aware Representation for Dynamic Scenes
Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu DART: Implicit Doppler Tomography for Radar Novel View Synthesis
Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe Data Poisoning Based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti, Valentin Villecroze, Jesse C. Cresswell, Guangwei Yu, Gabriel Loaiza-Ganem, Maksims Volkovs DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan Day-Night Cross-Domain Vehicle Re-Identification
Hongchao Li, Jingong Chen, Aihua Zheng, Yong Wu, Yonglong Luo De-Confounded Data-Free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
Fahimeh Hosseini Noohdani, Parsa Hosseini, Aryan Yazdan Parast, Hamidreza Yaghoubi Araghi, Mahdieh Soleymani Baghshah Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-Training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, Bowen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans Decoupled Pseudo-Labeling for Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li Deep Equilibrium Diffusion Restoration with Parallel Sampling
Jiezhang Cao, Yue Shi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool Deep Video Inverse Tone Mapping Based on Temporal Clues
Yuyao Ye, Ning Zhang, Yang Zhao, Hongbin Cao, Ronggang Wang DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song DemoFusion: Democratising High-Resolution Image Generation with No $$$
Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid DePT: Decoupled Prompt Tuning
Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park, Chanhwi Jeong, Junoh Lee, Hae-Gon Jeon Depth-Aware Test-Time Training for Zero-Shot Video Object Segmentation
Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun, Xiaodong Cun Describing Differences in Image Sets with Natural Language
Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou DetCLIPv3: Towards Versatile Generative Open-Vocabulary Object Detection
Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei Zhang, Zhenguo Li, Dan Xu DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang Detector-Free Structure from Motion
Xingyi He, Jiaming Sun, Yifan Wang, Sida Peng, Qixing Huang, Hujun Bao, Xiaowei Zhou Detours for Navigating Instructional Videos
Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, Kristen Grauman DETRs Beat YOLOs on Real-Time Object Detection
Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen Device-Wise Federated Network Pruning
Shangqian Gao, Junyi Li, Zeyu Zhang, Yanfu Zhang, Weidong Cai, Heng Huang Dexterous Grasp Transformer
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang, Guoming Wang, Junhao Guo, Juncheng Li, Wenqiao Zhang, Rongxing Lu, Siliang Tang DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari, Pietro Morerio, Alessio Del Bue DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, Tuur Stuyck DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
Demin Yu, Xutao Li, Yunming Ye, Baoquan Zhang, Chuyao Luo, Kuai Dai, Rui Wang, Xunlai Chen Differentiable Display Photometric Stereo
Seokjun Choi, Seungwoo Yoon, Giljoo Nam, Seungyong Lee, Seung-Hwan Baek Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Rui Shi, Yuhan Li, Bingbing Ni Differentiable Neural Surface Refinement for Modeling Transparent Objects
Weijian Deng, Dylan Campbell, Chunyi Sun, Shubham Kanitkar, Matthew E. Shaffer, Stephen Gould Differentiable Point-Based Inverse Rendering
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, Cristian Sminchisescu DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li DiffLoc: Diffusion Model for Outdoor LiDAR Localization
Wen Li, Yuyang Yang, Shangshu Yu, Guosheng Hu, Chenglu Wen, Ming Cheng, Cheng Wang DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng Wang DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu, Hongyi Xu, You Xie, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Justus Thies, Matthias Nießner DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey, Paul Guerrero, Matheus Gadelha, Yannick Hold-Geoffroy, Karan Singh, Niloy J. Mitra Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik Diffusion Models Without Attention
Jing Nathan Yan, Jiatao Gu, Alexander M. Rush Diffusion Time-Step Curriculum for One Image to 3D Generation
Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang Diffusion-Based Blind Text Image Super-Resolution
Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian Diffusion-EDFs: Bi-Equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu, Jiwoo Kim, Hyunseok An, Junwoo Chang, Joohwan Seo, Taehan Kim, Yubin Kim, Chaewon Hwang, Jongeun Choi, Roberto Horowitz Diffusion-ES: Gradient-Free Planning with Diffusion for Autonomous and Instruction-Guided Driving
Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet, Varun Jampani, Amit Raj, Pramook Khungurn, Supasorn Suwajanakorn Digital Life Project: Autonomous 3D Characters with Social Intelligence
Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Liang Pan, Xiangyu Fan, Han Du, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu DIMAT: Decentralized Iterative Merging-and-Training for Deep Learning Models
Nastaran Saadati, Minh Pham, Nasla Saleem, Joshua R. Waite, Aditya Balu, Zhanong Jiang, Chinmay Hegde, Soumik Sarkar DIOD: Self-Distillation Meets Object Discovery
Sandra Kara, Hejer Ammar, Julien Denize, Florian Chabot, Quoc-Cuong Pham Direct2.5: Diverse Text-to-3D Generation via Multi-View 2.5D Diffusion
Yuanxun Lu, Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang Discovering and Mitigating Visual Biases Through Keyword Explanation
Younghyun Kim, Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jaeho Lee, Jinwoo Shin Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
Jinguo Luo, Weihong Ren, Weibo Jiang, Xi'ai Chen, Qiang Wang, Zhi Han, Honghai Liu Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua Disentangled Prompt Representation for Domain Generalization
De Cheng, Zhipeng Xu, Xinyang Jiang, Nannan Wang, Dongsheng Li, Xinbo Gao Dispersed Structured Light for Hyperspectral 3D Imaging
Suhyun Shin, Seokjun Choi, Felix Heide, Seung-Hwan Baek Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang, Xiaoyu Liu, Wei Li, Hanting Chen, Junchao Liu, Jie Hu, Zhiwei Xiong, Chun Yuan, Yunhe Wang Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krahenbuhl, Liangzhe Yuan Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Kai Li, Song Han DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
Cheng-You Lu, Peisen Zhou, Angela Xing, Chandradeep Pokhariya, Arnab Dey, Ishaan Nikhil Shah, Rugved Mavidipalli, Dylan Hu, Andrew I. Comport, Kefan Chen, Srinath Sridhar DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Fernandez-Labrador, Mertcan Akçay, Eitan Abecassis, Joan Massich, Christopher Schroers Diversified and Personalized Multi-Rater Medical Image Segmentation
Yicheng Wu, Xiangde Luo, Zhe Xu, Xiaoqing Guo, Lie Ju, Zongyuan Ge, Wenjun Liao, Jianfei Cai DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-Based 3D Vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Mohamed El Amine Seddik, Sanath Narayan, Karttikeya Mangalam, Noel E. O'Connor Domain Gap Embeddings for Generative Dataset Augmentation
Yinong Oliver Wang, Younjoon Chung, Chen Henry Wu, Fernando De la Torre Domain Prompt Learning with Quaternion Networks
Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu, Jun Lu, Jing Han, Lianfa Bai, Yi Zhang, Zhuang Zhao, Siyang Song Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song Doubly Abductive Counterfactual Inference for Text-Based Image Editing
Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang DPHMs: Diffusion Parametric Head Models for Depth-Based Tracking
Jiapeng Tang, Angela Dai, Yinyu Nie, Lev Markhasin, Justus Thies, Matthias Nießner DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu Dr. Bokeh: DiffeRentiable Occlusion-Aware Bokeh Rendering
Yichen Sheng, Zixun Yu, Lu Ling, Zhiwen Cao, Xuaner Zhang, Xin Lu, Ke Xian, Haiting Lin, Bedrich Benes Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem DragDiffusion: Harnessing Diffusion Models for Interactive Point-Based Image Editing
Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai DREAM: Diffusion Rectification and Estimation-Adaptive Models
Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W.H. Lau, Wangmeng Zuo DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam, Joseph Chandler, Chenning Li, Zhoutong Zhang, Hari Balakrishnan DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving
Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai Driving Everywhere with Large Language Model Policy Adaptation
Boyi Li, Yue Wang, Jiageng Mao, Boris Ivanovic, Sushant Veer, Karen Leung, Marco Pavone Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan, Jiangwei Weng, Kun Wang, Yijun Yang, Jianjun Qian, Jun Li, Jian Yang Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang, Haijin Zeng, Jiezhang Cao, Yongyong Chen, Dengxiu Yu, Yin-Ping Zhao Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Seunghoon Lee, Dogyoon Lee, Heeseung Choi, Ig-Jae Kim, Sangyoun Lee Dual-View Visual Contextualization for Web Navigation
Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll, Niklas Hanselmann, Lukas Schneider, Richard Schulz, Marius Cordts, Markus Enzweiler, Hendrik P. A. Lensch DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud Dynamic LiDAR Re-Simulation Using Compositional Neural Fields
Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger, Or Litany, Konrad Schindler, Shengyu Huang Dynamic Prompt Optimizing for Text-to-Image Generation
Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen, Qing Yang Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren, Yuanyuan Gao, Haifeng Sun, Qi Qi, Jingyu Wang, Jianxin Liao DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou EASE-DETR: Easing the Competition Among Object Queries
Yulu Gao, Yifan Sun, Xudong Ding, Chuyang Zhao, Si Liu EasyDrag: Efficient Point-Based Manipulation on Diffusion Models
Xingzhong Hou, Boxiao Liu, Yi Zhang, Jihao Liu, Yu Liu, Haihang You Eclipse: Disambiguating Illumination and Materials Using Unintended Shadows
Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Todd Zickler, Pratul P. Srinivasan Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh, Hwanhee Jung, Giljoo Nam, Jinseop Yeom, Hyunje Park, Sang Ho Yoon, Sangpil Kim Edit One for All: Interactive Batch Image Editing
Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang Effective Video Mirror Detection with Inconsistent Motion Cues
Alex Warren, Ke Xu, Jiaying Lin, Gary K.L. Tam, Rynson W.H. Lau Efficient 3D Implicit Head Avatar with Mesh-Anchored Hash Table Blendshapes
Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai Efficient Meshflow and Optical Flow Estimation from Event Cameras
Xinglong Luo, Ao Luo, Zhengning Wang, Chunyu Lin, Bing Zeng, Shuaicheng Liu Efficient Model Stealing Defense with Noise Transition Matrix
Dong-Dong Wu, Chilin Fu, Weichang Wu, Wenwen Xia, Xiaolu Zhang, Jun Zhou, Min-Ling Zhang Efficient Multi-Scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao, Tianheng Qiu, Xinyu Zhang, Hanlin Bai, Kang Liu, Xuan Huang, Hu Wei, Guoying Zhang, Huaping Liu Efficient Multitask Dense Predictor via Binarization
Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan Efficient Stitchable Task Adaptation
Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-View Diffusion Priors
Zhipeng Hu, Minda Zhao, Chaoyi Zhao, Xinyue Liang, Lincheng Li, Zeng Zhao, Changjie Fan, Xiaowei Zhou, Xin Yu EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra EFHQ: Multi-Purpose ExtremePose-Face-HQ Dataset
Trung Tuan Dao, Duc Hong Vu, Cuong Pham, Anh Tran Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C.V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-Centric View of Procedural Activities in Real World
Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park, Hyungmin Lee, Seunghyun Park EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black Embodied Multi-Modal Agent Trained by an LLM from a Parallel TextWorld
Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang EMOPortraits: Emotion-Enhanced Multimodal One-Shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas, Zoe Landgraf, Stavros Petridis, Maja Pantic Emotional Speech-Driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Daněček, Nikos Athanasiou, Giorgio Becherini, Christopher Peters, Michael J. Black, Timo Bolkart EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey A. Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Schmid, Anurag Arnab Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang, Chaoqun Wan, Tongliang Liu, Xinmei Tian, Xu Shen, Jieping Ye Enhancing 3D Fidelity of Text-to-3D Using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng, Yichun Shi, Minsu Cho, Peng Wang Enhancing Vision-Language Pre-Training with Rich Supervisions
Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Zhuowen Tu, Vijay Mahadevan, Stefano Soatto Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni, Hongbo Zhao, Chenghao Zhang, Ke Hu, Gaofeng Meng, Zhaoxiang Zhang, Shiming Xiang Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun Ensemble Diversity Facilitates Adversarial Transferability
Bowen Tang, Zheng Wang, Yi Bin, Qi Dou, Yang Yang, Heng Tao Shen EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng Equivariant Multi-Modality Image Fusion
Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, Luc Van Gool Equivariant Plug-and-Play Image Reconstruction
Matthieu Terris, Thomas Moreau, Nelly Pustelnik, Julian Tachella Error Detection in Egocentric Procedural Task Videos
Shih-Po Lee, Zijia Lu, Zekun Zhang, Minh Hoai, Ehsan Elhamifar EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison eTraM: Event-Based Traffic Monitoring Dataset
Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan EvDiG: Event-Guided Direct and Global Components Separation
Xinyu Zhou, Peiqi Duan, Boyu Li, Chu Zhou, Chao Xu, Boxin Shi Event-Assisted Low-Light Video Object Segmentation
Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun Event-Based Structure-from-Orbit
Ethan Elms, Yasir Latif, Tae Ha Park, Tat-Jun Chin Event-Based Visible and Infrared Fusion via Multi-Task Collaboration
Mengyue Geng, Lin Zhu, Lizhi Wang, Wei Zhang, Ruiqin Xiong, Yonghong Tian EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu, Jieji Ren, Jin Han, Feishi Wang, Jinxiu Liang, Boxin Shi EVS-Assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation Through Sensor Inverse Modeling
Rui Jiang, Fangwen Tu, Yixuan Long, Aabhaas Vaish, Bowen Zhou, Qinyi Wang, Wei Zhang, Yuntan Fang, Luis Eduardo Garcia Capel, Bo Mu, Tiejun Dai, Andreas Suess Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users
Daniela Massiceti, Camilla Longden, Agnieszka Slowik, Samuel Wills, Martin Grayson, Cecily Morrison Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing Their Contributions
Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang Exploiting Inter-Sample and Inter-Feature Relations in Dataset Distillation
Wenxiao Deng, Wenbin Li, Tianyu Ding, Lei Wang, Hongguang Zhang, Kuihua Huang, Jing Huo, Yang Gao Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice, Aleksander Holynski, Forrester Cole, Brian Curless, Janne Kontkanen Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie F3Loc: Fusion and Filtering for Floorplan Localization
Changan Chen, Rui Wang, Christoph Vogel, Marc Pollefeys FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun Faces That Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey Fast Adaptation for Human Pose Estimation via Meta-Optimization
Shengxiang Hu, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi Federated Generalized Category Discovery
Nan Pu, Wenjing Li, Xingyuan Ji, Yalan Qin, Nicu Sebe, Zhun Zhong FedHCA2: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu, Suizhi Huang, Yuwen Yang, Shalayiding Sirejiding, Yue Ding, Hongtao Lu Feedback-Guided Autonomous Driving
Jimuyang Zhang, Zanming Huang, Arijit Ray, Eshed Ohn-Bar Few-Shot Learner Parameterization by Diffusion Time-Steps
Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun Fine-Grained Bipartite Concept Factorization for Clustering
Chong Peng, Pengfei Zhang, Yongyong Chen, Zhao Kang, Chenglizhao Chen, Qiang Cheng FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures
Lisa Mais, Peter Hirsch, Claire Managan, Ramya Kandarpa, Josef Lorenz Rumberger, Annika Reinke, Lena Maier-Hein, Gudrun Ihrke, Dagmar Kainmueller Fitting Flats to Flats
Gabriel Dogadov, Ugo Finnendahl, Marc Alexa Fixed Point Diffusion Models
Xingjian Bai, Luke Melas-Kyriazi FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning
Junyuan Zhang, Shuang Zeng, Miao Zhang, Runxi Wang, Feifei Wang, Yuyin Zhou, Paul Pu Liang, Liangqiong Qu Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
Anush Kumar, Fahim Mannan, Omid Hosseini Jafari, Shile Li, Felix Heide FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji Forgery-Aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, Yao Zhao Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring
Xiaoqian Lv, Shengping Zhang, Chenyang Wang, Yichen Zheng, Bineng Zhong, Chongyi Li, Liqiang Nie FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao FreeDrag: Feature Dragging for Reliable Point-Based Image Editing
Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin, Jinjin Zheng FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang FreeMan: Towards Benchmarking 3D Human Pose Estimation Under Real-World Conditions
Jiong Wang, Fengyu Yang, Bingliang Li, Wenbo Gou, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang, Jian Ding, Li Jiang, Dengxin Dai, Guisong Xia FreeU: Free Lunch in Diffusion U-Net
Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu Friendly Sharpness-Aware Minimization
Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard From Coarse to Fine-Grained Open-Set Recognition
Nico Lang, Vésteinn Snæbjarnarson, Elijah Cole, Oisin Mac Aodha, Christian Igel, Serge Belongie From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yong-Lu Li, Xiaoqian Wu, Xinpeng Liu, Zehao Wang, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Xudong Lu, Jingru Tan, Cewu Lu FSC: Few-Point Shape Completion
Xianzu Wu, Xianfeng Wu, Tianyu Luan, Yajing Bai, Zhongyuan Lai, Junsong Yuan Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
Sean I. Young, Yael Balbastre, Bruce Fischl, Polina Golland, Juan Eugenio Iglesias Fully Geometric Panoramic Localization
Junho Kim, Jiwon Jeong, Young Min Kim Functional Diffusion
Biao Zhang, Peter Wonka G-NeRF: Geometry-Enhanced Novel View Synthesis from Single-View Images
Zixiong Huang, Qi Chen, Libo Sun, Yifan Yang, Naizhou Wang, Qi Wu, Mingkui Tan GARField: Group Anything with Radiance Fields
Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa Garment Recovery with Shape and Deformation Priors
Ren Li, Corentin Dumery, Benoît Guillard, Pascal Fua GART: Gaussian Articulated Template Models
Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis Gated Fields: Learning Scene Reconstruction from Gated Videos
Andrea Ramazzina, Stefanie Walz, Pragyan Dahal, Mario Bijelic, Felix Heide Gaussian Head Avatar: Ultra High-Fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, Yebin Liu Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein Gaussian Splatting SLAM
Hidenobu Matsuki, Riku Murai, Paul H.J. Kelly, Andrew J. Davison GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal GDA: Generalized Diffusion for Robust Test-Time Adaptation
Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo GEARS: Local Geometry-Aware Hand-Object Interaction Synthesis
Keyang Zhou, Bharat Lal Bhatnagar, Jan Eric Lenssen, Gerard Pons-Moll GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui General Object Foundation Model for Images and Videos at Scale
Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan, Bocheng Ren, Laurence T. Yang, Stan Z. Li Generalizable Novel-View Synthesis Using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li, Ying Chen, Yifei Chen, Rongshan Yu, Wenxian Yang, Liansheng Wang, Bowen Ding, Yuchen Han Generalized Event Cameras
Varun Sundar, Matthew Dutson, Andrei Ardelean, Claudio Bruschini, Edoardo Charbon, Mohit Gupta Generalized Predictive Model for Autonomous Driving
Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao, Long Zhao, B G Vijay Kumar, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen, Huaijin Pi, Sida Peng, Zehong Shen, Minghui Yang, Shuai Zhu, Hujun Bao, Xiaowei Zhou Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar Generating Non-Stationary Textures Using Self-Rectification
Yang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang Generative Image Dynamics
Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski Generative Multi-Modal Models Are Good Class Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng Generative Multimodal Models Are In-Context Learners
Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang Generative Powers of Ten
Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steven M. Seitz, Ira Kemelmacher-Shlizerman, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski Generative Proxemics: A Prior for 3D Social Interaction from Images
Lea Müller, Vickie Ye, Georgios Pavlakos, Michael Black, Angjoo Kanazawa Generative Quanta Color Imaging
Vishal Purohit, Junjie Luo, Yiheng Chi, Qi Guo, Stanley H. Chan, Qiang Qiu Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee, Seungjun Moon, Gyeong-Moon Park GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, Li Yi GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos, Sungjoo Yoo, Alexander Sorkine-Hornung, Rakesh Ranjan GeoReF: Geometric Alignment Across Shape Variation for Category-Level Object Pose Refinement
Linfang Zheng, Tze Ho Elden Tse, Chen Wang, Yinghan Sun, Hua Chen, Ales Leonardis, Wei Zhang, Hyung Jin Chang GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang, Xudong Jiang, Silvano Galliani, Christoph Vogel, Marc Pollefeys GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan Global Latent Neural Rendering
Thomas Tanay, Matteo Maggioni GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
Huimin Huang, Yawen Huang, Lanfen Lin, Ruofeng Tong, Yen-Wei Chen, Hao Zheng, Yuexiang Li, Yefeng Zheng GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, Junwei Han GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong, Qi Zuo, Xiaodong Gu, Weihao Yuan, Zhengyi Zhao, Zilong Dong, Liefeng Bo, Qixing Huang GPT-4V(ision) Is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao GraCo: Granularity-Controllable Interactive Segmentation
Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen Gradient-Based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Elad Ben Avraham, Aviad Aberdam, Shahar Tsiper, Ron Litman GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai GroupContrast: Semantic-Aware Self-Supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia GS-IR: 3D Gaussian Splatting for Inverse Rendering
Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee, Suhwan Cho, Dogyoon Lee, Chaewon Park, Jungho Lee, Sangyoun Lee H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration
Morteza Ghahremani, Mohammad Khateri, Bailiang Jian, Benedikt Wiestler, Ehsan Adeli, Christian Wachinger Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou HanDiffuser: Text-to-Image Generation with Realistic Hand Appearances
Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai HardMo: A Large-Scale Hardcase Dataset for Motion Capture
Jiaqi Liao, Chuanchen Luo, Yinuo Du, Yuxi Wang, Xucheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng Harnessing Large Language Models for Training-Free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID
Wentan Tan, Changxing Ding, Jiayu Jiang, Fei Wang, Yibing Zhan, Dapeng Tao HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images
Xihe Yang, Xingyu Chen, Daiheng Gao, Shaohui Wang, Xiaoguang Han, Baoyuan Wang HEAL-SWIN: A Vision Transformer on the Sphere
Oscar Carlsson, Jan E. Gerken, Hampus Linander, Heiner Spieß, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson Hearing Anything Anywhere
Mason Long Wang, Ryosuke Sawata, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu Hierarchical Spatio-Temporal Decoupling for Text-to-Video Generation
Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu Higher-Order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim, Hyung-gun Chi, Hyerin Lim, Karthik Ramani, Jinkyu Kim, Sangpil Kim HIMap: HybrId Representation Learning for End-to-End Vectorized HD Map Construction
Yi Zhou, Hui Zhang, Jiaqian Yu, Yifan Yang, Sangil Jung, Seung-In Park, ByungIn Yoo HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan, Sandeep Inuganti, Yan Di, Martin Sundermeyer, Fabian Manhardt, Didier Stricker, Jason Rambach, Yu Zhang HIT: Estimating Internal Human Implicit Tissues from the Body Surface
Marilyn Keller, Vaibhav Arora, Abdelmouttaleb Dakri, Shivam Chandhok, Jürgen Machann, Andreas Fritsche, Michael J. Black, Sergi Pujades HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu HMD-Poser: On-Device Real-Time Human Motion Tracking from Scalable Sparse Observations
Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li HOI-M^3: Capture Multiple Humans and Objects Interaction Within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang HOIAnimator: Generating Text-Prompt Human-Object Animations Using Novel Perceptive Diffusion Models
Wenfeng Song, Xinyu Zhang, Shuai Li, Yang Gao, Aimin Hao, Xia Hou, Chenglizhao Chen, Ning Li, Hong Qin HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang HOLD: Category-Agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Xu Chen, Muhammed Kocabas, Michael J. Black, Otmar Hilliges Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, Vishal M. Patel Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark Holoported Characters: Real-Time Free-Viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty, Marc Habermann, Guoxing Sun, Diogo Luvizon, Vladislav Golyanik, Christian Theobalt HomoFormer: Homogenized Transformer for Image Shadow Removal
Jie Xiao, Xueyang Fu, Yurui Zhu, Dong Li, Jie Huang, Kai Zhu, Zheng-Jun Zha HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp, Guangyao Zhai, Hannah Schieber, Giulia Rizzoli, Pengyuan Wang, Hongcheng Zhao, Lorenzo Garattoni, Sven Meier, Daniel Roth, Nassir Navab, Benjamin Busam How Far Can We Compress Instant-NGP-Based NeRF?
Yihang Chen, Qianyi Wu, Mehrtash Harandi, Jianfei Cai How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa, Riccardo Valperga, David Knigge, Miltiadis Kofinas, Phillip Lippe, Jan-Jakob Sonke, Efstratios Gavves HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-Based Semantic Segmentation
Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
Xiaolong Tang, Meina Kan, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao HUGS: Human Gaussian Splats
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan Human Gaussian Splatting: Real-Time Rendering of Animatable Avatars
Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero Human Motion Prediction Under Unexpected Perturbation
Jiangbei Yue, Baiyi Li, Julien Pettré, Armin Seyfried, He Wang HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Deva Ramanan, Michael Zollhöfer, Christian Richardt Hyperbolic Anomaly Detection
Huimin Li, Zhentao Chen, Yunhao Xu, Junlin Hu HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Sahar Rahimi Malakshan, Nasser M. Nasrabadi I'M HOI: Inertia-Aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao, Juze Zhang, Jiashen Du, Ziwei Shan, Junye Wang, Jingyi Yu, Jingya Wang, Lan Xu ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli ID-Blau: Image Deblurring by Implicit Diffusion-Based reBLurring AUgmentation
Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai, Zongbo Han, Bing Cao, Xiaoheng Jiang, Qinghua Hu, Changqing Zhang iKUN: Speak to Trackers Without Retraining
Yunhao Du, Cheng Lei, Zhicheng Zhao, Fei Su Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang, Eli Shechtman, Xiaolong Wang, Michael Gharbi Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin Implicit Event-RGBD Neural SLAM
Delin Qu, Chi Yan, Dong Wang, Jie Yin, Qizhi Chen, Dan Xu, Yiting Zhang, Bin Zhao, Xuelong Li Implicit Motion Function
Yue Gao, Jiahao Li, Lei Chu, Yan Lu IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga Improved Baselines with Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee Improved Visual Grounding Through Self-Consistent Explanations
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez Improving Bird's Eye View Semantic Segmentation by Task Decomposition
Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin Improving Depth Completion via Depth Feature Upsampling
Yufei Wang, Ge Zhang, Shaoqian Wang, Bo Li, Qi Liu, Le Hui, Yuchao Dai Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez Improving Graph Contrastive Learning via Adaptive Positive Sampling
Jiaming Zhuo, Feiyang Qin, Can Cui, Kun Fu, Bingxin Niu, Mengzhu Wang, Yuanfang Guo, Chuan Wang, Zhen Wang, Xiaochun Cao, Liang Yang Improving Image Restoration Through Removing Degradations in Textual Representations
Jingbo Lin, Zhilu Zhang, Yuxiang Wei, Dongwei Ren, Dongsheng Jiang, Qi Tian, Wangmeng Zuo Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An, Yewen Li, Xu He, Pengjie Gu, Mengchen Zhao, Dong Li, Jianye Hao, Chaojie Wang, Bo An, Mingyuan Zhou In-Context Matting
He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang Incremental Residual Concept Bottleneck Models
Chenming Shang, Shiji Zhou, Hengyuan Zhang, Xinzhe Ni, Yujiu Yang, Yuwang Wang Infinigen Indoors: Photorealistic Indoor Scenes Using Procedural Generation
Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, Jia Deng Infrared Adversarial Car Stickers
Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu, Jianmin Li, Xiaolin Hu Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin Inlier Confidence Calibration for Point Cloud Registration
Yongzhe Yuan, Yue Wu, Xiaolong Fan, Maoguo Gong, Qiguang Miao, Wenping Ma Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw, Amir Atapour-Abarghouei, A. Stephen McGough InstanceDiffusion: Instance-Level Control for Image Generation
Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra Instantaneous Perception of Moving Objects in 3D
Di Liu, Bingbing Zhuang, Dimitris N. Metaxas, Manmohan Chandraker Instruct-Imagen: Image Generation with Multi-Modal Instruction
Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia Instruct-ReID: A Multi-Purpose Person Re-Identification Task with Instructions
Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Houqiang Li, Han Hu, Dong Chen, Baining Guo InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni Intensity-Robust Autofocus for Spike Camera
Changqing Su, Zhiyuan Ye, Yongsheng Xiao, You Zhou, Zhen Cheng, Bo Xiong, Zhaofei Yu, Tiejun Huang Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang Interactive Continual Learning: Fast and Slow Thinking
Biqing Qi, Xinquan Chen, Junqi Gao, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager, Carson Klingenberg, Stefano Soatto IReNe: Instant Recoloring of Neural Radiance Fields
Alessio Mazzucchelli, Adrian Garcia-Garcia, Elena Garces, Fernando Rivas-Manzaneque, Francesc Moreno-Noguer, Adrian Penate-Sanchez Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Junbo Yin, Jianbing Shen, Runnan Chen, Wei Li, Ruigang Yang, Pascal Frossard, Wenguan Wang It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song iToF-Flow-Based High Frame Rate Depth Imaging
Yu Meng, Zhou Xue, Xu Chang, Xuemei Hu, Tao Yue Jack of All Tasks, Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji JointSQ: Joint Sparsification-Quantization for Distributed Learning
Weiying Xie, Haowei Li, Jitao Ma, Yunsong Li, Jie Lei, Donglai Liu, Leyuan Fang KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Yiyang Su, Feng Liu, Anil Jain, Xiaoming Liu Koala: Key Frame-Conditioned Long Video-LLM
Reuben Tan, Ximeng Sun, Ping Hu, Jui-hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot, Jian Zhang KVQ: Kwai Video Quality Assessment for Short-Form Videos
Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen L-MAGIC: Language Model Assisted Generation of Images with Coherence
Zhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, Junda Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, Djamila Aouada LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl, Michael Steiner, Andreas Kurz, Markus Steinberger LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang LAMP: Learn a Motion Pattern for Few-Shot Video Generation
Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma, Can Cui, Xu Cao, Wenqian Ye, Peiran Liu, Juanwu Lu, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Aniket Bera, James M. Rehg, Ziran Wang LangSplat: 3D Language Gaussian Splatting
Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister Language Model Guided Interpretable Video Action Reasoning
Ning Wang, Guangming Zhu, Hs Li, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, Xiaolin Hu Language-Driven Grasp Detection
An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen Language-Guided Image Reflection Separation
Haofeng Zhong, Yuchen Hong, Shuchen Weng, Jinxiu Liang, Boxin Shi Language-Only Training of Zero-Shot Composed Image Retrieval
Geonmo Gu, Sanghyuk Chun, Wonjae Kim, Yoohoon Kang, Sangdoo Yun LASO: Language-Guided Affordance Segmentation on 3D Object
Yicong Li, Na Zhao, Junbin Xiao, Chun Feng, Xiang Wang, Tat-seng Chua Latency Correction for Event-Guided Deblurring and Frame Interpolation
Yixin Yang, Jinxiu Liang, Bohan Yu, Yan Chen, Jimmy S. Ren, Boxin Shi Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan LEAD: Exploring Logit Space Evolution for Model Selection
Zixuan Hu, Xiaotong Li, Shixiang Tang, Jun Liu, Yichun Hu, Ling-Yu Duan LEAD: Learning Decomposition for Source-Free Universal Domain Adaptation
Sanqing Qu, Tianpei Zou, Lianghua He, Florian Röhrbein, Alois Knoll, Guang Chen, Changjun Jiang Learn from View Correlation: An Anchor Enhancement Strategy for Multi-View Clustering
Suyuan Liu, Ke Liang, Zhibin Dong, Siwei Wang, Xihong Yang, Sihang Zhou, En Zhu, Xinwang Liu Learned Representation-Guided Diffusion Models for Large-Image Generation
Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni Learning Correlation Structures for Vision Transformers
Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho Learning Diffusion Texture Priors for Image Restoration
Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Zicheng Zhang, Ruobing Zheng, Bonan Li, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Ziwen Liu, Ming Yang Learning Equi-Angular Representations for Online Continual Learning
Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi Learning for Transductive Threshold Calibration in Open-World Recognition
Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing Learning from One Continuous Video Stream
João Carreira, Michael King, Viorica Patraucean, Dilara Gokay, Catalin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman Learning from Synthetic Human Group Activities
Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Qinyue Li, Chongyi Li, Chen Change Loy Learning Large-Factor EM Image Super-Resolution with Generative Priors
Jiateng Shou, Zeyu Xiao, Shiyu Deng, Wei Huang, Peiyao Shi, Ruobing Zhang, Zhiwei Xiong, Feng Wu Learning Multi-Dimensional Human Preference for Text-to-Image Generation
Sixian Zhang, Bohan Wang, Junqiang Wu, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang Learning Occupancy for Monocular 3D Object Detection
Liang Peng, Junkai Xu, Haoran Cheng, Zheng Yang, Xiaopei Wu, Wei Qian, Wenxiao Wang, Boxi Wu, Deng Cai Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte, José Pedro Iglesias, Carl Olsson, Fredrik Kahl Learning the 3D Fauna of the Web
Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu Learning to Count Without Annotations
Lukas Knobel, Tengda Han, Yuki M. Asano Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin Learning to Navigate Efficiently and Precisely in Real Environments
Guillaume Bono, Hervé Poirier, Leonid Antsfeld, Gianluca Monaci, Boris Chidlovskii, Christian Wolf Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo, Zhineng Chen, Peng Zhou, Zuxuan Wu, Xieping Gao, Yu-Gang Jiang Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, Lorenzo Torresani, Effrosyni Mavroudi Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou, Dichao Fan, Peng Tu, Le Zhang, Yanlin Qian Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian, Lijie Fan, Kaifeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola Learning Visual Prompt for Gait Recognition
Kang Ma, Ying Fu, Chunshui Cao, Saihui Hou, Yongzhen Huang, Dezhi Zheng LEDITS++: Limitless Image Editing Using Text-to-Image Models
Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolinario Passos LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu, Mathias Gehrig, Qing Lyu, Xudong Liu, Igor Gilitschenski Leveraging Frame Affinity for sRGB-to-RAW Video De-Rendering
Chen Zhang, Wencheng Han, Yang Zhou, Jianbing Shen, Cheng-zhong Xu, Wentao Liu LiDAR-Based Person Re-Identification
Wenxuan Guo, Zhiyu Pan, Yingping Liang, Ziheng Xi, Zhicheng Zhong, Jianjiang Feng, Jie Zhou LiDAR-Net: A Real-Scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo, Yuanqi Li, Dayong Ren, Xiaohong Zhang, Jiawei Li, Liang Pu, Changfeng Ma, Xiaoyu Zhan, Jie Guo, Mingqiang Wei, Yan Zhang, Piaopiao Yu, Shuangyu Yang, Donghao Ji, Huisheng Ye, Hao Sun, Yansong Liu, Yinuo Chen, Jiaqi Zhu, Hongyu Liu LidaRF: Delving into LiDAR for Neural Radiance Field on Street Scenes
Shanlin Sun, Bingbing Zhuang, Ziyu Jiang, Buyu Liu, Xiaohui Xie, Manmohan Chandraker Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Mukund Varma T, Peihao Wang, Zhiwen Fan, Zhangyang Wang, Hao Su, Ravi Ramamoorthi LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, Yannick Hold-Geoffroy Link-Context Learning for Multimodal LLMs
Yan Tai, Weichen Fan, Zhao Zhang, Ziwei Liu LiSA: LiDAR Localization with Semantic Awareness
Bochun Yang, Zijun Li, Wen Li, Zhipeng Cai, Chenglu Wen, Yu Zang, Matthias Muller, Cheng Wang LISA: Reasoning Segmentation via Large Language Model
Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park LLMs Are Good Sign Language Translators
Jia Gong, Lin Geng Foo, Yixuan He, Hossein Rahmani, Jun Liu LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu, Hongsheng Li Locally Adaptive Neural 3D Morphable Models
Michail Tarasiou, Rolandos Alexandros Potamias, Eimear O'Sullivan, Stylianos Ploumpis, Stefanos Zafeiriou Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Bruce A. Maxwell, Sumegha Singhania, Avnish Patel, Rahul Kumar, Heather Fryling, Sihan Li, Haonan Sun, Ping He, Zewen Li Logit Standardization in Knowledge Distillation
Shangquan Sun, Wenqi Ren, Jingzhi Li, Rui Wang, Xiaochun Cao Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso, Erik Sandström, Vladimir Yugay, Luc Van Gool, Martin R. Oswald Loose Inertial Poser: Motion Capture with IMU-Attached Loose-Wear Jacket
Chengxu Zuo, Yiming Wang, Lishuang Zhan, Shihui Guo, Xinyu Yi, Feng Xu, Yipeng Qin LORS: Low-Rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang LoS: Local Structure-Guided Stereo Matching
Kunhong Li, Longguang Wang, Ye Zhang, Kaiwen Xue, Shunbo Zhou, Yulan Guo LOTUS: Evasive and Resilient Backdoor Attacks Through Sub-Partitioning
Siyuan Cheng, Guanhong Tao, Yingqi Liu, Guangyu Shen, Shengwei An, Shiwei Feng, Xiangzhe Xu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang Low-Latency Neural Stereo Streaming
Qiqi Hou, Farzad Farhadzadeh, Amir Said, Guillaume Sautiere, Hoang Le Low-Power Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-Based 3D Semantic Occupancy Prediction
Linqing Zhao, Xiuwei Xu, Ziwei Wang, Yunpeng Zhang, Borui Zhang, Wenzhao Zheng, Dalong Du, Jie Zhou, Jiwen Lu LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi Huang, Fereshteh Shakeri, Jose Dolz, Malik Boudiaf, Houda Bahig, Ismail Ben Ayed LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling
Jiaheng Liu, Jianhao Li, Kaisiyuan Wang, Hongcheng Guo, Jian Yang, Junran Peng, Ke Xu, Xianglong Liu, Jinyang Guo LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-Time Rendering
Jaehoon Choi, Rajvi Shah, Qinbo Li, Yipeng Wang, Ayush Saraf, Changil Kim, Jia-Bin Huang, Dinesh Manocha, Suhib Alsisan, Johannes Kopf LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Jing Zhang, Irving Fang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Juexiao Zhang, Zhuo Zheng, Radu Iovita, Chen Feng M&M VTO: Multi-Garment Virtual Try-on and Editing
Luyang Zhu, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu, Liwen Wang, Jiewen Yang, Guannan He, Xingbo Dong, Shengli Li, Ying Tan, Ming Chen, Zhe Jin, Kenli Li, Xiaomeng Li MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, Adams Wai-Kin Kong MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee MagicAnimate: Temporally Consistent Human Image Animation Using Diffusion Model
Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen Zhang, Hang Li Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang, Yanhong Zeng, Ke Fan, Xuheng Wang, Bo Dai, Kai Chen, Lizhuang Ma Make-Your-Anchor: A Diffusion-Based 2D Avatar Generation Framework
Ziyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh Making Visual Sense of Oracle Bones for You and Me
Runqi Qiao, Lan Yang, Kaiyue Pang, Honggang Zhang ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong MANUS: Markerless Grasp Capture Using Articulated 3D Gaussians
Chandradeep Pokhariya, Ishaan Nikhil Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar Map-Relative Pose Regression for Visual Re-Localization
Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu, Eric Brachmann MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng, Sanqing Qu, Yong Wu, Tianpei Zou, Lianghua He, Alois Knoll, Guang Chen, Changjun Jiang MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
Xu Cao, Tong Zhou, Yunsheng Ma, Wenqian Ye, Can Cui, Kun Tang, Zhipeng Cao, Kaizhao Liang, Ziran Wang, James M. Rehg, Chao Zheng MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang, Yuhao Wu, Elsa Angelini, Ang Li, Jia Guo, Jerod M. Rasmussen, Thomas G. O'Connor, Pathik D. Wadhwa, Andrea Parolin Jackowski, Hai Li, Jonathan Posner, Andrew F. Laine, Yun Wang MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam, Andreas Veit, Ayan Chakrabarti, Sanjiv Kumar Mask Grounding for Referring Image Segmentation
Yong Xien Chng, Henry Zheng, Yizeng Han, Xuchong Qiu, Gao Huang Masked AutoDecoder Is Effective Multi-Task Vision Generalist
Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu Masked Autoencoders for Microscopy Are Scalable Learners of Cellular Biology
Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw MaskINT: Video Editing via Interpolative Non-Autoregressive Masked Transformers
Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, Yuchao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang, Hao Yu, Kuan-Ting Yu, Nassir Navab, Slobodan Ilic, Benjamin Busam MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio, Renato Sortino, Simone Palazzo, Concetto Spampinato MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, Yong Liu MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Thien-Minh Nguyen, Shenghai Yuan, Thien Hoang Nguyen, Pengyu Yin, Haozhi Cao, Lihua Xie, Maciej Wozniak, Patric Jensfelt, Marko Thiel, Justin Ziegenbein, Noel Blunder MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation
Haokai Zhu, Si-Yuan Cao, Jianxin Hu, Sitong Zuo, Beinan Yu, Jiacheng Ying, Junwei Li, Hui-Liang Shen MeaCap: Memory-Augmented Zero-Shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Bo Chen, Zhengjue Wang MedBN: Robust Test-Time Adaptation Against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun, Sangdon Park, Jungseul Ok MeLFusion: Synthesizing Music from Image and Language Cues Using Diffusion Models
Sanjoy Chowdhury, Sayan Nag, K J Joseph, Balaji Vasan Srinivasan, Dinesh Manocha MemoNav: Working Memory Model for Visual Navigation
Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang Memory-Based Adapters for Online 3D Scene Perception
Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonis Kakolyris, Petros Koutras, Himmy Tam, Efstratios Skordos, George Papandreou, Riza Alp Güler, Iasonas Kokkinos MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu, Song Wang, Wentong Li, Ruizi Yang, Junbo Chen, Jianke Zhu MICap: A Unified Model for Identity-Aware Movie Descriptions
Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan, Makarand Tapaswi MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Xiaoyan Sun, Chong Luo, Baining Guo MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan, Xinchao Wang Minimal Perspective Autocalibration
Andrea Porfiri Dal Cin, Timothy Duff, Luca Magri, Tomas Pajdla Mip-Splatting: Alias-Free 3D Gaussian Splatting
Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
Aj Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang MLP Can Be a Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang MM-Narrator: Narrating Long-Form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang, Xiaohua Xie MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee, Chen Chen MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao MobileCLIP: Fast Image-Text Models Through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen, Wei Long, He Yao, Yongjun Zhang, Bingshu Wang, Yongbin Qin, Jia Wu Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. Mok, Zi Li, Yunhao Bai, Jianpeng Zhang, Wei Liu, Yan-Jie Zhou, Ke Yan, Dakai Jin, Yu Shi, Xiaoli Yin, Le Lu, Ling Zhang Modality-Collaborative Test-Time Adaptation for Action Recognition
Baochen Xiong, Xiaoshan Yang, Yaguang Song, Yaowei Wang, Changsheng Xu MoDE: CLIP Data Experts via Clustering
Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran, Ngoc-Bao Nguyen, Ngai-Man Cheung Modeling Collaborator: Enabling Subjective Vision Classification with Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig Modular Blind Video Quality Assessment
Wen Wen, Mu Li, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang, Kede Ma MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision
Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Sen Wang, Li Cheng MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu Monkey: Image Resolution and Text Label Are Important Things for Large Multi-Modal Models
Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai MonoCD: Monocular 3D Object Detection with Complementary Depths
Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, Yihua Tan Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Keyu Wu, Lingchen Yang, Zhiyi Kuang, Yao Feng, Xutao Han, Yuefan Shen, Hongbo Fu, Kun Zhou, Youyi Zheng MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner Mosaic-SDF for 3D Generative Models
Lior Yariv, Omri Puny, Oran Gafni, Yaron Lipman MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction Using Differentiable Shading
Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-André Carbonneau MoST: Multi-Modality Scene Tokenization for Motion Prediction
Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou Motion-Adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang Move Anything with Layered Scene Diffusion
Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul Move as You Say, Interact as You Can: Language-Guided Human Motion Generation with Scene Affordance
Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang MP5: A Multi-Modal Open-Ended Embodied System in Minecraft via Active Perception
Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao mPLUG-Owl2: Revolutionizing Multi-Modal Large Language Model with Modality Collaboration
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang MPOD123: One Image to 3D Content Generation Using Mask-Enhanced Progressive Outline-to-Detail Optimization
Jimin Xu, Tianbao Wang, Tao Jin, Shengyu Zhang, Dongjie Fu, Zhe Wang, Jiangjing Lyu, Chengfei Lv, Chaoyue Niu, Zhou Yu, Zhou Zhao, Fei Wu MRFS: Mutually Reinforcing Image Fusion and Segmentation
Hao Zhang, Xuhui Zuo, Jie Jiang, Chunchao Guo, Jiayi Ma MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang MSU-4S - The Michigan State University Four Seasons Dataset
Daniel Kent, Mohammed Alyaqoub, Xiaohu Lu, Hamed Khatounabadi, Kookjin Sung, Cole Scheller, Alexander Dalat, Xinwei Guo, Asma bin Thabit, Roberto Whitley, Hayder Radha MuGE: Multiple Granularity Edge Detection
Caixia Zhou, Yaping Huang, Mengyang Pu, Qingji Guan, Ruoxi Deng, Haibin Ling MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang, Fei Chen, Steven McDonagh, Gerasimos Lampouras, Ignacio Iacobacci, Sarah Parisot Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò, Marc Pollefeys, Peter Kontschieder Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero, Luca Zancato, Matthew Trager, Siddharth Choudhary, Pramuditha Perera, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto Multi-Modal In-Context Learning Makes an Ego-Evolving Scene Text Recognizer
Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie Multi-Modal Instruction Tuned LLMs with Fine-Grained Visual Perception
Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Xuansong Xie Multi-Modal Learning for Geospatial Vegetation Forecasting
Vitus Benson, Claire Robin, Christian Requena-Mesa, Lazaro Alonso, Nuno Carvalhais, José Cortés, Zhihan Gao, Nora Linscheid, Mélanie Weynants, Markus Reichstein Multi-Object Tracking in the Dark
Xinzhe Wang, Kang Ma, Qiankun Liu, Yunhao Zou, Ying Fu Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
Menghao Zhang, Jingyu Wang, Qi Qi, Haifeng Sun, Zirui Zhuang, Pengfei Ren, Ruilong Ma, Jianxin Liao Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu, Lingdong Kong, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu, Ce Zheng, Ming Qian, Nan Xue, Chen Chen, Zhebin Zhang, Chen Li, Tianfu Wu Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller, Katja Schwarz, Barbara Rössle, Lorenzo Porzi, Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano Multimodal Sense-Informed Forecasting of 3D Human Motions
Zhenyu Lou, Qiongjie Cui, Haofan Wang, Xu Tang, Hong Zhou MultiPhys: Multi-Person Physics-Aware 3D Motion Estimation
Nicolas Ugrinovic, Boxiao Pan, Georgios Pavlakos, Despoina Paschalidou, Bokui Shen, Jordi Sanchez-Riera, Francesc Moreno-Noguer, Leonidas Guibas Multiplane Prior Guided Few-Shot Aerial Scene Rendering
Zihan Gao, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Puhua Chen, Yuwei Guo Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao, Jialiang Zhu, Chunyu Wang, Han Hu, Steven L. Waslander MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen, Christos Sakaridis, Yulun Zhang, Marc Pollefeys, Andreas Geiger, Fisher Yu MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
Xiaojie Jin, Bowen Zhang, Weibo Gong, Kai Xu, Xueqing Deng, Peng Wang, Zhao Zhang, Xiaohui Shen, Jiashi Feng MVBench: A Comprehensive Multi-Modal Video Understanding Benchmark
Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, Limin Wang, Yu Qiao MVHumanNet: A Large-Scale Dataset of Multi-View Daily Dressing Human Captures
Zhangyang Xiong, Chenghong Li, Kenkun Liu, Hongjie Liao, Jianqiao Hu, Junyi Zhu, Shuliang Ning, Lingteng Qiu, Chongjie Wang, Shijie Wang, Shuguang Cui, Xiaoguang Han Named Entity Driven Zero-Shot Image Manipulation
Zhida Feng, Li Chen, Jing Tian, JiaXiang Liu, Shikun Feng NAPGuard: Towards Detecting Naturalistic Adversarial Patches
Siyang Wu, Jiakai Wang, Jiejie Zhao, Yazhe Wang, Xianglong Liu Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen, Lei Chen, Jiwen Lu, Junle Wang, Yansong Tang NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng, Huangying Zhan, Zheng Chen, Qingan Yan, Xiangyu Xu, Changjiang Cai, Bing Li, Qilun Zhu, Yi Xu NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia, Chu Zhou, Chengxuan Zhu, Minggui Teng, Chao Xu, Boxin Shi NC-TTT: A Noise Contrastive Approach for Test-Time Training
David Osowiechi, Gustavo A. Vargas Hakim, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers NEAT: Distilling 3D Wireframes from Neural Attraction Fields
Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu, Yujun Shen NECA: Neural Customizable Human Avatar
Junjin Xiao, Qing Zhang, Zhan Xu, Wei-Shi Zheng Neighbor Relations Matter in Video Scene Detection
Jiawei Tan, Hongxing Wang, Jiaxin Li, Zhilong Ou, Zhangbin Qian NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li, Taishi Ono, Takeshi Uemori, Hajime Mihara, Alexander Gatto, Hajime Nagahara, Yusuke Moriuchi NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc, Aljaz Bozic, Zhao Dong, Carl Marshall, Tobias Ritschel NeRF Director: Revisiting View Selection in Neural Volume Rendering
Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal, Olivier Salvado, Clinton Fookes, Leo Lebrat NeRF On-the-Go: Exploiting Uncertainty for Distractor-Free NeRFs in the Wild
Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber, Aleksander Holynski, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han, Heng Guo, Koki Fukai, Hiroaki Santo, Boxin Shi, Fumio Okura, Zhanyu Ma, Yunpeng Jia NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng, Shijie Lin, Haobo Zuo, Changhong Fu, Jia Pan NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, Christoffer Petersson Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu, Fujun Luan, Kai Zhang, Iliyan Georgiev, Kalyan Sunkavalli, Ravi Ramamoorthi Neural Implicit Morphing of Face Images
Guilherme Schardong, Tiago Novello, Hallison Paz, Iurii Medvedev, Vinícius da Silva, Luiz Velho, Nuno Gonçalves Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield Neural Lineage
Runpeng Yu, Xinchao Wang Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen Neural Redshift: Random Networks Are Not Random Functions
Damien Teney, Armand Mihai Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen, Yash Bhalgat, Xinghui Li, Jia-Wang Bian, Kejie Li, Zirui Wang, Victor Adrian Prisacariu Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas, Guanxiong Sun, Jiankang Deng, Stefanos Zafeiriou Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov, David Shustin, Ruyu Yan, Chenyang Lei, Felix Heide Neural Underwater Scene Representation
Yunkai Tang, Chengxuan Zhu, Renjie Wan, Chao Xu, Boxin Shi Neural Visibility Field for Uncertainty-Driven Active Mapping
Shangjie Xue, Jesse Dill, Pranay Mathur, Frank Dellaert, Panagiotis Tsiotras, Danfei Xu NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukac No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation
Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao Noisy One-Point Homographies Are Surprisingly Good
Yaqing Ding, Jonathan Astermark, Magnus Oskarsson, Viktor Larsson Noisy-Correspondence Learning for Text-to-Image Person Re-Identification
Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, Peng Hu Non-Autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin, Yinlin Hu, Renaud Marlet, Mathieu Salzmann, Vincent Lepetit NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He, Garvita Tiwari, Tolga Birdal, Jan Eric Lenssen, Gerard Pons-Moll NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, Shanghang Zhang OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu Object Recognition as Next Token Prediction
Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim Observation-Guided Diffusion Probabilistic Models
Junoh Kang, Jinyoung Choi, Sungik Choi, Bohyung Han OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki OHTA: One-Shot Hand Avatar via Data-Driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue OMG-Seg: Is One Model Good Enough for All Segmentation?
Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy OMG: Towards Open-Vocabulary Motion Generation via Mixture of Controllers
Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan, Bingbing Wen, Ziwei Xuan, Mitch Hill, Junjie Bai, Guo-Jun Qi, Yalin Wang OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying, Yixuan Yin, Jinzhi Zhang, Fan Wang, Tao Yu, Ruqi Huang, Lu Fang OmniViD: A Generative Framework for Universal Video Understanding
Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang On Exact Inversion of DPM-Solvers
Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon, Hyewon Bae, Se Young Chun On Scaling up a Multilingual Vision and Language Model
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, Aj Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut On the Content Bias in Frechet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang On the Scalability of Diffusion-Based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, Yannis Avrithis One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su One-Class Face Anti-Spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen, Jun-Xiong Chong, Tyng-Luh Liu, Chiou-Ting Hsu One-Dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Mengyao Lyu, Yuhong Yang, Haiwen Hong, Hui Chen, Xuan Jin, Yuan He, Hui Xue, Jungong Han, Guiguang Ding One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong One-Step Diffusion with Distribution Matching Distillation
Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman, Taesung Park OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang Open-Set Domain Adaptation for Semantic Segmentation
Seun-An Choe, Ah-Hyung Shin, Keon-Hee Park, Jinwoo Choi, Gyeong-Moon Park Open-Vocabulary Object 6D Pose Estimation
Jaime Corsetti, Davide Boscaini, Changjae Oh, Andrea Cavallaro, Fabio Poiesi Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao Open-Vocabulary Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes, Jens Behley, Cyrill Stachniss Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen OpenBias: Open-Set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe OpenEQA: Embodied Question Answering in the Era of Foundation Models
Arjun Majumdar, Anurag Ajay, Xiaohan Zhang, Pranav Putta, Sriram Yenamandra, Mikael Henaff, Sneha Silwal, Paul Mcvay, Oleksandr Maksymets, Sergio Arnaud, Karmesh Yadav, Qiyang Li, Ben Newman, Mohit Sharma, Vincent Berges, Shiqi Zhang, Pulkit Agrawal, Yonatan Bisk, Dhruv Batra, Mrinal Kalakrishnan, Franziska Meier, Chris Paxton, Alexander Sax, Aravind Rajeswaran OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis, Constantin Aronssohn, Nacim Bouia, Stephanie Fu, Romain Loiseau, Van Nguyen Nguyen, Charles Raude, Elliot Vincent, Lintao Xu, Hongyu Zhou, Loic Landrieu Optimizing Diffusion Noise Can Serve as Universal Motion Priors
Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Xinyu Geng, Jiaming Wang, Jiawei Gong, Yuerong Xue, Jun Xu, Fanglin Chen, Xiaolin Huang Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler, Young Min Kim, Amlan Kar Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie, Qi Chen, Sinuo Wang, Minh-Son To, Iris Lee, Ee Win Khoo, Kerolos Hendy, Daniel Koh, Yong Xia, Qi Wu PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko, Alexander Chigorin, Ivan Laptev, Sergey Zagoruyko Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov ParamISP: Learned Forward and Inverse ISPs Using Camera Parameters
Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho PaReNeRF: Toward Fast Large-Scale Dynamic NeRF with Patch-Based Reference
Xiao Tang, Min Yang, Penghui Sun, Hui Li, Yuchao Dai, Feng Zhu, Hojae Lee Partial-to-Partial Shape Matching with Geometric Consistency
Viktoria Ehm, Maolin Gao, Paul Roetzer, Marvin Eisenberger, Daniel Cremers, Florian Bernard Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma, Shreyas Singh, Vivek Boominathan, Kaushik Mitra, Ashok Veeraraghavan Patch2Self2: Self-Supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson, Petros Drineas, Eleftherios Garyfallidis PEM: Prototype-Based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano, Francesca Pistilli, Marco Ciccone, Giuseppe Averta, Fabio Cermelli Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu, Liyao Xiang, Hangyu Ye, Dixi Yao, Pengzhi Chu, Baochun Li Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz PFStorer: Personalized Face Restoration and Super-Resolution
Tuomas Varanka, Tapani Toivonen, Soumya Tripathy, Guoying Zhao, Erman Acar PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang Physical Backdoor: Towards Temperature-Based Backdoor Attacks in the Physical World
Wen Yin, Jian Lou, Pan Zhou, Yulai Xie, Dan Feng, Yuhua Sun, Tailai Zhang, Lichao Sun Physical Property Understanding from Language-Embedded Feature Fields
Albert J. Zhai, Yuan Shen, Emily Y. Chen, Gloria X. Wang, Xinlei Wang, Sheng Wang, Kaiyu Guan, Shenlong Wang PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu, Yuan-Chen Guo, Guan Luo, Heyi Sun, Wei Yin, Song-Hai Zhang Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera PICTURE: PhotorealistIC Virtual Try-on from UnconstRained dEsigns
Shuliang Ning, Duomin Wang, Yipeng Qin, Zirong Jin, Baoyuan Wang, Xiaoguang Han PIE-NeRF: Physics-Based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li, Tianjia Shao, Chenfanfu Jiang, Yin Yang PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti, Chelsea Finn PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, In Suk Chong, Andrew Howard, Lukasz Lew, Sherief Reda, Ville-Mikko Rautio, Daniele Moro PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano Pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick Pixel-Aligned Language Model
Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng, Liwei Liao, Xufeng Li, Jianbo Jiao, Rongjie Wang, Feng Gao, Shiqi Wang, Ronggang Wang PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce LiDAR
Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram, Yuchen Fan, Christian Richardt, Ramesh Raskar, Rakesh Ranjan PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment
Tianchen Deng, Guole Shen, Tong Qin, Jianyu Wang, Wentao Zhao, Jingchuan Wang, Danwei Wang, Weidong Chen Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte, Wei-An Lin, Hui Qu, Mingi Kwon, Ratheesh Kalarot Point Cloud Pre-Training with Diffusion Models
Xiao Zheng, Xiaoshui Huang, Guofeng Mei, Yuenan Hou, Zhaoyang Lyu, Bo Dai, Wanli Ouyang, Yongshun Gong Point, Segment and Count: A Generalized Framework for Object Counting
Zhizhong Huang, Mingliang Dai, Yi Zhang, Junping Zhang, Hongming Shan Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao Point-VOS: Pointing up Video Object Segmentation
Sabarinath Mahadevan, Idil Esen Zulfikar, Paul Voigtlaender, Bastian Leibe PointBeV: A Sparse Approach for BeV Predictions
Loick Chambon, Eloi Zablocki, Mickaël Chen, Florent Bartoccioni, Patrick Pérez, Matthieu Cord PointInfinity: Resolution-Invariant Point Diffusion Models
Zixuan Huang, Justin Johnson, Shoubhik Debnath, James M. Rehg, Chao-Yuan Wu Poly Kernel Inception Network for Remote Sensing Detection
Xinhao Cai, Qiuxia Lai, Yuwei Wang, Wenguan Wang, Zeren Sun, Yazhou Yao PoNQ: A Neural QEM-Based Mesh Representation
Nissim Maruani, Maks Ovsjanikov, Pierre Alliez, Mathieu Desbrun POPDG: Popular 3D Dance Generation with PopDanceSet
Zhenye Luo, Min Ren, Xuecai Hu, Yongzhen Huang, Li Yao PortraitBooth: A Versatile Portrait Model for Fast Identity-Preserved Personalization
Xu Peng, Junwei Zhu, Boyuan Jiang, Ying Tai, Donghao Luo, Jiangning Zhang, Wei Lin, Taisong Jin, Chengjie Wang, Rongrong Ji Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Jison Hsu, Jie-Ying Zhang, Huang Yu Hsiang, Wei-Jie Hong Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
Lin Long, Haobo Wang, Zhijie Jiang, Lei Feng, Chang Yao, Gang Chen, Junbo Zhao Posterior Distillation Sampling
Juil Koo, Chanho Park, Minhyuk Sung PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song, Xiao Wu, Zhaoquan Yuan, Jian-Jun Qiao, Qiang Peng Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
Zhenyu Chen, Jie Guo, Shuichang Lai, Ruoyu Fu, Mengxun Kong, Chen Wang, Hongyu Sun, Zhebin Zhang, Chen Li, Yanwen Guo PREGO: Online Mistake Detection in PRocedural EGOcentric Videos
Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso Previously on ... from Recaps to Story Summarization
Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi, Zhizhou Zhong, Yuge Huang, Jiazhen Ji, Jianqing Xu, Jun Wang, Shaoming Wang, Shouhong Ding, Shuigeng Zhou Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
Naishan Zheng, Man Zhou, Jie Huang, Junming Hou, Haoying Li, Yuan Xu, Feng Zhao Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu, Xiaohang Zhan, Shaoli Huang, Tai-Jiang Mu, Ying Shan Projecting Trackable Thermal Patterns for Dynamic Computer Vision
Mark Sheinin, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani, John Collomosse, Tu Bui, Xiaoming Liu, Shruti Agarwal ProMotion: Prototypes as Motion Learners
Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia Prompt Learning via Meta-Regularization
Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang, Zichen Song, Heqian Qiu, Lanxiao Wang, Fanman Meng, Hongliang Li Prompt-Free Diffusion: Taking "Text" Out of Text-to-Image Diffusion Models
Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
Xiaohong Zhang, Huisheng Ye, Jingwen Li, Qinyu Tang, Yuanqi Li, Yanwen Guo, Jie Guo Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang, Luca Weihs, Chanwoo Park, Kimin Lee, Aniruddha Kembhavi, Kiana Ehsani PromptAD: Learning Prompts with Only Normal Samples for Few-Shot Anomaly Detection
Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao, Yijiang Liu, Zhen Dong, Mingfei Guo, Helan Hu, Kurt Keutzer, Li Du, Daquan Zhou, Shanghang Zhang Prompting Vision Foundation Models for Pathology Image Analysis
Chong Yin, Siqi Liu, Kaiyang Zhou, Vincent Wai-Sun Wong, Pong C. Yuen PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li, Xiang Li, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang ProS: Prompting-to-Simulate Generalized Knowledge for Universal Cross-Domain Retrieval
Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-Dataset 3D Object Detection
Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu Purified and Unified Steganographic Network
Guobiao Li, Sheng Li, Zicong Luo, Zhenxing Qian, Xinpeng Zhang Putting the Object Back into Video Object Segmentation
Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing Q-Instruct: Improving Low-Level Visual Abilities for Multi-Modality Foundation Models
Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin QUADify: Extracting Meshes with Pixel-Level Details and Materials from Images
Maximilian Frühauf, Hayko Riemenschneider, Markus Gross, Christopher Schroers Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz, Yair Kittenplon, Aviad Aberdam, Elad Ben Avraham, Oren Nuriel, Shai Mazor, Ron Litman RAM-Avatar: Real-Time Photo-Realistic Avatar from Monocular Videos with Full-Body Control
Xiang Deng, Zerong Zheng, Yuxiang Zhang, Jingxiang Sun, Chao Xu, Xiaodong Yang, Lizhen Wang, Yebin Liu Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu Rapid 3D Model Generation with Intuitive 3D Input
Tianrun Chen, Chaotao Ding, Shangzhan Zhang, Chunan Yu, Ying Zang, Zejian Li, Sida Peng, Lingyun Sun RCBEVDet: Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
Zhiwei Lin, Zhe Liu, Zhongyu Xia, Xinhao Wang, Yongtao Wang, Shengxiang Qi, Yang Dong, Nan Dong, Le Zhang, Ce Zhu RCL: Reliable Continual Learning for Unified Failure Detection
Fei Zhu, Zhen Cheng, Xu-Yao Zhang, Cheng-Lin Liu, Zhaoxiang Zhang RCooper: A Real-World Large-Scale Dataset for Roadside Cooperative Perception
Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song, Jiacheng Wei, Chuan Sheng Foo, Guosheng Lin, Fayao Liu Readout Guidance: Learning Control from Diffusion Features
Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jiangning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, Lizhuang Ma Real-Time 3D-Aware Portrait Video Relighting
Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, Lin Gao Real-Time Neural BRDF with Spherically Distributed Primitives
Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Bingbing Ni, Yugang Chen, Junxiang Ke Real-Time Simulated Avatar from Head-Mounted Sensors
Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Kris Kitani, Weipeng Xu Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim, Jae Seok Choi, Geonseok Seo, Kinam Kwon, Jinwoo Shin, Hyong-Euk Lee RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu ReconFusion: 3D Reconstruction with Diffusion Priors
Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, Aleksander Hołyński Reconstructing Hands in 3D with Transformers
Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik Referring Expression Counting
Siyang Dai, Jun Liu, Ngai-Man Cheung ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng Region-Based Representations Revisited
Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman Tv, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
Mingyue Guo, Li Yuan, Zhaoyi Yan, Binghui Chen, Yaowei Wang, Qixiang Ye RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, Xiaowei Zhou Relightable Gaussian Codec Avatars
Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam Relightful Harmonization: Lighting-Aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang RepAn: Enhanced Annealing Through Re-Parameterization
Xiang Fei, Xiawu Zheng, Yan Wang, Fei Chao, Chenglin Wu, Liujuan Cao Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler RepViT: Revisiting Mobile CNN from ViT Perspective
Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding Residual Denoising Diffusion Models
Jiawei Liu, Qiang Wang, Huijie Fan, Yinong Wang, Yandong Tang, Liangqiong Qu Residual Learning in Diffusion Models
Junyu Zhang, Daochang Liu, Eunbyung Park, Shichao Zhang, Chang Xu Resolution Limit of Single-Photon LiDAR
Stanley H. Chan, Hashan K. Weerasooriya, Weijian Zhang, Pamela Abshire, Istvan Gyongy, Robert K. Henderson Resource-Efficient Transformer Pruning for Finetuning of Large Models
Fatih Ilhan, Gong Su, Selim Furkan Tekin, Tiansheng Huang, Sihao Hu, Ling Liu Restoration by Generation with Constrained Priors
Zheng Ding, Xuaner Zhang, Zhuowen Tu, Zhihao Xia Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami, Albin Soutif-Cormerais, Yuyang Liu, Sandesh Kamath, Bartłomiej Twardowski, Joost van de Weijer Rethinking Boundary Discontinuity Problem for Oriented Object Detection
Hang Xu, Xinyuan Liu, Haonan Xu, Yike Ma, Zunjie Zhu, Chenggang Yan, Feng Dai Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
Guangyuan Li, Chen Rao, Juncheng Mo, Zhanjie Zhang, Wei Xing, Lei Zhao Rethinking Few-Shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar Rethinking Human Motion Prediction with Symplectic Integral
Haipeng Chen, Kedi Lyu, Zhenguang Liu, Yifang Yin, Xun Yang, Yingda Lyu Rethinking the Evaluation Protocol of Domain Generalization
Han Yu, Xingxuan Zhang, Renzhe Xu, Jiashuo Liu, Yue He, Peng Cui Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Fengyuan Yu, Huabin Zhu, Binhui Yao, Tao Wang, Xiaolin Zheng, Yanchao Tan Rethinking Transformers Pre-Training for Multi-Spectral Satellite Imagery
Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim Revisiting Adversarial Training at Scale
Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang Revisiting Single Image Reflection Removal in the Wild
Yurui Zhu, Xueyang Fu, Peng-Tao Jiang, Hao Zhang, Qibin Sun, Jinwei Chen, Zheng-Jun Zha, Bo Li Revisiting the Domain Shift and Sample Uncertainty in Multi-Source Active Domain Transfer
Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Yunfei Li, Dongping Zhang, Yueting Zhuang, Siliang Tang Rewrite the Stars
Xu Ma, Xiyang Dai, Yue Bai, Yizhou Wang, Yun Fu Rich Human Feedback for Text-to-Image Generation
Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katherine M. Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han Riemannian Multinomial Logistics Regression for SPD Neural Networks
Ziheng Chen, Yue Song, Gaowen Liu, Ramana Rao Kompella, Xiao-Jun Wu, Nicu Sebe RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang, Jiageng Liu, Peihao Chen, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-Grained Correctional Human Feedback
Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua RMT: Retentive Networks Meet Vision Transformers
Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He RNb-NeuS: Reflectance and Normal-Based Multi-View 3D Reconstruction
Baptiste Brument, Robin Bruneau, Yvain Quéau, Jean Mélou, François Bernard Lauze, Jean-Denis Durou, Lilian Calvet Robust Depth Enhancement via Polarization Prompt Fusion Tuning
Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Lihua Zhang Robust Image Denoising Through Adversarial Frequency Mixup
Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han Robust Synthetic-to-Real Transfer for Stereo Matching
Jiawei Zhang, Jiahe Li, Lei Huang, Xiaohan Yu, Lin Gu, Jin Zheng, Xiao Bai RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo RoMa: Robust Dense Feature Matching
Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, Michael Felsberg Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu, Yiwei Ma, Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat, Jibran Khan, Areej Alsaafin, Ghazal Alabtah, H.R. Tizhoosh RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang, Xin Li, Zikun Zhou, Yaowei Wang, Zhenyu He, Ming-Hsuan Yang S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li, Zhiguo Cao, Yizheng Wu, Kewei Wang, Ke Xian, Zhe Wang, Guosheng Lin SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin, Yuzheng Liu, Yang Xiao, Daniel Cohen-Or, Jingwei Huang, Baoquan Chen SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai Scaled Decoupled Distillation
Shicai Wei, Chunbo Luo, Yang Luo Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch, Jens Behley, Cyrill Stachniss Scaling Laws for Data Filtering -- Data Curation Cannot Be Compute Agnostic
Sachin Goyal, Pratyush Maini, Zachary C. Lipton, Aditi Raghunathan, J. Zico Kolter Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian Scaling up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang Scaling up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration in the Wild
Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong Scaling up Video Summarization Pretraining with Large Language Models
Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas, Ayca Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, Francis Engelmann SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Sichen Chen, Yingyi Zhang, Siming Huang, Ran Yi, Ke Fan, Ruixin Zhang, Peixian Chen, Jun Wang, Shouhong Ding, Lizhuang Ma SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen See Say and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan Seeing Motion at Nighttime with an Event Camera
Haoyue Liu, Shihan Peng, Lin Zhu, Yi Chang, Hanyu Zhou, Luxin Yan Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs Seeing the World Through Your Eyes
Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher A. Metzler, Jia-Bin Huang SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction
Cheng Sun, Wei-En Tai, Yu-Lin Shih, Kuan-Wei Chen, Yong-Jing Syu, Kent Selwyn The, Yu-Chiang Frank Wang, Hwann-Tzong Chen Segment and Caption Anything
Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen, Zhiyu Zhu, Yifan Zhang, Junhui Hou, Guangming Shi, Jinjian Wu Segment Every Out-of-Distribution Object
Wenjie Zhao, Jia Li, Xin Dong, Yu Xiang, Yunhui Guo Selective Nonlinearities Removal from Digital Signals
Krzysztof A. Maliszewski, Magdalena A. Urbańska, Varvara Vetrova, Sylwia M. Kolenderska Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu Self-Correcting LLM-Controlled Diffusion Models
Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell Self-Distilled Masked Auto-Encoders Are Efficient Video Anomaly Detectors
Nicolae-Cătălin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah Self-Supervised Debiasing Using Low Rank Regularization
Geon Yeong Park, Chanyong Jung, Sangmin Lee, Jong Chul Ye, Sang Wan Lee Self-Supervised Dual Contouring
Ramana Sundararaman, Roman Klokov, Maks Ovsjanikov Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu, Bing Shuai, Yanbei Chen, Zhenlin Xu, Davide Modolo Self-Supervised Representation Learning from Arbitrary Scenarios
Zhaowen Li, Yousong Zhu, Zhiyang Chen, Zongxin Gao, Rui Zhao, Chaoyang Zhao, Ming Tang, Jinqiao Wang SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Borui Zhang, Jie Zhou, Jiwen Lu Semantic Human Mesh Reconstruction with Textures
Xiaoyu Zhan, Jianxin Yang, Yuanqi Li, Jie Guo, Yanwen Guo, Wenping Wang Semantic Line Combination Detector
Jinwon Ko, Dongkwon Jin, Chang-Su Kim Semantic-Aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han Semantics-Aware Motion Retargeting with Vision-Language Models
Haodong Zhang, Zhike Chen, Haocheng Xu, Lei Hao, Xiaofei Wu, Songcen Xu, Zhensong Zhang, Yue Wang, Rong Xiong SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon SeMoLi: What Moves Together Belongs Together
Jenny Seidenschwarz, Aljosa Osep, Francesco Ferroni, Simon Lucey, Laura Leal-Taixe Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan L. Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros SFOD: Spiking Fusion Object Detector
Yimeng Fan, Wei Zhang, Changsong Liu, Mingyang Li, Wenrui Lu SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li Shadow Generation for Composite Image Using Diffusion Model
Qingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, Li Niu Shadow-Enlightened Image Outpainting
Hang Yu, Ruilin Li, Shaorong Xie, Jiayan Qiu ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation
Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao Sheared Backpropagation for Fine-Tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding, Xinmei Tian, Yixin Chen, Dacheng Tao SHiNe: Semantic Hierarchy Nexus for Open-Vocabulary Object Detection
Mingxuan Liu, Tyler L. Hayes, Elisa Ricci, Gabriela Csurka, Riccardo Volpi SHINOBI: Shape and Illumination Using Neural Object Decomposition via BRDF Optimization In-the-Wild
Andreas Engelhardt, Amit Raj, Mark Boss, Yunzhi Zhang, Abhishek Kar, Yuanzhen Li, Deqing Sun, Ricardo Martin Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna Sieve: Multimodal Dataset Pruning Using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari S. Morcos SignGraph: A Sign Sequence Is Worth Graphs of Nodes
Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Hongkai Wen, Lei Xie, Sanglu Lu Simple Semantic-Aided Few-Shot Learning
Hai Zhang, Junzhe Xu, Shanlin Jiang, Zhenan He Single View Refractive Index Tomography with Neural Fields
Brandon Zhao, Aviad Levis, Liam Connor, Pratul P. Srinivasan, Katherine L. Bouman Single-Model and Any-Modality for Video Object Tracking
Zongwei Wu, Jilai Zheng, Xiangxuan Ren, Florin-Alexandru Vasluianu, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng SinSR: Diffusion-Based Image Super-Resolution in a Single Step
Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, Yansheng Li SLICE: Stabilized LIME for Consistent Explanations for Image Classification
Revoti Prasad Bora, Philipp Terhörst, Raymond Veldhuis, Raghavendra Ramachandra, Kiran Raja Small Scale Data-Free Knowledge Distillation
He Liu, Yikai Wang, Huaping Liu, Fuchun Sun, Anbang Yao Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell, Stephen Gould SmartEdit: Exploring Complex Instruction-Based Image Editing with Multimodal Large Language Models
Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu, Guangming Wang, Hermann Blum, Jiuming Liu, Liang Song, Marc Pollefeys, Hesheng Wang SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration Using Neural Radiance Fields
Quentin Herau, Nathan Piasco, Moussab Bennehar, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux SODA: Bottleneck Diffusion Models for Representation Learning
Drew A. Hudson, Daniel Zoran, Mateusz Malinowski, Andrew K. Lampinen, Andrew Jaegle, James L. McClelland, Loic Matthey, Felix Hill, Alexander Lerchner SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
Jinyang Liu, Wondmgezahu Teshome, Sandesh Ghimire, Mario Sznaier, Octavia Camps Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang, Weijiang Yu, Yunhang Shen, Ke Li, Yutong Lu, Yonghong Tian SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman SPAD: Spatially Aware Multi-View Diffusers
Yash Kant, Aliaksandr Siarohin, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Dorsa Sadigh, Leonidas Guibas, Fei Xia Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang, Nicholas Chimitt, Yiheng Chi, Zhiyuan Mao, Stanley H. Chan SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Li Ma, Vasu Agrawal, Haithem Turki, Changil Kim, Chen Gao, Pedro Sander, Michael Zollhöfer, Christian Richardt Spectral and Polarization Vision: Spectro-Polarimetric Real-World Dataset
Yujin Jeon, Eunsue Choi, Youngchan Kim, Yunseong Moon, Khalid Omer, Felix Heide, Seung-Hwan Baek Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao, Marvin Eisenberger, Nafie El Amrani, Daniel Cremers, Florian Bernard Spectrum AUC Difference (SAUCD): Human-Aligned 3D Shape Evaluation
Tianyu Luan, Zhong Li, Lele Chen, Xuan Gong, Lichang Chen, Yi Xu, Junsong Yuan Spin-up: Spin Light for Natural Light Uncalibrated Photometric Stereo
Zongrui Li, Zhan Lu, Haojie Yan, Boxin Shi, Gang Pan, Qian Zheng, Xudong Jiang SPIN: Simultaneous Perception Interaction and Navigation
Shagun Uppal, Ananye Agarwal, Haoyu Xiong, Kenneth Shaw, Deepak Pathak SplaTAM: Splat Track & mAP 3D Gaussians for Dense RGB-D SLAM
Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing State Space Models for Event Cameras
Nikola Zubic, Mathias Gehrig, Davide Scaramuzza StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu, Yizi Chen, Samuel Mermet, Lorenz Hurni, Konrad Schindler, Nicolas Gonthier, Loic Landrieu StraightPCF: Straight Point Cloud Filtering
Dasith de Silva Edirimuni, Xuequan Lu, Gang Li, Lei Wei, Antonio Robles-Kelly, Hongdong Li Stratified Avatar Generation from Sparse Observations
Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Zhixiang Wei, Lin Chen, Yi Jin, Xiaoxiao Ma, Tianle Liu, Pengyang Ling, Ben Wang, Huaian Chen, Jinjin Zheng Structure-Aware Sparse-View X-Ray 3D Reconstruction
Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang Structure-Guided Adversarial Training of Diffusion Models
Ling Yang, Haotian Qian, Zhilong Zhang, Jingwei Liu, Bin Cui Style Aligned Image Generation via Shared Attention
Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or SubT-MRS Dataset: Pushing SLAM Towards All-Weather Environments
Shibo Zhao, Yuanjun Gao, Tianhao Wu, Damanpreet Singh, Rushan Jiang, Haoxiang Sun, Mansi Sarawata, Yuheng Qiu, Warren Whittaker, Ian Higgins, Yi Du, Shaoshu Su, Can Xu, John Keller, Jay Karhade, Lucas Nogueira, Sourojit Saha, Ji Zhang, Wenshan Wang, Chen Wang, Sebastian Scherer SUGAR: Pre-Training 3D Visual Representations for Robotics
Shizhe Chen, Ricardo Garcia, Ivan Laptev, Cordelia Schmid Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong, Ruiqin Xiong, Jian Zhang, Zhaofei Yu, Xiaopeng Fan, Shuyuan Zhu, Tiejun Huang SuperSVG: Superpixel-Based Scalable Vector Graphics Synthesis
Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai Supervised Anomaly Detection for Complex Industrial Images
Aimira Baitieva, David Hurych, Victor Besnier, Olivier Bernard Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Wenzhong Tang, Zitong Yu, Alex Kot SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang SVGDreamer: Text Guided SVG Generation with Diffusion Model
Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang, Tianheng Cheng, Naiyu Gao, Haoyang Zhang, Tianwei Lin, Wenyu Liu, Xinggang Wang SyncTalk: The Devil Is in the Synchronization for Talking Head Synthesis
Ziqiao Peng, Wentao Hu, Yue Shi, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Jun He, Hongyan Liu, Zhaoxin Fan Synergistic Global-Space Camera and Human Reconstruction from Videos
Yizhou Zhao, Tuanfeng Yang Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang, Lei Jin, Zheng Wang, Jianshu Li, Liang Li, Fang Zhao, Yu Cheng, Li Yuan, Li Zhou, Junliang Xing, Jian Zhao TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi Tactile-Augmented Radiance Fields
Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, B G Vijay Kumar, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas Taming Stable Diffusion for Text to 360 Panorama Image Generation
Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao, Lei Feng, Kai Tang, Junbo Zhao, Yixuan Li, Gang Chen, Haobo Wang TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin TEA: Test-Time Energy Adaptation
Yige Yuan, Bingbing Xu, Liang Hou, Fei Sun, Huawei Shen, Xueqi Cheng Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang, Bo-Wen Yin, Yuming Chen, Zheng Lin, Yunheng Li, Qibin Hou, Ming-Ming Cheng Test-Time Adaptation for Depth Completion
Hyoungseob Park, Anjali Gupta, Alex Wong Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma Test-Time Linear Out-of-Distribution Detection
Ke Fan, Tong Liu, Xingyu Qiu, Yikai Wang, Lian Huai, Zeyu Shangguan, Shuang Gou, Fengjian Liu, Yuqian Fu, Yanwei Fu, Xingqun Jiang Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci TexOct: Generating Textures of 3D Models with Octree-Based Diffusion
Jialun Liu, Chenming Wu, Xinqi Liu, Xing Liu, Jinbo Wu, Haotian Peng, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding Text Grouping Adapter: Adapting Pre-Trained Text Detector for Layout Analysis
Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang, Wenxuan Xie, Cuiling Lan, Yan Lu, Nanning Zheng Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao Text-Conditioned Generative Model of 3D Strand-Based Human Hairstyles
Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges, Michael J. Black, Justus Thies Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu, Yapeng Meng, Zhipeng Hu, Lincheng Li, Haoqian Wu, Kun Zhou, Weiwei Xu, Xin Yu Text-Guided Explorable Image Super-Resolution
Kanchana Vaishnavi Gandikota, Paramanand Chandramouli Text-Image Alignment for Diffusion-Based Perception
Neehar Kondapaneni, Markus Marks, Manuel Knott, Rogerio Guimaraes, Pietro Perona Text-to-3D Generation with Bidirectional Diffusion Using Both 2D and 3D Priors
Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue Text-to-3D Using Gaussian Splatting
Zilong Chen, Feng Wang, Yikai Wang, Huaping Liu Text-to-Image Diffusion Models Are Great Sketch-Photo Matchmakers
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song Text2Loc: 3D Point Cloud Localization from Natural Language
Yan Xia, Letian Shi, Zifeng Ding, Joao F. Henriques, Daniel Cremers TextCraftor: Your Text Encoder Can Be Image Quality Controller
Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces, Jorge Lopez-Moreno Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-on
Xu Yang, Changxing Ding, Zhibin Hong, Junhao Huang, Jin Tao, Xiangmin Xu TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion
Yu-Ying Yeh, Jia-Bin Huang, Changil Kim, Lei Xiao, Thu Nguyen-Phuoc, Numair Khan, Cheng Zhang, Manmohan Chandraker, Carl S Marshall, Zhao Dong, Zhengqin Li The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao The More You See in 2D the More You Perceive in 3D
Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman The Neglected Tails in Vision-Language Models
Shubham Parashar, Zhiqiu Lin, Tian Liu, Xiangjue Dong, Yanan Li, Deva Ramanan, James Caverlee, Shu Kong Three Pillars Improving Vision Foundation Model Distillation for LiDAR
Gilles Puy, Spyros Gidaris, Alexandre Boulch, Oriane Siméoni, Corentin Sautier, Patrick Pérez, Andrei Bursuc, Renaud Marlet TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen Time- Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi, Keita Takahashi, Chihiro Tsutake, Toshiaki Fujii, Hajime Nagahara TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing
Kartik Thakral, Shashikant Prasad, Stuti Aswani, Mayank Vatsa, Richa Singh Total Selfie: Generating Full-Body Selfies
Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li Towards a Perceptual Evaluation Framework for Lighting Estimation
Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Javier Vazquez-Corral, Jean-François Lalonde Towards Accurate Post-Training Quantization for Diffusion Models
Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu Towards Automated Movie Trailer Generation
Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline
Xiaoqi Zhao, Youwei Pang, Zhenyu Chen, Qian Yu, Lihe Zhang, Hanqi Liu, Jiaming Zuo, Huchuan Lu Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan, Ming Lu, Justin Yang, Jiangpeng He, Zhan Ma, Fengqing Zhu Towards Better Vision-Inspired Vision-Language Models
Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang, Chuanyang Zheng, Jiajia Liu, Jian Wang, Jingdong Chen, Ming Yang Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra, Guy Stoppi, Dmitrii Marin, Vage Taamazyan, Aarrushi Shandilya, Rishav Agarwal, Anton Boykov, Tze Hao Chong, Michael Stark Towards Efficient Replay in Federated Incremental Learning
Yichen Li, Qunwei Li, Haozhao Wang, Ruixuan Li, Wenliang Zhong, Guannan Zhang Towards Fairness-Aware Adversarial Learning
Yanghao Zhang, Tianle Zhang, Ronghui Mu, Xiaowei Huang, Wenjie Ruan Towards Generalizable Multi-Object Tracking
Zheng Qin, Le Wang, Sanping Zhou, Panpan Fu, Gang Hua, Wei Tang Towards Generalizable Tumor Synthesis
Qi Chen, Xiaoxi Chen, Haorui Song, Zhiwei Xiong, Alan Yuille, Chen Wei, Zongwei Zhou Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana, Muhammad Haris Khan Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang, Yeliduosi Xiaokaiti, Yujia Liu, Bin Fan, Zhaojun Huang, Tiejun Huang, Boxin Shi Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy Towards Large-Scale 3D Representation Learning with Multi-Dataset Point Prompt Training
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao Towards More Accurate Diffusion Model Acceleration with a Timestep Tuner
Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Deli Zhao, Ran Yi, Wenping Wang, Yong-Jin Liu Towards More Unified In-Context Visual Understanding
Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao, Zihang Lyu, Cong Zhang, Yakun Ju, Changjian Shui, Kin-Man Lam Towards Text-Guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee Traffic Scene Parsing Through the TSP6K Dataset
Peng-Tao Jiang, Yuqi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang, Zhengyuan Yang, Lijuan Wang, Qiang Qiu, Zicheng Liu Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang, Yong Wang, Mingli Song, Jie Song Transcriptomics-Guided Slide Representation Learning in Computational Pathology
Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya, Richard J. Chen, Drew F.K. Williamson, Thomas Peeters, Andrew H. Song, Faisal Mahmood Transductive Zero-Shot and Few-Shot CLIP
Ségolène Martin, Yunshi Huang, Fereshteh Shakeri, Jean-Christophe Pesquet, Ismail Ben Ayed TransLoc4D: Transformer-Based 4D Radar Place Recognition
Guohao Peng, Heshan Li, Yangyang Zhao, Jun Zhang, Zhenyu Wu, Pengyu Zheng, Danwei Wang Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, Yufei Wang, Zhenyu Zhang, Jun Li, Jian Yang TRINS: Towards Multimodal Language Models That Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun TULIP: Transformer for Upsampling of LiDAR Point Clouds
Bin Yang, Patrick Pfreundschuh, Roland Siegwart, Marco Hutter, Peyman Moghadam, Vaishakh Patil TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan, Xingcheng Zhou, Rui Song, Alois C. Knoll Tune-an-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie, Songhe Deng, Bing Li, Haozhe Liu, Yawen Huang, Yefeng Zheng, Jurgen Schmidhuber, Bernard Ghanem, Linlin Shen, Mike Zheng Shou TurboSL: Dense Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan, Maxx Wu, Wenzheng Chen, David B. Lindell, Kiriakos N. Kutulakos Tyche: Stochastic In-Context Learning for Medical Image Segmentation
Marianne Rakic, Hallee E. Wong, Jose Javier Gonzalez Ortiz, Beth A. Cimini, John V. Guttag, Adrian V. Dalca UFineBench: Towards Text-Based Person Retrieval with Ultra-Fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie, Feng Zhang, Tianyu Guo, Nong Sang, Yunhe Wang, Changxin Gao ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding
Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon, Jongwoo Lim, Ayoung Kim Unbiased Faster R-CNN for Single-Source Domain Generalized Object Detection
Yajing Liu, Shijun Zhou, Xiyao Liu, Chunhui Hao, Baojie Fan, Jiandong Tian Uncertainty-Guided Never-Ending Learning to Drive
Lei Lai, Eshed Ohn-Bar, Sanjay Arora, John Seon Keun Yi Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao Understanding Video Transformers via Universal Concept Discovery
Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov Ungeneralizable Examples
Jingwen Ye, Xinchao Wang UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi Unifying Automatic and Interactive Matting with Pretrained ViTs
Zixuan Ye, Wenze Liu, He Guo, Yujia Liang, Chaoyi Hong, Hao Lu, Zhiguo Cao Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis
Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jiaolong Yang, Seungryong Kim, Chong Luo Unifying Top-Down and Bottom-up Scanpath Prediction Using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras UniGS: Unified Representation for Image Generation and Segmentation
Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang UniHuman: A Unified Model for Editing Human Images in the Wild
Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, SerNam Lim, Hengshuang Zhao UniPAD: A Universal Pre-Training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang UniPTS: A Unified Framework for Proficient Post-Training Sparsity
Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji Universal Novelty Detection Through Adaptive Contrastive Learning
Hossein Mirzaei, Mojtaba Nafez, Mohammad Jafari, Mohammad Bagher Soltani, Mohammad Azizmalayeri, Jafar Habibi, Mohammad Sabokrou, Mohammad Hossein Rohban Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias
Wenyu Zhang, Qingmu Liu, Felix Ong Wei Cong, Mohamed Ragab, Chuan-Sheng Foo Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou Unlocking Pre-Trained Image Backbones for Semantic Image Synthesis
Tariq Berrada Ifriqi, Jakob Verbeek, Camille Couprie, Karteek Alahari Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
Haijin Zeng, Jiezhang Cao, Kai Zhang, Yongyong Chen, Hiep Luong, Wilfried Philips UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro, Quinlan Sykora, Sergio Casas, Thomas Gilles, Raquel Urtasun Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx Unsupervised Blind Image Deblurring Based on Self-Enhancement
Lufei Chen, Xiangpeng Tian, Shuhua Xiong, Yinjie Lei, Chao Ren Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi Unsupervised Universal Image Segmentation
Dantong Niu, Xudong Wang, Xinyang Han, Long Lian, Roei Herzig, Trevor Darrell URHand: Universal Relightable Hands
Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhofer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han-Wei Shen, Liu Ren Using Human Feedback to Fine-Tune Diffusion Models Without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Weihan Shen, Xiaolong Zhu, Xiu Li Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation
Hong Li, Yutang Feng, Song Xue, Xuhui Liu, Bohan Zeng, Shanglin Li, Boyu Liu, Jianzhuang Liu, Shumin Han, Baochang Zhang Validating Privacy-Preserving Face Recognition Under a Minimum Assumption
Hui Zhang, Xingbo Dong, YenLung Lai, Ying Zhou, Xiaoyan Zhang, Xingguo Lv, Zhe Jin, Xuejun Li Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi, Ylva Mellbin, Ci Li, Markus Hoeschle, Hedvig Kjellström, Senya Polikovsky, Elin Hernlund, Michael J. Black VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, Evangelos Kalogerakis VGGSfM: Visual Geometry Grounded Deep Structure from Motion
Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny Vid-TLDR: Training Free Token Merging for Light-Weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim Video Interpolation with Diffusion Models
Siddhant Jain, Daniel Watson, Eric Tabellion, Aleksander Hołyński, Ben Poole, Janne Kontkanen Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang Video-P2P: Video Editing with Cross-Attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia VideoBooth: Diffusion-Based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan VideoLLM-Online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei, Tao Chen, Xiruo Jiang, Huafeng Liu, Zeren Sun, Yazhou Yao VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang, Kaixin Yao, Chengcheng Guo, Zhirui Zhang, Qiang Hu, Jingyi Yu, Lan Xu, Minye Wu VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi View from Above: Orthogonal-View Aware Cross-View Localization
Shan Wang, Chuong Nguyen, Jiawei Liu, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Kaihao Zhang, Hongdong Li ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Yan Zuo, Sameera Ramasinghe, Loris Bazzani, Gil Avraham, Anton van den Hengel VILA: On Pre-Training for Visual Language Models
Ji Lin, Hongxu Yin, Wei Ping, Pavlo Molchanov, Mohammad Shoeybi, Song Han VINECS: Video-Based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann, Christian Theobalt ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani, Zhaowen Wang, Difan Liu, Nanxuan Zhao, Jimei Yang, Yasutaka Furukawa Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores, Lucile Sassatelli, Hui-Yin Wu, Clement Bergman, Léa Andolfi, Victor Ecrement, Frédéric Precioso, Thierry Devars, Magali Guaresi, Virginie Julliard, Sarah Lecossais Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman Visual Programming for Zero-Shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You ViT-Lens: Towards Omni-Modal Representations
Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi Vlogger: Make Your Dream a Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu, Yuhan Li, Yunqi Gao, Changxin Gao, Yuanyuan Liu, Jingying Chen VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang, Xin Wang, Hong Chen, Zihan Song, Wenwu Zhu WANDR: Intention-Guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie, Haiyun Guo, Brandon Y. Feng, Lingbo Jin, Ashok Veeraraghavan, Christopher A. Metzler Weak-to-Strong 3D Object Detection with X-Ray Distillation
Alexander Gambashidze, Aleksandr Dadukin, Maxim Golyadkin, Maria Razzhivina, Ilya Makarov Weakly Supervised Video Individual Counting
Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton van den Hengel, Ming-Hsuan Yang, Qingming Huang Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-Speech Gesture Generation
Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, Mengfei Li, Wenhan Luo, Wei Xue, Shanghang Zhang, Qifeng Liu, Yike Guo What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Tao Xiang, Yi-Zhe Song What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
Alex Trevithick, Matthew Chan, Towaki Takikawa, Umar Iqbal, Shalini De Mello, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar, Bowen Zhang, Chandrakanth Gudavalli, Connor Levenson, Lacey Hughey, Jared A. Stabach, Irene Amoke, Gordon Ojwang, Joseph Mukeka, Stephen Mwiu, Joseph Ogutu, Howard Frederick, B.S. Manjunath Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Zhiyu Qu, Lan Yang, Honggang Zhang, Tao Xiang, Kaiyue Pang, Yi-Zhe Song Wonder3D: Single Image to 3D Using Cross-Domain Diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, Wenping Wang WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Stefano Soatto, Dong Lao, Alex Wong Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma XCube: Large-Scale 3D Generative Modeling Using Sparse Voxel Hierarchies
Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo, Renato Martins, Erickson R. Nascimento XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images
Chong Yin, Siqi Liu, Fei Lyu, Jiahao Lu, Sune Darkner, Vincent Wai-Sun Wong, Pong C. Yuen YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
Alon Zolfi, Guy Amit, Amit Baras, Satoru Koda, Ikuya Morikawa, Yuval Elovici, Asaf Shabtai You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
Ruoxi Zhu, Shusong Xu, Peiye Liu, Sicheng Li, Yanheng Lu, Dimin Niu, Zihao Liu, Zihao Meng, Zhiyong Li, Xinhua Chen, Yibo Fan ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu ZeroShape: Regression-Based Zero-Shot Shape Reconstruction
Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xiuhui Liu, Jiaming Liu, Lin Li, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang