ICCV 2025
2701 papers
"Principal Components" Enable a New Language of Images
Xin Wen, Bingchen Zhao, Ismail Elezi, Jiankang Deng, Xiaojuan Qi
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing
3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation
Tianrui Lou, Xiaojun Jia, Siyuan Liang, Jiawei Liang, Ming Zhang, Yanjun Xiao, Xiaochun Cao
3D Mesh Editing Using Masked LRMs
Will Gao, Dilin Wang, Yuchen Fan, Aljaz Bozic, Tuur Stuyck, Zhengqin Li, Zhao Dong, Rakesh Ranjan, Nikolaos Sarafianos
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu, Siyuan Li, Rui Huang, Yuqian Fu, Marc Pollefeys, Hermann Blum, Zuria Bauer
3DRealCar: An In-the-Wild RGB-D Car Dataset with 360-Degree Views
Xiaobiao Du, Yida Wang, Haiyang Sun, Zhuojie Wu, Hongwei Sheng, Shuyun Wang, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
Wufei Ma, Haoyu Chen, Guofeng Zhang, Yu-Cheng Chou, Jieneng Chen, Celso de Melo, Alan Yuille
4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari
4D Visual Pre-Training for Robot Learning
Chengkai Hou, Yanjie Ze, Yankai Fu, Zeyu Gao, Songbo Hu, Yue Yu, Shanghang Zhang, Huazhe Xu
4D-Bench: Benchmarking Multi-Modal Large Language Models for 4D Object Understanding
Wenxuan Zhu, Bing Li, Cheng Zheng, Jinjie Mai, Jun Chen, Letian Jiang, Abdullah Hamdi, Sara Rojas Martinez, Chia-Wen Lin, Mohamed Elhoseiny, Bernard Ghanem
6DOPE-GS: Online 6D Object Pose Estimation Using Gaussian Splatting
Yufeng Jin, Vignesh Prasad, Snehal Jauhri, Mathias Franzius, Georgia Chalvatzaki
7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting
Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu
A Conditional Probability Framework for Compositional Zero-Shot Learning
Peng Wu, Qiuxia Lai, Hao Fang, Guo-Sen Xie, Yilong Yin, Xiankai Lu, Wenguan Wang
A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks
Hang Su, Yunlong Feng, Daniel Gehrig, Panfeng Jiang, Ling Gao, Xavier Lagorce, Laurent Kneip
A Real-World Display Inverse Rendering Dataset
Seokjun Choi, Hoon-Gyu Chung, Yujin Jeon, Giljoo Nam, Seung-Hwan Baek
A Recipe for Generating 3D Worlds from a Single Image
Katja Schwarz, Denis Rozumny, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
A Simple yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks
Qi Bi, Jingjun Yi, Huimin Huang, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng
A Token-Level Text Image Foundation Model for Document Understanding
Tongkun Guan, Zining Wang, Pei Fu, Zhengtao Guo, Wei Shen, Kai Zhou, Tiezhu Yue, Chen Duan, Hao Sun, Qianyi Jiang, Junfeng Luo, Xiaokang Yang
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang, Zizhou Huang, Zhang Haihang, Yuntao Zou, Dagang Li, Kaifeng Zou
A Visual Leap in CLIP Compositionality Reasoning Through Generation of Counterfactual Sets
Zexi Jia, Chuanwei Huang, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Jinchao Zhang, Jie Zhou
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu, Jian Zhang, Minghao Guo, Youpeng Wen, Haoting Yang, Min Lin, Jianzheng Huang, Zhe Li, Kaidong Zhang, Liqiong Wang, Yuxuan Kuang, Meng Cao, Feng Zheng, Xiaodan Liang
A3GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting
Zhiyuan Fang, Rengan Xie, Xuancheng Jin, Qi Ye, Wei Chen, Wenting Zheng, Rui Wang, Yuchi Huo
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering
Michael Steiner, Thomas Köhler, Lukas Radl, Felix Windisch, Dieter Schmalstieg, Markus Steinberger
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
Shangwen Zhu, Han Zhang, Zhantao Yang, Qianyu Peng, Zhao Pu, Huangji Wang, Fan Cheng
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu, Lin Liu, Shuo Wang, Jinda Lu, Kezhou Chen, Yanbin Hao
AccidentalGS: 3D Gaussian Splatting from Accidental Camera Motion
Mao Mao, Xujie Shen, Guyuan Chen, Boming Zhao, Jiarui Hu, Hujun Bao, Zhaopeng Cui
ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training
Leonard Bruns, Axel Barroso-Laguna, Tommaso Cavallari, Aron Monszpart, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li, Anush Venkatesh, Danna Gurari
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning.
Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia
AcZeroTS: Active Learning for Zero-Shot Tissue Segmentation in Pathology Images
Jiao Tang, Junjie Zhou, Bo Qian, Peng Wan, Yingli Zuo, Wei Shao, Daoqiang Zhang
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang, Junlin Xie, Wei Zhang, Weikai Chen, Xiao Tan, Xiang Wan, Guanbin Li
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Xiao Fang, Minhyek Jeon, Zheyang Qin, Stanislav Panev, Celso De Melo, Shuowen Hu, Shayok Chakraborty, Fernando De La Torre
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie
AdsQA: Towards Advertisement Video Understanding
Xinwei Long, Kai Tian, Peng Xu, Guoli Jia, Jingxuan Li, Sa Yang, Yihua Shao, Kaiyan Zhang, Che Jiang, Hao Xu, Yang Liu, Jiaheng Ma, Bowen Zhou
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li, Yibing Song, Ming-Ming Cheng, Xiang Li, Jian Yang
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan, Hanqing Liu, Yao Huang, Xiaoqi Wang, Caixin Kang, Hang Su, Yinpeng Dong, Xingxing Wei
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, Ling Liu
Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, Jian-Huang Lai
Adversarial Purification via Super-Resolution and Diffusion
Mincheol Park, Cheonjun Park, Seungseop Lim, Mijin Koo, Hyunwuk Lee, Won Woo Ro, Suhyun Kim
Adversarial Robust Memory-Based Continual Learner
Xiaoyue Mi, Fan Tang, Zonghan Yang, Danding Wang, Juan Cao, Peng Li, Yang Liu
Adversarial Training for Probabilistic Robustness
Yi Zhang, Yuhang Chen, Zhen Chen, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Xingyu Zhao
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang, Yiwen Tang, Yiting Zhang, Chi Yan, Dong Wang, Xuelong Li, Bin Zhao
Aether: Geometric-Aware Unified World Modeling
Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li, Shuxiao Ding, You Zhou, Qingwen Zhang, Onat Inak, Larissa Triess, Niklas Hanselmann, Marius Cordts, Andreas Zell
Agreement Aware and Dissimilarity Oriented GLOM
Ru Zeng, Yan Song, Yang Zhang, Yanling Hu, Hui Yu
AgroBench: Vision-Language Model Benchmark in Agriculture
Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka, Masaki Onishi, Yoshitaka Ushiku
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu, Ke Sun, Jiayi Ji, Ke Yan, Shouhong Ding, Xiaoshuai Sun, Yunsheng Wu, Rongrong Ji
AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin, Wei Li, Wei Xue, Sirui Han, Yike Guo
AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Liuyue Xie, Jiancong Guo, Ozan Cakmakci, Andre Araujo, László A. Jeni, Zhiheng Jia
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu, I Chieh Chen, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey, Tianyu Zhang, Shu Ishida, John Roger Thompson, Amir Khasahmadi, Joseph George Lambourne, Pradeep Kumar Jayaraman, Karl D.D. Willis
Aligning Effective Tokens with Video Anomaly in Large Language Models
Yingxian Chen, Jiahui Liu, Ruidi Fan, Yanwei Li, Chirui Chang, Shizhen Zhao, Wilton W. T. Fok, Xiaojuan Qi, Yik-Chung Wu
Aligning Global Semantics and Local Textures in Generative Video Enhancement
Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei
Aligning Moments in Time Using Video Queries
Yogesh Kumar, Uday Agarwal, Manish Gupta, Anand Mishra
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Ding Wang, Botian Shi
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong, Jinhong Wang, Rao Muhammad Anwer
AllGCD: Leveraging All Unlabeled Data for Generalized Category Discovery
Xinzi Cao, Ke Chen, Feidiao Yang, Xiawu Zheng, Yonghong Tian, Yutong Lu
AllTracker: Efficient Dense Point Tracking at High Resolution
Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Suya You, Rares Ambrus, Katerina Fragkiadaki, Leonidas Guibas
ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions
Dubing Chen, Jin Fang, Wencheng Han, Xinjing Cheng, Junbo Yin, Chengzhong Xu, Fahad Shahbaz Khan, Jianbing Shen
Always Skip Attention
Yiping Ji, Hemanth Saratchandran, Peyman Moghadam, Simon Lucey
AM-Adapter: Appearance Matching Adapter for Exemplar-Based Semantic Image Synthesis In-the-Wild
Siyoon Jin, Jisu Nam, Jiyoung Kim, Dahyun Chung, Yeong-Seok Kim, Joonhyung Park, Heonjeong Chu, Seungryong Kim
Amodal Depth Anything: Amodal Depth Estimation in the Wild
Zhenyu Li, Mykola Lavreniuk, Jian Shi, Shariq Farooq Bhat, Peter Wonka
Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, Tat-Jen Cham
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng, Huanan Li, Juntao Guan, Rui Fan, Tong Wu, Xilong Wang, Rui Lai
An Empirical Study of Autoregressive Pre-Training from Videos
Jathushan Rajasegaran, Ilija Radosavovic, Rahul Ravishankar, Yossi Gandelsman, Christoph Feichtenhofer, Jitendra Malik
An Information-Theoretic Regularizer for Lossy Neural Image Compression
Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, Shiqi Wang
An Inversion-Based Measure of Memorization for Diffusion Models
Zhe Ma, Qingming Li, Xuhong Zhang, Tianyu Du, Ruixiao Lin, Zonghui Wang, Shouling Ji, Wenzhi Chen
An OpenMind for 3D Medical Vision Self-Supervised Learning
Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi, Sebastian Ziegler, Michal Nohel, Robin Peretzke, Gregor Kohler, Klaus Maier-Hein
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Arnaud Dapogny, Matthieu Cord
AnimalClue: Recognizing Animals by Their Traces
Risa Shinoda, Nakamasa Inoue, Iro Laina, Christian Rupprecht, Hirokatsu Kataoka
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu, Guangyuan Wang, Zhen Shen, Xin Gao, Dechao Meng, Lian Zhuo, Peng Zhang, Bang Zhang, Liefeng Bo
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li, Ruohan Zong, Yifan Liu, Ruichen Yao, Yaokun Liu, Yang Zhang, Dong Wang
Any-SSR: How Recursive Least Squares Works in Continual Learning of Large Language Model
Kai Tong, Kang Pan, Xiao Zhang, Erli Meng, Run He, Yawen Cui, Nuoyan Guo, Huiping Zhuang
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu, Tengbo Yu, Haoyuan Deng, Season Si Chen, Yansong Tang, Ziwei Wang
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang, Yupeng Zhou, Kai Wang, Yikai Wang, Zhen Li, Shaohui Jiao, Daquan Zhou, Qibin Hou, Ming-Ming Cheng
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
Yikang Zhou, Tao Zhang, Shilin Xu, Shihao Chen, Qianyu Zhou, Yunhai Tong, Shunping Ji, Jiangning Zhang, Lu Qi, Xiangtai Li
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang, Gowthami Somepalli, Tom Goldstein
ARMO: Autoregressive Rigging for Multi-Category Objects
Mingze Sun, Shiwei Mao, Keyi Chen, Yurun Chen, Shunlin Lu, Jingbo Wang, Junting Dong, Ruqi Huang
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski, Simone Schaub-Meyer, Kristian Kersting, Stefan Roth
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech, Xi Wang, Luc Van Gool, Danda Pani Paudel
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Jinhyung Park, Javier Romero, Shunsuke Saito, Fabian Prada, Takaaki Shiratori, Yichen Xu, Federica Bogo, Shoou-I Yu, Kris Kitani, Rawal Khirodkar
Augmented Mass-Spring Model for Real-Time Dense Hair Simulation
J. H. Alejandro Amador, Yi Zhou, Xin Sun, Zhixin Shu, Chengan He, Soren Pirk, Dominik L. Michels
AURELIA: Test-Time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha
Authentic 4D Driving Simulation with a Video Generation Model
Lening Wang, Wenzhao Zheng, Dalong Du, Yunpeng Zhang, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jie Zhou, Shanghang Zhang
Auto-Regressively Generating Multi-View Consistent Images
JiaKui Hu, Yuxiao Yang, Jialun Liu, Jinbo Wu, Chen Zhao, Yanye Lu
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
Jiacheng Chen, Ziyu Jiang, Mingfu Liang, Bingbing Zhuang, Jong-Chyi Su, Sparsh Garg, Ying Wu, Manmohan Chandraker
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong, Michael Zollhöfer, Dimitris Samaras, Alexander Richard
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov
Avat3r: Large Animatable Gaussian Reconstruction Model for High-Fidelity 3D Head Avatars
Tobias Kirschstein, Javier Romero, Artem Sevastopolsky, Matthias Nießner, Shunsuke Saito
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Yaoting Wang, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang, Arjun Chandra, Aoming Liu, Venkatesh Saligrama, Boqing Gong
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction
Weirong Chen, Ganlin Zhang, Felix Wimbauer, Rui Wang, Nikita Araslanov, Andrea Vedaldi, Daniel Cremers
Backdoor Defense via Enhanced Splitting and Trap Isolation
Hongrui Yu, Lu Qi, Wanyu Lin, Jian Chen, Hailong Sun, Chengbin Sun
BadVideo: Stealthy Backdoor Attack Against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou, Rui Chen, Xin Tao, Pengfei Wan, Baoyuan Wu
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-Stage Image-to-3D Generation and Reconstruction
Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille
Balanced Image Stylization with Style Matching Score
Yuxin Jiang, Liming Jiang, Shuai Yang, Jia-Wei Liu, Ivor W. Tsang, Mike Zheng Shou
BANet: Bilateral Aggregation Network for Mobile Stereo Matching
Gangwei Xu, Jiaxin Liu, Xianqi Wang, Junda Cheng, Yong Deng, Jinliang Zang, Yurui Chen, Xin Yang
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Feris, Yunhui Guo
Benchmarking Egocentric Visual-Inertial SLAM at City Scale
Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin, Oscar Gentilhomme, David Caruso, Maurizio Monge, Richard Newcombe, Jakob Engel, Marc Pollefeys
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue, Yang Wu, Shuang Chen, Juncheng Li, Siliang Tang, Fei Wu, Tat-Seng Chua, Yueting Zhuang
Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations
Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński, Marek Śmieja, Bartosz Zieliński
Beyond Blur: A Fluid Perspective on Generative Diffusion Models
Grzegorz Gruszczynski, Jakub Meixner, Michal Wlodarczyk, Przemyslaw Musialski
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin, Qiangya Guo, Shuangping Huang, Shuicheng Yan
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective
Hoang Phan, Lam Tran, Quyen Tran, Ngoc Tran, Tuan Truong, Qi Lei, Nhat Ho, Dinh Phung, Trung Le
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
Beyond Perspective: Neural 360-Degree Video Compression
Andy Regensky, Marc Windsheimer, Fabian Brand, Andre Kaup
Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes
Huachao Zhu, Zelong Liu, Zhichao Sun, Yuda Zou, Gui-Song Xia, Yongchao Xu
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Shani Gamrian, Hila Barel, Feiran Li, Masakazu Yoshimura, Daisuke Iso
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, Rao Muhammad Anwer, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du, Xin Wang, Fangwei Hao, Mingyang Yu, Chunyuan Chen, Jiesheng Wu, Bin Wang, Jing Xu, Ping Li
Beyond Spatial Frequency: Pixel-Wise Temporal Frequency-Based Deepfake Video Detection
Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, Jongwon Choi
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang, Aosong Cheng, Ming Lu, Renrui Zhang, Zhiyong Zhuo, Jiajun Cao, Shaobo Guo, Qi She, Shanghang Zhang
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zenghui Ding, Xianjun Yang, Yining Sun
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, Boyi Li, Ximing Lu, Michael Ross Boone, Boris Ivanovic, Yejin Choi, Marco Pavone, Yu-Chiang Frank Wang, Noa Garcia, Yuta Nakashima, Chao-Han Huck Yang
Blind Noisy Image Deblurring Using Residual Guidance Strategy
Heyan Liu, Jianing Sun, Jun Liu, Xi-Le Zhao, Tingting Wu, Tieyong Zeng
Blind Video Super-Resolution Based on Implicit Kernels
Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu, Fan Zhang, David Bull, Bing Zeng
BlinkTrack: Feature Tracking over 80 FPS via Events and Images
Yichen Shen, Yijin Li, Shuo Chen, Guanglin Li, Zhaoyang Huang, Hujun Bao, Zhaopeng Cui, Guofeng Zhang
BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration
Hanyuan Liu, Chengze Li, Minshan Xie, Zhenni Wang, Jiawen Liang, Chi-Sing Leung, Tien-Tsin Wong
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu, Qingnan Fan, Qi Zhang, Jinwei Chen, Huaqi Zhang, Chao Xu, Boxin Shi
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures
Tim Seizinger, Florin-Alexandru Vasluianu, Marcos V. Conde, Zongwei Wu, Radu Timofte
Bolt3D: Generating 3D Scenes in Seconds
Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T. Barron, Philipp Henzler
Boost 3D Reconstruction Using Diffusion-Based Monocular Camera Calibration
Junyuan Deng, Wei Yin, Xiaoyang Guo, Qian Zhang, Xiaotao Hu, Weiqiang Ren, Xiao-Xiao Long, Ping Tan
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang, Weilong Dai, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jingyuan Chen, Chang Yao, Jie Song
Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction
Runmin Zhang, Zhu Yu, Si-Yuan Cao, Lingyu Zhu, Guangyi Zhang, Xiaokai Bai, Hui-Liang Shen
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-Language Pre-Training
Weiwei Cao, Jianpeng Zhang, Zhongyi Shui, Sinuo Wang, Zeli Chen, Xi Li, Le Lu, Xianghua Ye, Qi Zhang, Tingbo Liang, Ling Zhang
Bootstrap3D: Improving Multi-View Diffusion Model with Synthetic Data
Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
Yuanhong Yu, Xingyi He, Chen Zhao, Junhao Yu, Jiaqi Yang, Ruizhen Hu, Yujun Shen, Xing Zhu, Xiaowei Zhou, Sida Peng
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yuqing Wang, Zhijie Lin, Yao Teng, Yuanzhi Zhu, Shuhuai Ren, Jiashi Feng, Xihui Liu
Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba for End-to-End Whole Slide Image Analysis
Zhongwei Qiu, Hanqing Chao, Tiancheng Lin, Wanxing Chang, Zijiang Yang, Wenpei Jiao, Yixuan Shen, Yunshuo Zhang, Yelin Yang, Wenbin Liu, Hui Jiang, Yun Bian, Ke Yan, Dakai Jin, Le Lu
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt
Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu, Enxin Song, Wenhao Chai, Xuexiang Wen, Tian Ye, Gaoang Wang
C4D: 4D Made from 3D Through Dual Correspondences
Shizun Wang, Zhenxiang Jiang, Xingyi Yang, Xinchao Wang
CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection
Zhixin Cheng, Jiacheng Deng, Xinjun Li, Xiaotian Yin, Bohao Liao, Baoqun Yin, Wenfei Yang, Tianzhu Zhang
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis, Ahmet Serda Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
CAFA: A Controllable Automatic Foley Artist
Roi Benita, Michael Finkelson, Tavi Halperin, Gleb Sterkin, Yossi Adi
CameraCtrl II: Dynamic Scene Exploration via Camera-Controlled Video Diffusion Models
Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li
Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians
Quankai Gao, Iliyan Georgiev, Tuanfeng Y. Wang, Krishna Kumar Singh, Ulrich Neumann, Jae Shin Yoon
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation
Xiangyang Luo, Ye Zhu, Yunfei Liu, Lijian Lin, Cong Wan, Zijian Cai, Yu Li, Shao-Lun Huang
CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation
Haoxuan Wang, Zhenghao Zhao, Junyi Wu, Yuzhang Shang, Gaowen Liu, Yan Yan
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
Kuniaki Saito, Donghyun Kim, Kwanyong Park, Atsushi Hashimoto, Yoshitaka Ushiku
CarGait: Cross-Attention Based Re-Ranking for Gait Recognition
Gavriel Habib, Noa Barzilay, Or Shimshi, Rami Ben-Ari, Nir Darshan
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
Zhefei Gong, Pengxiang Ding, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, Donglin Wang
CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance
Peiqi Chen, Lei Yu, Yi Wan, Yingying Pei, Xinyi Liu, Yongxiang Yao, Yingying Zhang, Lixiang Ru, Liheng Zhong, Jingdong Chen, Ming Yang, Yongjun Zhang
Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression
Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shu-Tao Xia, Yaowei Wang
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from a Single-View Image
Wonseok Roh, Hwanhee Jung, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-Lei Li, Jianwu Fang, Junbin Xiao, Shanmin Pang, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua
CAVIS: Context-Aware Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, Jianqiang Wan, Humen Zhong, Xuejing Liu, Mingkun Yang, Peng Wang, Shuai Bai, Lianwen Jin, Junyang Lin
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting
Lei Tian, Xiaomin Li, Liqian Ma, Hao Yin, Zirui Zheng, Hefei Huang, Taiqing Li, Huchuan Lu, Xu Jia
Certifiably Optimal Anisotropic Rotation Averaging
Carl Olsson, Yaroslava Lochman, Johan Malmport, Christopher Zach
CF3: Compact and Fast 3D Feature Fields
Hyunjoon Lee, Joonkyu Min, Jaesik Park
CharaConsist: Fine-Grained Consistent Character Generation
Mengyu Wang, Henghui Ding, Jianing Peng, Yao Zhao, Yunpeng Chen, Yunchao Wei
CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
Abhinav Kumar, Yuliang Guo, Zhihao Zhang, Xinyu Huang, Liu Ren, Xiaoming Liu
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
Zhengzhuo Xu, SiNan Du, Yiyan Qi, Siwen Lu, Chengjin Xu, Chun Yuan, Jian Guo
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng, Mingsheng Li, Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesa Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu
CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu, Shuchao Pang, Xu Zheng, Xiang Gu, Anan Du, Yunhuai Liu, Yongbin Zhou
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue
CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization
Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas J. Guibas, Songyou Peng
CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation
Xiao Lin, Yun Peng, Liuyi Wang, Xianyou Zhong, Minghao Zhu, Yi Feng, Jingwei Yang, Chengju Liu, Qijun Chen
ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring
Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Hongxiang Huang, Yue Zhou, Haotian Fu, Bojun Cheng
Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions
Mengyu Yang, Yiming Chen, Haozheng Pei, Siddhant Agarwal, Arun Balajee Vasudevan, James Hays
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation
Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Sun-Ao Liu, Xiaopeng Zhang, Qi Tian, Yongdong Zhang
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao, Haoye Dong, Yuyang Yin, Zequn Jie, Yinlong Qian, Yao Zhao, Humphrey Shi, Yunchao Wei
CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task
James Amato, Yunan Xie, Leonel Medina-Varela, Ammar Aljerwi, Adam McCutcheon, T. Seth Rippentrop, Kristian Gonzalez, Jacques Delabrouille, Mustapha Ishak, Nicholas Ruozzi
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jinyang Guo, Dongzhan Zhou, Wanli Ouyang, Shixiang Tang
CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts
Olaf Dünkel, Artur Jesslen, Jiahao Xie, Christian Theobalt, Christian Rupprecht, Adam Kortylewski
CO2-Net: A Physics-Informed Spatio-Temporal Model for Global Surface CO2 Reconstruction
Hao Zheng, Yuting Zheng, Hanbo Huang, Chaofan Sun, Enhui Liao, Lin Liu, Yi Han, Hao Zhou, Shiyu Liang
CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance
Jinming Li, Yichen Zhu, Zhibin Tang, Junjie Wen, Minjie Zhu, Xiaoyu Liu, Chengmeng Li, Ran Cheng, Yaxin Peng, Yan Peng, Feifei Feng
CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu, Zanlin Ni, Yeguo Hua, Xin Deng, Xiao Ma, Cheng Zhong, Gao Huang
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao, Jiazhao Zhang, Zhinan Yu, Shuzhen Liu, Zheng Qin, Qin Zou, Bo Du, Kai Xu
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo, Seo Jin Lee, Seungwoo Lee, Seohyung Hong, Hyungseok Seo, Kyungsu Kim
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues
Francesco Taioli, Edoardo Zorzi, Gianni Franchi, Alberto Castellini, Alessandro Farinelli, Marco Cristani, Yiming Wang
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov, Nancy Mehta, Mahmoud Afifi, Egor Ershov, Radu Timofte
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen, Pi Bu, Yingyao Wang, Xinyi Wang, Ziming Wang, Jie Guo, Yingxiu Zhao, Qi Zhu, Jun Song, Siran Yang, Jiamang Wang, Bo Zheng
Combinative Matching for Geometric Shape Assembly
Nahyuk Lee, Juhong Min, Junhong Lee, Chunghyun Park, Minsu Cho
CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images
Jungho Lee, Donghyeong Kim, Dogyoon Lee, Suhwan Cho, Minhyeok Lee, Wonjoon Lee, Taeoh Kim, Dongyoon Wee, Sangyoun Lee
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
Gaoyang Zhang, Bingtao Fu, Qingnan Fan, Qi Zhang, Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu
CompCap: Improving Multimodal Large Language Models with Composite Captions
Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab, Aashu Singh, Qifan Wang, David Yang, ShengYun Peng, Hanchao Yu, Shen Yan, Xuewen Zhang, Baosheng He
CompleteMe: Reference-Based Human Image Completion
Yu-Ju Tsai, Brian Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve, Chengyuan Xu, Ratheesh Kalarot, Junsong Yuan
Consensus-Driven Active Model Selection
Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, Sara Beery
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Weiyi You, Mingyang Zhang, Leheng Zhang, Xingyu Zhou, Kexuan Shi, Shuhang Gu
Constraint-Aware Feature Learning for Parametric Point Cloud
Xi Cheng, Ruiqi Lei, Di Huang, Zhichao Liao, Fengyuan Piao, Yan Chen, Pingfa Feng, Long Zeng
ConstStyle: Robust Domain Generalization with Unified Style Transformation
Nam Duong Tran, Nam Nguyen Phuong, Hieu H. Pham, Phi Le Nguyen, My T. Thai
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini, Jan Ulrich Bartels, Katherine J. Kuchenbecker, Michael J. Black
Context-Aware Academic Emotion Dataset and Benchmark
Luming Zhao, Jingwen Xuan, Jiamin Lou, Yonghui Yu, Wenwu Yang
Continual Personalization for Diffusion Models
Yu-Chien Liao, Jr-Jen Chen, Chi-Pin Huang, Ci-Siang Lin, Meng-Lin Wu, Yu-Chiang Frank Wang
Continuous-Time Human Motion Field from Event Cameras
Ziyun Wang, Ruijun Zhang, Zi-Yan Liu, Yufu Wang, Kostas Daniilidis
ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction
Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa, Richard Chen, Ruofan Liang, Yushi Guan, Nilesh Ahuja, Nilesh Jain, Selvakumar Panneer, Nandita Vijaykumar
Contrastive Flow Matching
George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, Judy Hoffman
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu, Xinke Li, Yuning Zhang, Lu Qi, Xin Li, Wenping Wang, Chongshou Li, Xueting Li, Ming-Hsuan Yang
Controllable and Expressive One-Shot Video Head Swapping
Chaonan Ji, Jinwei Qi, Peng Zhang, Bang Zhang, Liefeng Bo
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation
Yooshin Cho, Hanbyel Cho, Janghyeon Lee, HyeongGwon Hong, Jaesung Ahn, Junmo Kim
Controllable Latent Space Augmentation for Digital Pathology
Sofiène Boutaj, Marin Scalbert, Pierre Marza, Florent Couzinie-Devy, Maria Vakalopoulou, Stergios Christodoulidis Controllable Weather Synthesis and Removal with Video Diffusion Models
Chih-Hao Lin, Zian Wang, Ruofan Liang, Yuxuan Zhang, Sanja Fidler, Shenlong Wang, Zan Gojcic Controlling Multimodal LLMs via Reward-Guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal Cooperative Pseudo Labeling for Unsupervised Federated Classification
Kuangpu Guo, Lijun Sheng, Yongcan Yu, Jian Liang, Zilei Wang, Ran He CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
Nikita Karaev, Yuri Makarov, Jianyuan Wang, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht Counting Stacked Objects
Corentin Dumery, Noa Etté, Aoxiang Fan, Ren Li, Jingyi Xu, Hieu Le, Pascal Fua Cracking Instance Jigsaw Puzzles: An Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Xiwen Chen, Peijie Qiu, Wenhui Zhu, Hao Wang, Huayu Li, Xuanzhao Dong, Xiaotong Sun, Xiaobing Yu, Yalin Wang, Abolfazl Razi, Aristeidis Sotiras Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
Xinyu Fang, Zhijian Chen, Kai Lan, Lixin Ma, Shengyuan Ding, Yingji Liang, Xiangyu Zhao, Farong Wen, Zicheng Zhang, Guofeng Zhang, Haodong Duan, Kai Chen, Dahua Lin Cross-Subject Mind Decoding from Inaccurate Representations
Yangyang Xu, Bangzhen Liu, Wenqi Shao, Yong Du, Shengfeng He, Tingting Zhu CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy
Jiakai Zhang, Shouchen Zhou, Haizhao Dai, Xinhang Liu, Peihao Wang, Zhiwen Fan, Yuan Pei, Jingyi Yu CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling
Trong Thang Pham, Akash Awasthi, Saba Khan, Esteban Duran Marti, Tien-Phat Nguyen, Khoa Vo, Minh Tran, Son Nguyen, Cuong Tran, Yuki Ikebe, Anh Totti Nguyen, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
Aniket Rege, Zinnia Nie, Mahesh Ramesh, Unmesh Raskar, Zhuoran Yu, Aditya Kusupati, Yong Jae Lee, Ramya Korlakai Vinayak Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction
Zhirui Gao, Renjiao Yi, Yaqiao Dai, Xuening Zhu, Wei Chen, Chenyang Zhu, Kai Xu CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
Leon Sick, Dominik Engel, Sebastian Hartwig, Pedro Hermosilla, Timo Ropinski CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection
Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu, Jingyun Fu, Peng Xu, Shaohong Wang, Zhihao Yang, Tianyu Pu, Eryun Liu CVPT: Cross Visual Prompt Tuning
Lingyun Huang, Jianxu Mao, Junfei Yi, Ziming Tao, Yaonan Wang CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang, Pingping Liu, Yubing Lu, Mengen Cai, Zijian Zhang, Zhe Zhang, Qiuzhan Zhou D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng, Ruiqi Suo, Chenhao Lin, Zhengyu Zhao, Le Yang, Shuai Liu, Minghui Yang, Cong Wang, Chao Shen DADM: Dual Alignment of Domain and Modality for Face Anti-Spoofing
Jingyi Yang, Xun Lin, Zitong Yu, Liepiao Zhang, Xin Liu, Hui Li, Xiaochen Yuan, Xiaochun Cao DAMap: Distance-Aware MapNet for High Quality HD Map Construction
Jinpeng Dong, Chen Li, Yutong Lin, Jingwen Fu, Sanping Zhou, Nanning Zheng Dataset Distillation as Data Compression: A Rate-Utility Perspective
Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma Dataset Distillation via the Wasserstein Metric
Haoyang Liu, Yijiang Li, Tiancheng Xing, Peiran Wang, Vibhu Dalal, Luwei Li, Jingrui He, Haohan Wang Dataset Ownership Verification for Pre-Trained Masked Models
Yuechen Xie, Jie Song, Yicheng Shan, Xiaoyan Zhang, Yuanyu Wan, Shengxuming Zhang, Jiarui Duan, Mingli Song DAViD: Data-Efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt, Lohit Petikam, Xian Xiao, Antonio Criminisi, Thomas J. Cashman, Tadas Baltrusaitis DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen, Zhuoyang Zhang, Enze Xie, Jincheng Yu, Junsong Chen, Jinyi Hu, Yao Lu, Song Han DCHM: Depth-Consistent Human Modeling for Multiview Detection
Jiahao Ma, Tianyu Wang, Miaomiao Liu, David Ahmedt-Aristizabal, Chuong Nguyen Debiased Teacher for Day-to-Night Domain Adaptive Object Detection
Yiming Cui, Liang Li, Haibing Yin, Yuhan Gao, Yaoqi Sun, Chenggang Yan Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Weiming Zhang, Nenghai Yu Decoupled Diffusion Sparks Adaptive Scene Generation
Yunsong Zhou, Naisheng Ye, William Ljungbergh, Tianyu Li, Jiazhi Yang, Zetong Yang, Hongzi Zhu, Christoffer Petersson, Hongyang Li Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
Liwei Luo, Shuaitengyuan Li, Dongwei Ren, Qilong Wang, Pengfei Zhu, Qinghua Hu DeepMesh: Auto-Regressive Artist-Mesh Creation with Reinforcement Learning
Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, Jun Zhu DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, Guanbin Li Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng, Albert J. Zhai, Evan Z. Chen, Rui Zhou, Yawen Deng, Zitong Li, Kejie Zhao, Janice Shiu, Qianyu Zhao, Yide Xu, Xinlei Wang, Yuan Shen, Sheng Wang, Lisa Ainsworth, Kaiyu Guan, Shenlong Wang Democratizing High-Fidelity Co-Speech Gesture Video Generation
Xu Yang, Shaoli Huang, Shenbo Xie, Xuelin Chen, Yifei Liu, Changxing Ding Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang, Han Xue, Hao-Shu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang DepR: Depth Guided Single-View Scene Reconstruction with Instance-Level Diffusion
Qingcheng Zhao, Xiang Zhang, Haiyang Xu, Zeyuan Chen, Jianwen Xie, Yuan Gao, Zhuowen Tu Describe Anything: Detailed Localized Image and Video Captioning
Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui Describe, Adapt and Combine: Empowering CLIP Encoders for Open-Set 3D Object Retrieval
Zhichuan Wang, Yang Zhou, Zhe Liu, Rui Yu, Song Bai, Yulong Wang, Xinwei He, Xiang Bai Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent
En Ci, Shanyan Guan, Yanhao Ge, Yilin Zhang, Wei Li, Zhenyu Zhang, Jian Yang, Ying Tai Details Matter for Indoor Open-Vocabulary 3D Instance Segmentation
Sanghun Jung, Jingjing Zheng, Ke Zhang, Nan Qiao, Albert Y. C. Chen, Lu Xia, Chi Liu, Yuyin Sun, Xiao Zeng, Hsiang-Wei Huang, Byron Boots, Min Sun, Cheng-Hao Kuo Detect Anything 3D in the Wild
Hanxue Zhang, Haoran Jiang, Qingsong Yao, Yanan Sun, Renrui Zhang, Hao Zhao, Hongyang Li, Hongzi Zhu, Zetong Yang Deterministic Object Pose Confidence Region Estimation
Jinghao Wang, Zhang Li, Zi Wang, Banglei Guan, Yang Shang, Qifeng Yu DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover
Youzhuo Wang, Jiayi Ye, Chuyang Xiao, Yiming Zhong, Heng Tao, Hang Yu, Yumeng Liu, Jingyi Yu, Yuexin Ma DexVLG: Dexterous Vision-Language-Grasp Model at Scale
Jiawei He, Danshi Li, Xinqiang Yu, Zekun Qi, Wenyao Zhang, Jiayi Chen, Zhaoxiang Zhang, Zhizheng Zhang, Li Yi, He Wang DH-FaceVid-1k: A Large-Scale High-Quality Dataset for Face Video Generation
Donglin Di, He Feng, Wenzhang Sun, Yongjia Ma, Hao Li, Wei Chen, Lei Fan, Tonghua Su, Xun Yang DialNav: Multi-Turn Dialog Navigation with a Remote Guide
Leekyeung Han, Hyunji Min, Gyeom Hwangbo, Jonghyun Choi, Paul Hongsuck Seo DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup
Zhen Qu, Xian Tao, Xinyi Gong, ShiChen Qu, Xiaopei Zhang, Xingang Wang, Fei Shen, Zhengtao Zhang, Mukesh Prasad, Guiguang Ding Diff2I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior
Juncheng Mu, Chengwei Ren, Weixiang Zhang, Liang Pan, Xiao-Ping Zhang, Yue Gao DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao Differential-Informed Sample Selection Accelerates Multimodal Contrastive Learning
Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang Differentially Private Fine-Tuning of Diffusion Models
Yu-Lin Tsai, Yizhe Li, Chia-Mu Yu, Xuebin Ren, Po-Yu Chen, Zekai Chen, Francois Buet-Golfouse DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li, Haoxuan Qu, Jason Kuen, Jiuxiang Gu, Qiuhong Ke, Jun Liu, Hossein Rahmani DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
Zonglin Di, Jing Shi, Yifei Fan, Hao Tan, Alexander Black, John Collomosse, Yang Liu Diffusion Image Prior
Hamadi Chihaoui, Paolo Favaro Diffusion-Based Imaginative Coordination for Bimanual Manipulation
Huilin Xu, Jian Ding, Jiakun Xu, Ruixiang Wang, Jun Chen, Jinjie Mai, Yanwei Fu, Bernard Ghanem, Feng Xu, Mohamed Elhoseiny DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion
Wenqiang Sun, Shuo Chen, Fangfu Liu, Zilong Chen, Yueqi Duan, Jun Zhu, Jun Zhang, Yikai Wang DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis Diorama: Unleashing Zero-Shot Single-View 3D Indoor Scene Modeling
Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, Angel X. Chang DIP: Unsupervised Dense In-Context Post-Training of Visual Representations
Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky, Andrei Bursuc, Nicolas Thome Discontinuity-Aware Normal Integration for Generic Central Camera Models
Francesco Milano, Manuel López-Antequera, Naina Dhingra, Roland Siegwart, Robert Thiel DisCoPatch: Taming Adversarially-Driven Batch Statistics for Improved Out-of-Distribution Detection
Francisco Caetano, Christiaan Viviers, Luis A. Zavala-Mondragón, Peter H.N. De With, Fons van der Sommen DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu Discovering Divergent Representations Between Text-to-Image Models
Lisa Dunlap, Joseph E. Gonzalez, Trevor Darrell, Fabian Caba Heilbron, Josef Sivic, Bryan Russell Discretized Gaussian Representation for Tomographic Reconstruction
Shaokai Wu, Yuxiang Lu, Yapan Guo, Wei Ji, Suizhi Huang, Fengyu Yang, Shalayiding Sirejiding, Qichen He, Jing Tong, Yanbiao Ji, Yue Ding, Hongtao Lu Dissecting Generalized Category Discovery: Multiplex Consensus Under Self-Deconstruction
Luyao Tang, Kunze Huang, Chaoqi Chen, Yuxuan Yuan, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing
Shengdong Han, Shangdong Yang, Yuxuan Li, Xin Zhang, Xiang Li, Jian Yang, Ming-Ming Cheng, Yimian Dai DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee, Masoud Hadi, Moein Madadi, Mackenzie W. Mathis Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun Gu, Lingyun Sun DisTime: Distribution-Based Time Representation for Video Large Language Models
Yingsen Zeng, Zepeng Huang, Yujie Zhong, Chengjian Feng, Jie Hu, Lin Ma, Yang Liu DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan, Jiawei Zhang, Xin Jin, Ziheng Zhang, Zheng Xiong, Dongqing Zou, Jimmy S. Ren, Chunle Guo, Chongyi Li Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou, Tianyi Zhang, Yuwen Xiong, Haonan Duan, Hengjun Pu, Ronglei Tong, Chengyang Zhao, Xizhou Zhu, Yu Qiao, Jifeng Dai, Yuntao Chen DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov, Di Chang, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani DiTFastAttnV2: Head-Wise Attention Compression for Multi-Modality Diffusion Transformers
Hanling Zhang, Rundong Su, Zhihang Yuan, Pengtao Chen, Mingzhu Shen, Yibo Fan, Shengen Yan, Guohao Dai, Yu Wang DIVE: Taming DINO for Subject-Driven Video Editing
Yi Huang, Wei Xiong, He Zhang, Chaoqi Chen, Jianzhuang Liu, Mingfu Yan, Shifeng Chen DLFR-Gen: Diffusion-Based Video Generation with Dynamic Latent Frame Rate
Zhihang Yuan, Rui Xie, Yuzhang Shang, Hanling Zhang, Siyuan Wang, Shengen Yan, Guohao Dai, Yu Wang DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son, Matheus Gadelha, Yang Zhou, Matthew Fisher, Zexiang Xu, Yi-Ling Qiao, Ming C. Lin, Yi Zhou Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel, Thomas Wimmer, Christian Theobalt, Christian Rupprecht, Adam Kortylewski DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin, Yichen Wu, Shuyu Yang, Zhongang Qi, Chen Ma, Li Zhu DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
Zihan Ding, Chi Jin, Difan Liu, Haitian Zheng, Krishna Kumar Singh, Qiang Zhang, Yan Kang, Zhe Lin, Yuchen Liu Domain Generalizable Portrait Style Transfer
Xinbo Wang, Wenju Xu, Qing Zhang, Wei-Shi Zheng Domain-Aware Category-Level Geometry Learning Segmentation for 3D Point Clouds
Pei He, Lingling Li, Licheng Jiao, Ronghua Shang, Fang Liu, Shuang Wang, Xu Liu, Wenping Ma Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection
Subhajit Maity, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song DPoser-X: Diffusion Model as Robust 3D Whole-Body Human Pose Prior
Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Xian Liu, Zhongang Cai, Lei Yang, Yulun Zhang, Haoqian Wang, Ziwei Liu DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
Chen Lin, Weizhi Du, Zhixiang Min, Baochen She, Enrique Dunn, Sonya M. Hanson Drawing Developmental Trajectory from Cortical Surface Reconstruction
Wenxuan Wu, Ruowen Qu, Zhongliang Liu, Zhuoyan Dai, Dongzi Shi, Sijin Yu, Tong Xiong, Shiping Liu, Xiangmin Xu, Xiaofen Xing, Xin Zhang DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang, Bin Zhu, Bin Lin, Mingzhe Zheng, Francis E. H. Tay, Ser-Nam Lim, Harry Yang, Li Yuan DreamFuse: Adaptive Image Fusion with Diffusion Transformer
Junjia Huang, Pengxiang Yan, Jiyang Liu, Jie Wu, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
Junjia Huang, Pengxiang Yan, Jinhang Cai, Jiyang Liu, Zhao Wang, Yitong Wang, Xinglong Wu, Guanbin Li DreamRelation: Relation-Centric Video Customization
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Biao Gong, Longxiang Tang, Xiang Wang, Haonan Qiu, Hengjia Li, Shuai Tan, Yingya Zhang, Hongming Shan DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving
Xuemeng Yang, Licheng Wen, Tiantian Wei, Yukai Ma, Jianbiao Mei, Xin Li, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Liang He, Yong Liu, Botian Shi, Yu Qiao DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang, Guoguang Du, Xiaochuan Li, Qi Jia, Liang Jin, Lu Liu, Jingjing Wang, Cong Xu, Zhenhua Guo, Yaqian Zhao, Xiaoli Gong, Rengang Li, Baoyu Fan Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu Dual-Process Image Generation
Grace Luo, Jonathan Granskog, Aleksander Holynski, Trevor Darrell Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation
Xiaolong Xu, Lei Zhang, Jiayi Li, Lituan Wang, Yifan Guan, Yu Yan, Leyi Zhang, Hao Song DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
Zhiqiang Yan, Zhengxue Wang, Haoye Dong, Jun Li, Jian Yang, Gim Hee Lee DuoLoRA: Cycle-Consistent and Rank-Disentangled Content-Style Personalization
Aniket Roy, Shubhankar Borse, Shreya Kadambi, Debasmit Das, Shweta Mahajan, Risheek Garrepalli, Hyojin Park, Ankita Nayak, Rama Chellappa, Munawar Hayat, Fatih Porikli DWIM: Towards Tool-Aware Visual Reasoning via Discrepancy-Aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke, B G Vijay Kumar, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, Manmohan Chandraker DyGS-SLAM: Real-Time Accurate Localization and Gaussian Reconstruction for Dynamic Scenes
Xinggang Hu, Chenyangguang Zhang, Mingyuan Zhao, Yuanze Gui, Xiangkui Zhang, Xiangyang Ji Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang, Kai Li, Shiying Wang, Pin Tao, Junliang Xing, Congyan Lang Dynamic Multimodal Prototype Learning in Vision-Language Models
Xingyu Zhu, Shuo Wang, Beier Zhu, Miaoge Li, Yunfan Li, Junfeng Fang, Zhicai Wang, Dongsheng Wang, Hanwang Zhang Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu, Yihao Meng, Hao Ouyang, Yue Yu, Bolin Zhao, Daniel Cohen-Or, Huamin Qu Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Han Wang, Yuxiang Nie, Yongjie Ye, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang E-SAM: Training-Free Segment Every Entity Model
Weiming Zhang, Dingwen Xiao, Lei Chen, Lin Wang EA-KD: Entropy-Based Adaptive Knowledge Distillation
Chi-Ping Su, Ching-Hsun Tseng, Bin Pu, Lei Zhao, Jiewen Yang, Zhuangzhuang Chen, Shin-Jye Lee EA-ViT: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu, Wangbo Zhao, Huiwen Zhang, Yuhao Zhou, Weidong Tang, Shuo Wang, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Kai Wang, Dawei Yang Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Joowon Kim, Ziseok Lee, Donghyeon Cho, Sanghyun Jo, Yeonsung Jung, Kyungsu Kim, Eunho Yang EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu, Bo Qu, Si-Yuan Cao, Runmin Zhang, Shujie Chen, Bailin Yang, Hui-Liang Shen Edicho: Consistent Image Editing in the Wild
Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan, Malcolm Chadwick, Luca Morreale, Mehdi Noroozi, Alberto Gil C. P. Ramos, Sourav Bhattacharya Edit360: 2D Image Edits to 3D Assets from Any Angle
Junchao Huang, Xinting Hu, Shaoshuai Shi, Zhuotao Tian, Li Jiang EditCLIP: Representation Learning for Image Editing
Qian Wang, Aleksandar Cvejić, Abdelrahman Eldesokey, Peter Wonka Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, Liang Zheng Efficient Adaptation of Pre-Trained Vision Transformer Underpinned by Approximately Orthogonal Fine-Tuning Strategy
Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu, Xiaoxia Sun, Chong Shang, Kiran S. Bhat, Deva Ramanan, Jun-Yan Zhu, Maneesh Agrawala, Tinghui Zhou Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion
Quanmin Liang, Qiang Li, Shuai Liu, Xinzi Cao, Jinyi Lu, Feidiao Yang, Wei Zhang, Kai Huang, Yonghong Tian Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li, You-Liang Huang, Wei Li, Tianyu Wu, Jie Zou, Wei Xue, Sirui Han, Yike Guo Efficient Input-Level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang Efficient Spiking Point Mamba for Point Cloud Analysis
Peixi Wu, Bosong Chai, Menghua Zheng, Wei Li, Zhangchi Hu, Jie Chen, Zheyu Zhang, Hebei Li, Xiaoyan Sun Efficient Track Anything
Yunyang Xiong, Chong Zhou, Xiaoyu Xiang, Lemeng Wu, Chenchen Zhu, Zechun Liu, Saksham Suri, Balakrishnan Varadarajan, Ramya Akula, Forrest Iandola, Raghuraman Krishnamoorthi, Bilge Soran, Vikas Chandra Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Lukas Kuhn, Sari Sadiya, Jörg Schlötterer, Florian Buettner, Christin Seifert, Gemma Roig EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Qian, Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Lu Chen, Yizhou Wang, Shixiang Tang, Qianhong Ma, Tong He, Wanli Ouyang, Xiaowei Zhou, Hujun Bao, Sida Peng EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li, Yutong Chen, Yiqian Wu, Kaifeng Zhao, Marc Pollefeys, Siyu Tang EgoMusic-Driven Human Dance Motion Estimation with Skeleton Mamba
Quang Nguyen, Nhat Le, Baoru Huang, Minh Nhat Vu, Chengcheng Tang, Van Nguyen, Ngan Le, Thieu Vo, Anh Nguyen Embodied Representation Alignment with Mirror Neurons
Wentao Zhu, Zhining Zhang, Yuwei Ren, Yin Huang, Hao Xu, Yizhou Wang EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting
Xiaobao Wei, Qingpo Wuwu, Zhongyu Zhao, Zhuangzhe Wu, Nan Huang, Ming Lu, Ningning Ma, Shanghang Zhang End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu, Qiang Lu, Meichen Dong, Jake Luo Engage for All: Making Ordinary Image Descriptions Appealing Again!
Yuyan Chen, Yifan Jiang, Li Zhou, Jinghan Cao, Yu Guan, Ming Yang, Qingpei Guo Enhanced Event-Based Dense Stereo via Cross-Sensor Knowledge Distillation
Haihao Zhang, Yunjian Zhang, Jianing Li, Lin Zhu, Meng Lv, Yao Zhu, Yanwei Liu, Xiangyang Ji Enhanced Pansharpening via Quaternion Spatial-Spectral Interactions
Dong Li, Chunhui Luo, Yuanfei Bao, Gang Yang, Jie Xiao, Xueyang Fu, Zheng-Jun Zha Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
Chancharik Mitra, Brandon Huang, Tianning Chai, Zhiqiu Lin, Assaf Arbelle, Rogerio Feris, Leonid Karlinsky, Trevor Darrell, Deva Ramanan, Roei Herzig Enhancing Numerical Prediction of MLLMs with Soft Labeling
Pei Wang, Zhaowei Cai, Hao Yang, Davide Modolo, Ashwin Swaminathan Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
Jun Li, Jinpeng Wang, Chaolei Tan, Niu Lian, Long Chen, Yaowei Wang, Min Zhang, Shu-Tao Xia, Bin Chen Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song, Rama Chellappa, Lorenzo Torresani, Triantafyllos Afouras Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment
RenYe Yan, Jikang Cheng, Yaozhong Gan, Shikun Sun, You Wu, Yunfan Yang, Liang Ling, Jinlong Lin, Yeshuang Zhu, Jie Zhou, Jinchao Zhang, Junliang Xing, Yimao Cai, Ru Huang Epona: Autoregressive Diffusion World Model for Autonomous Driving
Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, Xun Cao, Wei Yin EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks
Athinoulla Konstantinou, Georgios Leontidis, Mamatha Thota, Aiden Durrant Erasing More than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts
Ibtihel Amara, Ahmed Imtiaz Humayun, Ivana Kajic, Zarana Parekh, Natalie Harris, Sarah Young, Chirag Nagpal, Najoung Kim, Junfeng He, Cristina Nader Vasconcelos, Deepak Ramachandran, Golnoosh Farnadi, Katherine Heller, Mohammad Havaei, Negar Rostamzadeh Estimating 2D Camera Motion with Hybrid Motion Basis
Haipeng Li, Tianhao Zhou, Zhanglei Yang, Yi Wu, Yan Chen, Zijing Mao, Shen Cheng, Bing Zeng, Shuaicheng Liu ETA: Energy-Based Test-Time Adaptation for Depth Completion
Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong ETVA: Evaluation of Text-to-Video Alignment via Fine-Grained Question Generation and Answering
Kaisi Guan, Zhengfeng Lai, Yuchong Sun, Peng Zhang, Wei Liu, Kieran Liu, Meng Cao, Ruihua Song Evading Data Provenance in Deep Neural Networks
Hongyu Zhu, Sichu Liang, Wenwen Wang, Zhuomeng Zhang, Fangqi Li, Shi-Lin Wang EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images
Wangbo Yu, Chaoran Feng, Jianing Li, Jiye Tang, Jiashu Yang, Zhenyu Tang, Meng Cao, Xu Jia, Yuchao Yang, Li Yuan, Yonghong Tian EVDM: Event-Based Real-World Video Deblurring with Mamba
Zhijing Sun, Senyan Xu, Kean Liu, Runze Tian, Xueyang Fu, Zheng-Jun Zha Event-Based Visual Vibrometry
Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti, Chao Xu, Boxin Shi Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene
Donggeun Lim, Jinseok Bae, Inwoo Hwang, Seungmin Lee, Hwanhee Lee, Young Min Kim Event-Guided HDR Reconstruction with Diffusion Priors
Yixin Yang, Jiawei Zhang, Yang Zhang, Yunxuan Wei, Dongqing Zou, Jimmy S. Ren, Boxin Shi EventUPS: Uncalibrated Photometric Stereo Using an Event Camera
Jinxiu Liang, Bohan Yu, Siqi Yang, Haotian Zhuang, Jieji Ren, Peiqi Duan, Boxin Shi EVER: Exact Volumetric Ellipsoid Rendering for Real-Time View Synthesis
Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jonathan T. Barron, Yinda Zhang Everything Is a Video: Unifying Modalities Through Next-Frame Prediction
G. Thomas Hudson, Dean Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, Ting Pan, Wenxuan Wang, Huchuan Lu, Xinlong Wang Evidential Knowledge Distillation
Liangyu Xiang, Junyu Gao, Changsheng Xu EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment
Yufei Zhu, Yiming Zhong, Zemin Yang, Peishan Cong, Jingyi Yu, Xinge Zhu, Yuexin Ma Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Taowen Wang, Cheng Han, James Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, Ruixiang Tang Expressive Talking Human from Single-Image with Imperfect Priors
Jun Xiang, Yudong Guo, Leipeng Hu, Boyang Guo, Yancheng Yuan, Juyong Zhang External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan Extrapolated Urban View Synthesis Benchmark
Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, Yurong You, Lingjie Liu, Yue Wang, Marco Pavone, Chen Feng, Yiming Li EYE3: Turn Anything into Naked-Eye 3D
Yingde Song, Zongyuan Yang, Baolin Liu, Yongping Xiong, Sai Chen, Lan Yi, Zhaohe Zhang, Xunbo Yu F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu, Huiyu Duan, Qiang Hu, Liu Yang, Chunlei Cai, Tianxiao Ye, Huayu Liu, Xiaoyun Zhang, Guangtao Zhai FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Fei Yin, B R Mallikarjun, Chun-Han Yao, Rafal K. Mantiuk, Varun Jampani FaceShield: Defending Facial Image Against Deepfake Threats
Jaehwan Jeong, Sumin In, Sieun Kim, Hannie Shin, Jongheon Jeong, Sang Ho Yoon, Jaewook Chung, Sangpil Kim FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan, Vibashan Vs, Rama Chellappa, Vishal M. Patel FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos
Zhaolun Li, Jichang Li, Yinqi Cai, Junye Chen, Xiaonan Luo, Guanbin Li, Rushi Lan Fast Image Super-Resolution via Consistency Rectified Flow
Jiaqi Xu, Wenbo Li, Haoze Sun, Fan Li, Zhixin Wang, Long Peng, Jingjing Ren, Haoran Yang, Xiaowei Hu, Renjing Pei, Pheng-Ann Heng Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yifei Xue, Yizhen Lao FastJSMA: Accelerating Jacobian-Based Saliency Map Attacks Through Gradient Decoupling
Zhenghao Gao, Shengjie Xu, Zijing Li, Meixi Chen, Chaojian Yu, Yuanjie Shao, Changxin Gao FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang, Jiangshan Wang, Tao Dai, Shu-Tao Xia, Luca Benini FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models
Jiaqi Wu, Simin Chen, Jing Tang, Yuzhe Yang, Yiming Chen, Lixu Wang, Song Lin, Zehua Wang, Wei Chen, Zijian Tian Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu Federated Continuous Category Discovery and Learning
Lixu Wang, Chenxi Liu, Junfeng Guo, Qingqing Ye, Heng Huang, Haibo Hu, Wei Dong Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
Thu Hang Phung, Duong M. Nguyen, Thanh Trung Huynh, Quoc Viet Hung Nguyen, Trong Nghia Hoang, Phi Le Nguyen Federated Representation Angle Learning
Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu, Xiaoxiao Li FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha, Subhankar Roy, Sarthak Mehrotra, Ankit Jha, Moloud Abdar, Biplab Banerjee, Elisa Ricci Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić, Christoph Reich, Felix Wimbauer, Oliver Hahn, Christian Rupprecht, Stefan Roth, Daniel Cremers FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection
Brian K.S. Isaac-Medina, Mauricio Che, Yona Falinie A. Gaus, Samet Akcay, Toby P. Breckon Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li, Zihao Huang, Yan Zhang, Yunhang Shen, Ke Li, Xiawu Zheng, Liujuan Cao, Rongrong Ji Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo, Dahyun Kang, Sanghyun Kim, Yunseon Choi, Minsu Cho Find Any Part in 3D
Ziqi Ma, Yisong Yue, Georgia Gkioxari FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data
Yiting Li, Fayao Liu, Jingyi Liao, Sichao Tian, Chuan-Sheng Foo, Xulei Yang Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li, Meng Tian, Zhenyu Lin, Jiangtong Zhu, Dechang Zhu, Haiqiang Liu, Yueyi Zhang, Zhiwei Xiong, Xinhai Zhao Fine-Grained Spatiotemporal Grounding on Egocentric Videos
Shuo Liang, Yiwu Zhong, Zi-Yuan Hu, Yeyao Tao, Liwei Wang Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung, Sangeek Hyun, Hyunjun Kim, Eunseo Koh, MinKyu Lee, Jae-Pil Heo FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang, Haihong E, Jiacheng Liu, Zhongjun Yang, Rongjin Li, Zihua Rong, Haoyang He, Zhuodi Hao, Xinyang Hu, Kun Ji, Ziyan Ma, Mengyuan Ji, Jun Zhang, Chenghao Ma, Qianhe Zheng, Yang Liu, Yiling Huang, Xinyi Hu, Qing Huang, Zijian Xie, Shiyao Peng Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision
Tianma Shen, Aditya Puranik, James Vong, Vrushabh Deogirikar, Ryan Fell, Julianna Dietrich, Maria Kyrarini, Christopher Kitts, David C. Jeong Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
Haoji Zhang, Yiqin Wang, Yansong Tang, Yong Liu, Jiashi Feng, Xiaojie Jin FlashDepth: Real-Time Streaming Video Depth Estimation at 2K Resolution
Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu, Wenhang Ge, Jiantao Lin, Jiawei Feng, Lie Xu, Hanfeng Zhao, Shunsi Zhang, Ying-Cong Chen FLOSS: Free Lunch in Open-Vocabulary Semantic Segmentation
Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Raoul de Charette Flow Stochastic Segmentation Networks
Fabio De Sousa Ribeiro, Omar Todd, Charles Jones, Avinash Kori, Raghav Mehta, Ben Glocker FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, Shuai Xiao, Enzo Tartaglione FonTS: Text Rendering with Typography and Style Controls
Wenda Shi, Yiren Song, Dengming Zhang, Jiaming Liu, Xingxing Zou ForCenNet: Foreground-Centric Network for Document Image Rectification
Peng Cai, Qiang Li, Kaicheng Yang, Dong Guo, Jia Li, Nan Zhou, Xiang An, Ninghua Yang, Jiankang Deng ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds
Binbin Xiang, Maciej Wielgosz, Stefano Puliti, Kamil Král, Martin Krůček, Azim Missarov, Rasmus Astrup FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching
Hui Li, Xiaoyu Ren, Hongjiu Yu, Ying Chen, Kai Li, L Wang, Xiongkuo Min, Huiyu Duan, Guangtao Zhai, Xu Liu Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs Within Single Inference
Kuo Wang, Quanlong Zheng, Junlin Xie, Yanhao Zhang, Jinguo Luo, Haonan Lu, Liang Lin, Fan Zhou, Guanbin Li Free-Running vs Synchronous: Single-Photon LiDAR for High-Flux 3D Imaging
Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp, Yanting Ma, Alexander Mehta, Vivek K Goyal Free4D: Tuning-Free 4D Scene Generation with Spatial-Temporal Consistency
Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li, Ziwei Liu FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Hang Du, Jiayang Zhang, Guoshun Nan, Wendi Deng, Zhenyan Chen, Chenyang Zhang, Wang Xiao, Shan Huang, Yuqi Pan, Tao Qi, Sicong Leng From Enhancement to Understanding: Build a Generalized Bridge for Low-Light Vision via Semantically Consistent Unsupervised Fine-Tuning
Sen Wang, Shao Zeng, Tianjun Gu, Zhizhong Zhang, Ruixin Zhang, Shouhong Ding, Jingyun Zhang, Jun Wang, Xin Tan, Yuan Xie, Lizhuang Ma From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han, Zhanpeng Huang, Zibin Wang, Tianfan Xue From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishabh Kabra, Anurag Arnab, Mehdi S. M. Sajjadi From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
Zexi Jia, Chuanwei Huang, Yeshuang Zhu, Hongyan Fei, Ying Deng, Zhiqiang Yuan, Jiapei Zhang, Jinchao Zhang, Jie Zhou From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-Guided Symbolic Reasoning
Yuhui Zeng, Haoxiang Wu, Wenjie Nie, Guangyao Chen, Xiawu Zheng, Yunhang Shen, Jun Peng, Yonghong Tian, Rongrong Ji From One to More: Contextual Part Latents for 3D Generation
Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu, Qiulin Wang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Qiang Xu Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li, Qiang Wang, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia Future-Aware Interaction Network for Motion Forecasting
Shijie Li, Chunyu Liu, Xun Xu, Si Yong Yeo, Xulei Yang FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Mark Chen, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP
Trevor D. Canham, SaiKiran Tedla, Michael J. Murdoch, Michael S. Brown Gait-X: Exploring X Modality for Generalized Gait Recognition
Zengbin Wang, Saihui Hou, Junjie Li, Xu Liu, Chunshui Cao, Yongzhen Huang, Siye Wang, Man Zhang GameFactory: Creating New Games with Generative Interactive Videos
Jiwen Yu, Yiran Qin, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang, Junsheng Zhou, Haotian Geng, Wenyuan Zhang, Yu-Shen Liu GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections
Haiyang Bai, Jiaqi Zhu, Songru Jiang, Wei Huang, Tao Lu, Yuanqi Li, Jie Guo, Runze Fu, Yanwen Guo, Lijun Chen GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Sihang Li, Zeyu Jiang, Grace Chen, Chenyang Xu, Siqi Tan, Xue Wang, Irving Fang, Kristof Zyskowski, Shannon P. McPherron, Radu Iovita, Chen Feng, Jing Zhang GAS: Generative Avatar Synthesis from a Single Image
Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, Fernando De la Torre GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR
Christophe Bolduc, Yannick Hold-Geoffroy, Jean-François Lalonde Gaussian Variation Field Diffusion for High-Fidelity Video-to-4D Synthesis
Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, Baining Guo GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments
Lin Zeng, Boming Zhao, Jiarui Hu, Xujie Shen, Ziqiang Dang, Hujun Bao, Zhaopeng Cui GaussRender: Learning 3D Occupancy with Gaussian Rendering
Loick Chambon, Eloi Zablocki, Alexandre Boulch, Mickael Chen, Matthieu Cord GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination
Chengwei Ren, Fan Zhang, Liangchao Xu, Liang Pan, Ziwei Liu, Wenping Wang, Xiao-Ping Zhang, Yuan Liu Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths
Sounak Mondal, Naveen Sendhilnathan, Ting Zhang, Yue Liu, Michael Proulx, Michael Louis Iuzzolino, Chuan Qin, Tanya R. Jonker GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse, Pushpak Pati, Srikar Yellapragada, Srijan Das, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-Ray Diagnosis
Bo Liu, Ke Zou, Li-Ming Zhan, Zexin Lu, Xiaoyu Dong, Yidi Chen, Chengqiang Xie, Jiannong Cao, Xiao-Ming Wu, Huazhu Fu General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, Kaixun Jiang, Zhaoyu Chen, Shuyong Gao, Runze Li, Xingdong Sheng, Wei Zhang, Hong Lu, Wenqiang Zhang Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha, Logan Lawrence, Grant Van Horn, Subhransu Maji Generating Physically Stable and Buildable Brick Structures from Text
Ava Pun, Kangle Deng, Ruixuan Liu, Deva Ramanan, Changliu Liu, Jun-Yan Zhu Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia, David Bourgin, Krishna Kumar Singh, Yuheng Li, Yan Kang, Zhan Xu, Niraj K. Jha, Yuchen Liu Generative Adversarial Diffusion
U-Chae Jun, Jaeeun Ko, Jiwoo Kang Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito, Jason Saragih, Fabian Prada, Yichen Xu, Shoou-I Yu, Ryosuke Furuta, Yoichi Sato, Takaaki Shiratori Generative Video Bi-Flow
Chen Liu, Tobias Ritschel Generative Zoo
Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas-Velasquez, Soubhik Sanyal, Michael J. Black, Silvia Zuffi, Peter Kulits GenHaze: Pioneering Controllable One-Step Realistic Haze Generation for Real-World Dehazing
Sixiang Chen, Tian Ye, Yunlong Lin, Yeying Jin, Yijun Yang, Haoyu Chen, Jianyu Lai, Song Fei, Zhaohu Xing, Fugee Tsung, Lei Zhu GenieBlue: Integrating Both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu, Yinghao Chen, Renshou Wu, Haohao Gao, Xi Chen, Xue Yang, Xiangyu Zhao, Aojun Zhou, Fangyuan Li, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li GENMO: A GENeralist Model for Human MOtion
Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, Ye Yuan GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar
SeungJun Moon, Hah Min Lew, Seungeun Lee, Ji-Su Kang, Gyeong-Moon Park GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Shahbaz Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
Phillip Mueller, Talip Uenlue, Sebastian Schmidt, Marcel Kollovieh, Jiajie Fan, Stephan Günnemann, Lars Mikelsons GeoMan: Temporally Consistent Human Geometry Estimation Using Image-to-Video Diffusion
Gwanghyun Kim, Xueting Li, Ye Yuan, Koki Nagano, Tianye Li, Jan Kautz, Se Young Chun, Umar Iqbal Geometry Distributions
Biao Zhang, Jing Ren, Peter Wonka GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal, Dmitry Petrov, Sheldon Andrews, Yizhak Ben-Shabat, Hsueh-Ti Derek Liu, Evangelos Kalogerakis GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields
Shunsuke Yasuki, Taiki Miyanishi, Nakamasa Inoue, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Masato Taki, Yutaka Matsuo GestureHYDRA: Semantic Co-Speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang, Luying Huang, Kaisiyuan Wang, Jiazhi Guan, Shengyi He, Fengguo Li, Hang Zhou, Lingyun Yu, Yingying Li, Haocheng Feng, Hongtao Xie GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing
Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian GIViC: Generative Implicit Video Compression
Ge Gao, Siyue Teng, Tianhao Peng, Fan Zhang, David Bull GlassWizard: Harvesting Diffusion Priors for Glass Surface Detection
Wenxue Li, Tian Ye, Xinyu Xiong, Jinbin Bai, Feilong Tang, Wenxuan Song, Zhaohu Xing, Lie Ju, Guanbin Li, Lei Zhu Global and Local Entailment Learning for Natural World Imagery
Srikumar Sastry, Aayush Dhakal, Eric Xing, Subash Khanal, Nathan Jacobs Global Motion Corresponder for 3D Point-Based Scene Interpolation Under Large Motion
Junru Lin, Chirag Vashist, Mikaela Angelina Uy, Colton Stearns, Xuan Luo, Leonidas Guibas, Ke Li Global Regulation and Excitation via Attention Tuning for Stereo Matching
Jiahao Li, Xinhong Chen, Zhengmin Jiang, Qian Zhou, Yung-Hui Li, Jianping Wang Global-Aware Monocular Semantic Scene Completion with State Space Models
Shijie Li, Zhongyao Cheng, Rong Li, Shuai Li, Juergen Gall, Xun Xu, Xulei Yang GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts
Minwen Liao, Haobo Dong, Xinyi Wang, Kurban Ubul, Yihua Shao, Ziyang Yan Go to Zero: Towards Zero-Shot Motion Generation with Million-Scale Data
Ke Fan, Shunlin Lu, Minyue Dai, Runyi Yu, Lixing Xiao, Zhiyang Dou, Junting Dong, Lizhuang Ma, Jingbo Wang Golden Noise for Diffusion Models: A Learning Framework
Zikai Zhou, Shitong Shao, Lichen Bai, Shufei Zhang, Zhiqiang Xu, Bo Han, Zeke Xie Gradient Decomposition and Alignment for Incremental Object Detection
Wenlong Luo, Shizhou Zhang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang Gradient-Reweighted Adversarial Camouflage for Physical Object Detection Evasion
Jiawei Liang, Siyuan Liang, Tianrui Lou, Ming Zhang, Wenjin Li, Dunqiu Fan, Xiaochun Cao GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Lianghui Zhu, Yuxuan Zhang, Tianheng Cheng, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao, Mingyang Wang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Ning Mao, Wei Chen, Jun Yu GS-Occ3D: Scaling Vision-Only Occupancy Reconstruction with Gaussian Splatting
Baijun Ye, Minghui Qin, Saining Zhang, Moonjun Gong, Shaoting Zhu, Hao Zhao, Hang Zhao GSOT3D: Towards Generic 3D Single Object Tracking in the Wild
Yifan Jiao, Yunhao Li, Junhua Ding, Qing Yang, Song Fu, Heng Fan, Libo Zhang GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space
David G. Shatwell, Ishan Rajendrakumar Dave, Sirnam Swetha, Mubarak Shah GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Yang Li, Minghan Qin, Yu Li, Haoqian Wang GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao, Zitao Liu, Lingxiao Du, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Ping Luo GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, Siyuan Huang HADES: Human Avatar with Dynamic Explicit Hair Strands
Zhanfeng Liao, Hanzhang Tu, Cheng Peng, Hongwen Zhang, Boyao Zhou, Yebin Liu HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
Byungjun Kim, Shunsuke Saito, Giljoo Nam, Tomas Simon, Jason Saragih, Hanbyul Joo, Junxuan Li HAMSt3R: Human-Aware Multi-View Stereo 3D Reconstruction
Sara Rojas, Matthieu Armando, Bernard Ghanem, Philippe Weinzaepfel, Vincent Leroy, Grégory Rogez Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Zhonghua Wu, Qingyi Tao, Wentao Liu, Wei Li, Chen Change Loy Harnessing Input-Adaptive Inference for Efficient VLN
Dongwoo Kang, Akhil Perincherry, Zachary Coalson, Aiden Gabriel, Stefan Lee, Sanghyun Hong Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Wenjing Yang, Jing Zhang Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang HDR Image Generation via Gain Map Decomposed Diffusion
Yuanshen Guan, Ruikang Xu, Yinuo Liao, Mingde Yao, Lizhi Wang, Zhiwei Xiong HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Xin Zhou, Dingkang Liang, Sifan Tu, Xiwu Chen, Yikang Ding, Dingyuan Zhang, Feiyang Tan, Hengshuang Zhao, Xiang Bai HERMES: Temporal-coHERent Long-forM Understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Shang-Hong Lai, Winston H. Hsu HERO: Human Reaction Generation from Videos
Chengjun Yu, Wei Zhai, Yuhang Yang, Yang Cao, Zheng-Jun Zha Hi3DGen: High-Fidelity 3D Geometry Generation from Images via Normal Bridging
Chongjie Ye, Yushuang Wu, Ziteng Lu, Jiahao Chang, Xiaoyang Guo, Jiaqing Zhou, Hao Zhao, Xiaoguang Han Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization
Zhaoyang Wu, Fang Liu, Licheng Jiao, Shuo Li, Lingling Li, Xu Liu, Puhua Chen, Wenping Ma Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
Jiahua Dong, Hui Yin, Wenqi Liang, Hanbin Zhao, Henghui Ding, Nicu Sebe, Salman Khan, Fahad Shahbaz Khan Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction
Hongyang Sun, Qinglin Yang, Jiawei Wang, Zhen Xu, Chen Liu, Yida Wang, Kun Zhan, Hujun Bao, Xiaowei Zhou, Sida Peng HiGarment: Cross-Modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image
Junyi Guo, Jingxuan Zhang, Fangyu Wu, Huanda Lu, Qiufeng Wang, Wenmian Yang, Eng Gee Lim, Dongming Lu HIS-GPT: Towards 3D Human-in-Scene Multimodal Understanding
Jiahe Zhao, Ruibing Hou, Zejie Tian, Hong Chang, Shiguang Shan Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao, Weipeng Deng, Tiancai Wang, Xiangyu Zhang, Xiaojuan Qi HORT: Monocular Hand-Held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen, Cordelia Schmid HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
Yiwen Chen, Hieu T. Nguyen, Vikram Voleti, Varun Jampani, Huaizu Jiang HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Marc Pollefeys, Daniel Barath, Iro Armeni How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo, Chen Sun How Far Are AI-Generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui Chang, Jiahui Liu, Zhengzhe Liu, Xiaoyang Lyu, Yi-Hua Huang, Xin Tao, Pengfei Wan, Di Zhang, Xiaojuan Qi How to Make Your Cell Tracker Say "I Dunno!"
Richard D. Paul, Johannes Seiffarth, David Rügamer, Katharina Nöh, Hanno Scharr HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu, Peng Shi, Ryo Kamoi, Hao Zhou, Zhuoyang Zou, Shu Zhao, Sarkar Snigdha Sarathi Das, Vipul Gupta, Xiaoxin Lu, Nan Zhang, Ranran Haoran Zhang, Avitej Iyer, Renze Lou, Wenpeng Yin, Rui Zhang Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins, Yawar Siddiqui, Julian Straub, Richard Newcombe, Vasileios Balntas, Jakob Engel HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel, Pulkit Gera, Xilong Zhou, Umar Iqbal, Pramod Rao, Jan Kautz, Vladislav Golyanik, Christian Theobalt HumorDB: Can AI Understand Graphical Humor?
Vedaant V Jain, Gabriel Kreiman, Felipe dos Santos Alves Feitosa HUMOTO: A 4D Dataset of Mocap Human Object Interactions
Jiaxin Lu, Chun-Hao Paul Huang, Uttaran Bhattacharya, Qixing Huang, Yi Zhou HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Ziyong Feng, Jing Yang, Rolandos Alexandros Potamias, Linchao Zhu, Jiankang Deng HVPUNet: Hybrid-Voxel Point-Cloud Upsampling Network
Juhyung Ha, Vibhas Kumar Vats, Soon-heung Jung, Alimoor Reza, David J. Crandall Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics
Keming Wu, Junwen Chen, Zhanhao Liang, Yinuo Wang, Ji Li, Chao Zhang, Bin Wang, Yuhui Yuan Hybrid-Grained Feature Aggregation with Coarse-to-Fine Language Guidance for Self-Supervised Monocular Depth Estimation
Wenyao Zhang, Hongsi Liu, Bohan Li, Jiawei He, Zekun Qi, Yunnan Wang, Shengyang Zhao, Xinqiang Yu, Wenjun Zeng, Xin Jin Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training
Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration
Xiyu Zhang, Jiayi Ma, Jianwei Guo, Wei Hu, Zhaoshuai Qi, Fei Hui, Jiaqi Yang, Yanning Zhang Hypergraph Clustering Network with Partial Attribute Imputation
Qianqian Wang, Bowen Zhao, Zhengming Ding, Wei Feng, Quanxue Gao HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho, Chun-Hung Wu, Huu-Tai Phung, Martin Benjak, Jörn Ostermann, Wen-Hsiao Peng I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
Zhimin Liao, Ping Wei, Ruijie Zhang, Shuaijia Chen, Haoxuan Wang, Ziyang Ren I2VControl: Disentangled and Unified Video Motion Synthesis Control
Wanquan Feng, Tianhao Qi, Jiawei Liu, Mingzhen Sun, Pengqi Tu, Tianxiang Ma, Fei Dai, Songtao Zhao, Siyu Zhou, Qian He ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
Yulin Pan, Xiangteng He, Chaojie Mao, Zhen Han, Zeyinzi Jiang, Jingfeng Zhang, Yu Liu IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Ruofan Wang, Juncheng Li, Yixu Wang, Bo Wang, Xiaosen Wang, Yan Teng, Yingchun Wang, Xingjun Ma, Yu-Gang Jiang Identity Preserving 3D Head Stylization with Multiview Score Distillation
Bahri Batuhan Bilecen, Ahmet Berke Gökmen, Furkan Guzelant, Aysegul Dundar IDFace: Face Template Protection for Efficient and Secure Identification
Sunpill Kim, Seunghun Paik, Chanwoo Hwang, Dongsoo Kim, Junbum Shin, Jae Hong Seo IGD: Instructional Graphic Design with Multimodal Layer Generation
Yadong Qu, Shancheng Fang, Yuxin Wang, Xiaorui Wang, Zhineng Chen, Hongtao Xie, Yongdong Zhang IGL-Nav: Incremental 3D Gaussian Localization for Image-Goal Navigation
Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Chunwei Wang, Guansong Lu, Junwei Yang, Runhui Huang, Jianhua Han, Lu Hou, Wei Zhang, Hang Xu Im2Haircut: Single-View Strand-Based Hair Reconstruction for Human Avatars
Vanessa Sklyarova, Egor Zakharov, Malte Prinzler, Giorgio Becherini, Michael J. Black, Justus Thies IM360: Large-Scale Indoor Mapping with 360 Cameras
Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha ImageGem: In-the-Wild Generative Image Interaction Dataset for Generative Model Personalization
Yuanhe Guo, Linxi Xie, Zhuoran Chen, Kangrui Yu, Ryan Po, Guandao Yang, Gordon Wetzstein, Hongyi Wen iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu, Yi-Lin Wei, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng Imbalance in Balance: Online Concept Balancing in Generation Models
Yukai Shi, Jiarong Ou, Rui Chen, Haotian Yang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Kun Gai IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu, Yulin Wang, Kai Wang, Gao Huang, Humphrey Shi ImHead: A Large-Scale Implicit Morphable Model for Localized Head Modeling
Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng, Athanasios Papaioannou, Stefanos Zafeiriou IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li, Chinthani Sugandhika, Yeo Keat Ee, Eric Peh, Hao Zhang, Hong Yang, Deepu Rajan, Basura Fernando Implicit Counterfactual Learning for Audio-Visual Segmentation
Mingfeng Zha, Tianyu Li, Guoqing Wang, Peng Wang, Yangyang Wu, Yang Yang, Heng Tao Shen Improved Noise Schedule for Diffusion Training
Tiankai Hang, Shuyang Gu, Jianmin Bao, Fangyun Wei, Dong Chen, Xin Geng, Baining Guo Improving Noise Efficiency in Privacy-Preserving Dataset Distillation
Runkai Zheng, Vishnu Asutosh Dasu, Yinong Oliver Wang, Haohan Wang, Fernando De La Torre Improving Rectified Flow with Boundary Conditions
Xixi Hu, Runlong Liao, Keyang Xu, Bo Liu, Yeqing Li, Eugene Ie, Hongliang Fei, Qiang Liu Inference-Time Diffusion Model Distillation
Geon Yeong Park, Sang Wan Lee, Jong Chul Ye InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang InfoBridge: Balanced Multimodal Integration Through Conditional Dependency Modeling
Chenxin Li, Yifan Liu, Panwang Pan, Hengyu Liu, Xinyu Liu, Wuyang Li, Cheng Wang, Weihao Yu, Yiyang Lin, Yixuan Yuan Information Density Principle for MLLM Benchmarks
Chunyi Li, Xiaozhe Li, Zicheng Zhang, Yuan Tian, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Jia Wang, Haodong Duan, Kai Chen, Guangtao Zhai Information-Bottleneck Driven Binary Neural Network for Change Detection
Kaijie Yin, Zhiyuan Zhang, Shu Kong, Tian Gao, Cheng-Zhong Xu, Hui Kong Instance-Level Video Depth in Groups Beyond Occlusions
Yuan Liang, Yang Zhou, Ziming Sun, Tianyi Xiang, Guiqing Li, Shengfeng He InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes
Zesong Yang, Bangbang Yang, Wenqi Dong, Chenxuan Cao, Liyuan Cui, Yuewen Ma, Zhaopeng Cui, Hujun Bao INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang, Jing Huang, Li Zhou, Zenghui Sun, Lihua Jing, Jinsong Lan, Xiaoyong Zhu, Bo Zheng Interpretable Point Cloud Classification Using Multiple Instance Learning
Matt De Vries, Reed Naidoo, Olga Fourkioti, Lucas G. Dent, Nathan Curry, Chris Dunsby, Chris Bakal Intra-View and Inter-View Correlation Guided Multi-View Novel Class Discovery
Xinhang Wan, Jiyuan Liu, Qian Qu, Suyuan Liu, Chuyu Zhang, Fangdi Wang, Xinwang Liu, En Zhu, Kunlun He IntrinsicControlNet: Cross-Distribution Image Generation with Real and Unreal
Jiayuan Lu, Rengan Xie, Zixuan Xie, Zhizhen Wu, Dianbing Xi, Qi Ye, Rui Wang, Hujun Bao, Yuchi Huo InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling
Xiaoxue Chen, Bhargav Chandaka, Chih-Hao Lin, Ya-Qin Zhang, David Forsyth, Hao Zhao, Shenlong Wang IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
Zhiqi Ge, Juncheng Li, Xinglei Pang, Minghe Gao, Kaihang Pan, Wang Lin, Hao Fei, Wenqiao Zhang, Siliang Tang, Yueting Zhuang Is CLIP Ideal? No. Can We Fix It? Yes!
Raphi Kang, Yue Song, Georgia Gkioxari, Pietro Perona Is Visual In-Context Learning for Compositional Medical Tasks Within Reach?
Simon Reiß, Zdravko Marinov, Alexander Jaus, Constantin Seibold, M. Saquib Sarfraz, Erik Rodner, Rainer Stiefelhagen JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models
Xiaolong Jin, Zixuan Weng, Hanxi Guo, Chenlong Yin, Siyuan Cheng, Guangyu Shen, Xiangyu Zhang Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Shouwei Ruan, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei Joint Asymmetric Loss for Learning with Noisy Labels
Jialiang Wang, Xianming Liu, Xiong Zhou, Gangfeng Hu, Deming Zhai, Junjun Jiang, Xiangyang Ji Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han, Yongjun Lee, Byeonghun Lee, Sang Hyun Park, Sunghoon Im, Kyong Hwan Jin Kaputt: A Large-Scale Dataset for Visual Defect Detection
Sebastian Höfer, Dorian F. Henning, Artemij Amiranashvili, Douglas Morrison, Mariliza Tzes, Ingmar Posner, Marc Matvienko, Alessandro Rennola, Anton Milan Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed, Junjie Fei, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny KinMo: Kinematic-Aware Human Motion Understanding and Generation
Pengfei Zhang, Pinxin Liu, Pablo Garrido, Hyeongwoo Kim, Bindita Chaudhuri Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Junsung Park, Jungbeom Lee, Jongyoon Song, Sangwon Yu, Dahuin Jung, Sungroh Yoon Knowledge Distillation for Learned Image Compression
Yunuo Chen, Zezheng Lyu, Bing He, Ning Cao, Gang Chen, Guo Lu, Wenjun Zhang Knowledge Distillation with Refined Logits
Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng, Hengjie Lu, Shugong Xu Knowledge-Guided Part Segmentation
Xuejian Gou, Fang Liu, Licheng Jiao, Shuo Li, Lingling Li, Hao Wang, Xu Liu, Puhua Chen, Wenping Ma Laboring on Less Labors: RPCA Paradigm for Pan-Sharpening
Honghui Xu, Chuangjie Fang, Yibin Wang, Jie Wu, Jianwei Zheng LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard, Tom Durand, Adrien Ramanana Rahary, Maks Ovsjanikov LaCoOT: Layer Collapse Through Optimal Transport
Victor Quétu, Zhu Liao, Nour Hezbri, Fabio Pizzati, Enzo Tartaglione LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao, Yuwei Niu, Fanqing Meng, Hao Li, Changyao Tian, Yinuo Du, Yuwen Xiong, Dianqi Li, Xizhou Zhu, Li Yuan, Jifeng Dai, Yu Cheng LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
Wei-Jer Chang, Wei Zhan, Masayoshi Tomizuka, Manmohan Chandraker, Francesco Pittaluga Language Driven Occupancy Prediction
Zhu Yu, Bowen Pang, Lizhe Liu, Runmin Zhang, Qiang Li, Si-Yuan Cao, Maochun Luo, Mingxia Chen, Sheng Yang, Hui-Liang Shen Latent Diffusion Models with Masked AutoEncoders
Junho Lee, Jeongwoo Shin, Hyungwook Choi, Joonseok Lee Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Kewei Li, Jun Du, Lei Sun, Jianqing Gao, Ruoyu Wang, Jiefeng Ma LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti, Jean Prost, Andrés Almansa, Nicolas Papadakis, Marcelo Pereyra LawDIS: Language-Window-Based Controllable Dichotomous Image Segmentation
Xinyu Yan, Meijun Sun, Ge-Peng Ji, Fahad Shahbaz Khan, Salman Khan, Deng-Ping Fan Layer-Wise Vision Injection with Disentangled Attention for Efficient LVLMs
Xuange Zhang, Dengjie Li, Bo Liu, Zenghao Bao, Yao Zhou, Baisong Yang, Zhongying Liu, Yujie Zhong, Tongtong Yuan LayerAnimate: Layer-Level Control for Animation
Yuxue Yang, Lue Fan, Zuzeng Lin, Feng Wang, Zhaoxiang Zhang LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue, Kota Yamaguchi LayerLock: Non-Collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu, Drew A. Hudson, Alexander Lerchner, Andrew Zisserman, Mehdi S. M. Sajjadi, Joao Carreira LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan, Qingyan Wei, Jiayi Tang, Jiajun Li, Yulin Wang, Xuming Hu, Huiqi Li, Linfeng Zhang LBM: Latent Bridge Matching for Fast Image-to-Image Translation
Clément Chadebec, Onur Tasar, Sanjeev Sreetharan, Benjamin Aubin LDIP: Long Distance Information Propagation for Video Super-Resolution
Michael Bernasconi, Abdelaziz Djelouah, Yang Zhang, Markus Gross, Christopher Schroers Learning 4D Embodied World Models
Haoyu Zhen, Qiao Sun, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan Learning a Unified Template for Gait Recognition
Panjian Huang, Saihui Hou, Junzhou Huang, Yongzhen Huang Learning an Implicit Physics Model for Image-Based Fluid Simulation
Emily Yue-Ting Jia, Jiageng Mao, Zhiyuan Gao, Yajie Zhao, Yue Wang Learning Counterfactually Decoupled Attention for Open-World Model Attribution
Yu Zheng, Boyang Gong, Fanye Kong, Yueqi Duan, Bingyao Yu, Wenzhao Zheng, Lei Chen, Jiwen Lu, Jie Zhou Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-on
Delong Zhang, Qiwei Huang, Yang Sun, Yuanliu Liu, Wei-Shi Zheng, Pengfei Xiong, Wei Zhang Learning Interpretable Queries for Explainable Image Classification with Information Pursuit
Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan, Hector Andrade-Loarca, Gitta Kutyniok, René Vidal Learning Neural Scene Representation from iToF Imaging
Wenjie Chang, Hanzhi Chang, Yueyi Zhang, Wenfei Yang, Tianzhu Zhang Learning Normal Flow Directly from Events
Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller Learning on the Go: A Meta-Learning Object Navigation Model
Xiaorong Qin, Xinhang Song, Sixian Zhang, Xinyao Yu, Xinmiao Zhang, Shuqiang Jiang Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
Yun Wang, Longguang Wang, Chenghao Zhang, Yongjian Zhang, Zhanjie Zhang, Ao Ma, Chenyou Fan, Tin Lun Lam, Junjie Hu Learning Streaming Video Representation via Multitask Training
Yibin Yan, Jilan Xu, Shangzhe Di, Yikun Liu, Yudi Shi, Qirui Chen, Zeqian Li, Yifei Huang, Weidi Xie Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li Learning to See in the Extremely Dark
Hai Jiang, Binhao Guan, Zhen Liu, Xiaohong Liu, Jian Yu, Zheng Liu, Songchen Han, Shuaicheng Liu Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval
Ziwei Wang, Sameera Ramasinghe, Chenchen Xu, Julien Monteil, Loris Bazzani, Thalaiyasingam Ajanthan Learning Visual Proxy for Compositional Zero-Shot Learning
Shiyu Zhang, Cheng Yan, Yang Liu, Chenchen Jing, Lei Zhou, Wenjun Wang LEGION: Learning to Ground and Explain for Synthetic Image Detection
Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Weijia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He Less Is More: Empowering GUI Agent with Context-Aware Simplification
Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie Less Is More: Improving Motion Diffusion Models with Sparse Keyframes
Jinseok Bae, Inwoo Hwang, Young-Yoon Lee, Ziyu Guo, Joseph Liu, Yizhak Ben-Shabat, Young Min Kim, Mubbasir Kapadia Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering
Siddharth Tourani, Jayaram Reddy, Akash Kumbar, Satyajit Tourani, Nishant Goyal, Madhava Krishna, N Dinesh Reddy, Muhammad Haris Khan Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, Conghui He Leveraging Prior Knowledge of Diffusion Model for Person Search
Giyeol Kim, Sooyoung Yang, Jihyong Oh, Myungjoo Kang, Chanho Eom LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds
Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo Liberated-GS: 3D Gaussian Splatting Independent from SfM Point Clouds
Weihong Pan, Xiaoyu Zhang, Hongjia Zhai, Xiaojun Xiang, Hanqing Jiang, Guofeng Zhang LiDAR Waveforms Are Worth 40x128x33 Words
Dominik Scheuble, Hanno Holzhüter, Steven Peters, Mario Bijelic, Felix Heide LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno, Babak Taati Light-a-Video: Training-Free Video Relighting via Progressive Light Fusion
Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Qidong Huang, Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Anyi Rao, Jiaqi Wang, Li Niu LightsOut: Diffusion-Based Outpainting for Enhanced Lens Flare Removal
Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee, Chih-Hai Su, Yu-Lun Liu LIRA: Inferring Segmentation in Large Multi-Modal Models with Local Interleaved Region Assistance
Zhang Li, Biao Yang, Qiang Liu, Shuo Zhang, Zhiyin Ma, Liang Yin, Linger Deng, Yabo Sun, Yuliang Liu, Xiang Bai LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
Jiahao Wang, Ning Kang, Lewei Yao, Mengzhao Chen, Chengyue Wu, Songyang Zhang, Shuchen Xue, Yong Liu, Taiqiang Wu, Xihui Liu, Kaipeng Zhang, Shifeng Zhang, Wenqi Shao, Zhenguo Li, Ping Luo LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
Guowei Xu, Peng Jin, Ziang Wu, Hao Li, Yibing Song, Lichao Sun, Li Yuan LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Zhucun Xue, Yong Liu, Xiang Bai Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu, Lu Wang, Lisheng Xu, Jun Cheng Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Md Ashiqur Rahman, Chiao-An Yang, Michael N. Cheng, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh LocalDyGS: Multi-View Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling
Jiahao Wu, Rui Peng, Jianbo Jiao, Jiayu Yang, Luyang Tang, Kaiqiang Xiong, Jie Liang, Jinbo Yan, Runling Liu, Ronggang Wang LOMM: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Minwoo Choi, Kiljoon Han, Jahoon Jeong, Zane Durante, Ehsan Adeli, Sang Hyun Park, Sunghoon Im Long Context Tuning for Video Generation
Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang Long-Context State-Space Video World Models
Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats
Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu LONG3R: Long Sequence Streaming 3D Reconstruction
Zhuoguang Chen, Minghui Qin, Tianyuan Yuan, Zhe Liu, Hang Zhao LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin, Cheng Sun, Fu-En Yang, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu Looking in the Mirror: A Faithful Counterfactual Explanation Method for Interpreting Deep Image Classification Models
Townim Chowdhury, Vu Minh Hieu Phan, Kewen Liao, Nanyu Dong, Minh-Son To, Anton van den Hengel, Johan W. Verjans, Zhibin Liao LookOut: Real-World Humanoid Egocentric Navigation
Boxiao Pan, Adam W. Harley, Francis Engelmann, C. Karen Liu, Leonidas J. Guibas LOTA: Bit-Planes Guided AI-Generated Image Detection
Hongsong Wang, Renxi Cheng, Yang Zhang, Chaolei Han, Jie Gui LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Federico Girella, Davide Talon, Ziyue Liu, Zanxi Ruan, Yiming Wang, Marco Cristani Low-Light Image Enhancement Using Event-Based Illumination Estimation
Lei Sun, Yuhan Bao, Jiajun Zhai, Jingyun Liang, Yulun Zhang, Kaiwei Wang, Danda Pani Paudel, Luc Van Gool Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Yu Qiao, Bo Zhang, Xiaohong Liu, Hongsheng Li, Chang Xu, Peng Gao LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong, Pitchaporn Rewatbowornwong, Pramook Khungurn, Supasorn Suwajanakorn LV-MAE: Learning Long Video Representations Through Masked-Embedding Autoencoders
Ilan Naiman, Emanuel Ben-Baruch, Oron Anschel, Alon Shoshan, Igor Kviatkovsky, Manoj Aggarwal, Gerard Medioni LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Ming Ding, Xiaotao Gu, Shiyu Huang, Bin Xu, Yuxiao Dong, Jie Tang LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
Jinghan You, Shanglin Li, Yuanrui Sun, Jiangchuan Wei, Mingyu Guo, Chao Feng, Jiao Ran Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision
Kailai Zhou, Fuqiang Yang, Shixian Wang, Bihan Wen, Chongde Zi, Linsen Chen, Qiu Shen, Xun Cao M2EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking
Yan Li, Yang Xu, Changhao Chen, Zhongchen Shi, Wei Chen, Liang Xie, Hongbo Chen, Erwei Yin MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval
Jaeseok Byun, Young Kyun Jang, Seokhyeon Jeong, Donghyun Kim, Taesup Moon Magic Insert: Style-Aware Drag-and-Drop
Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Michael Rubinstein, David E. Jacobs, Shlomi Fruchter MagicColor: Multi-Instance Sketch Colorization
Yinhan Zhang, Yue Ma, Bingyuan Wang, Qifeng Chen, Zeyu Wang MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia Make Your Training Flexible: Towards Deployment-Efficient Video Models
Chenting Wang, Kunchang Li, Tianxiang Jiang, Xiangyu Zeng, Yi Wang, Limin Wang Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
Yuan Wang, Yuxin Chen, Zhongang Qi, Lijun Liu, Jile Jiao, Xuetao Feng, Yujia Liang, Ying Shan, Zhipeng Zhang MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence
Liyuan Deng, Yunpeng Bai, Yongkang Dai, Xiaoshui Huang, Hongping Gan, Dongshuo Huang, Hao Jiacheng, Yilei Shi Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez, Weijian Deng, Stephen Gould Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, Anton Obukhov MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong, Muhammad Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, Sergey Tulyakov MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild
Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang Mastering Collaborative Multi-Modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu, Zhebei Shen, Zhongqi Yue, Yang Wu, Bosheng Qin, Wenqiao Zhang, Yunfei Li, Juncheng Li, Siliang Tang, Yueting Zhuang MatchDiffusion: Training-Free Generation of Match-Cuts
Alejandro Pardo, Fabio Pizzati, Tong Zhang, Alexander Pondaven, Philip Torr, Juan Camilo Perez, Bernard Ghanem MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
Nisha Huang, Henglin Liu, Yizhou Lin, Kaer Huang, Chubin Chen, Jie Guo, Tong-Yee Lee, Xiu Li MaterialMVP: Illumination-Invariant Material Generation via Multi-View PBR Diffusion
Zebin He, Mingxin Yang, Shuhui Yang, Yixuan Tang, Tao Wang, Kaihao Zhang, Guanying Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Wenhan Luo MAVias: Mitigate Any Visual Bias
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou MCID: Multi-Aspect Copyright Infringement Detection for Generated Images
Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Xiaoyue Duan, Jinchao Zhang, Jie Zhou MCOP: Multi-UAV Collaborative Occupancy Prediction
Zefu Lin, Wenbo Chen, Xiaojuan Jin, Yuran Yang, Lue Fan, Yixin Zhang, Yufeng Zhang, Zhaoxiang Zhang MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li, Jay Mehta, Aniket Bera MDP3: A Training-Free Approach for List-Wise Frame Selection in Video-LLMs
Hui Sun, Shiyin Lu, Huanyu Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming Li Medical World Model
Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao, Yuhan Wang, Yucheng Tang, Daguang Xu, Kang Wang, Yang Yang, Zongwei Zhou, Yuyin Zhou MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition
Maksim Golyadkin, Valeria Rubanova, Aleksandr Utkov, Dmitry Nikolotov, Ilya Makarov MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization
Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, Guosheng Lin MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Shuchang Zhou, Wenrui Ding, Takeo Igarashi, Ming-Hsuan Yang MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing
Haoxuan Li, Ziya Erkoç, Lei Li, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong, David Fan, Jiachen Li, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann LeCun, Saining Xie, Zhuang Liu MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy
Wuyang Li, Wentao Pan, Xiaoyuan Liu, Zhendong Luo, Chenxin Li, Hengyu Liu, Din Ping Tsai, Mu Ku Chen, Yixuan Yuan METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu, Yaoming Wang, Bowen Shi, Xiaopeng Zhang, Wenrui Dai, Chenglin Li, Hongkai Xiong, Qi Tian MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Huu-Tai Phung, Zong-Lin Gao, Yi-Chen Yao, Kuan-Wei Ho, Yi-Hsin Chen, Yu-Hsiang Lin, Alessandro Gnutti, Wen-Hsiao Peng MIEB: Massive Image Embedding Benchmark
Chenghao Xiao, Isaac Chung, Imene Kerboua, Jamie Stirling, Xin Zhang, Márton Kardos, Roman Solomatin, Noura Al Moubayed, Kenneth Enevoldsen, Niklas Muennighoff MikuDance: Animating Character Art with Mixed Motion Dynamics
Jiaxu Zhang, Xianfang Zeng, Xin Chen, Wei Zuo, Gang Yu, Zhigang Tu MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP
Pei An, Jiaqi Yang, Muyao Peng, You Yang, Qiong Liu, Xiaolin Wu, Liangliang Nan Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack
Xingshuo Han, Xuanye Zhang, Xiang Lan, Haozhao Wang, Shengmin Xu, Shen Ren, Jason Zeng, Ming Wu, Michael Heinrich, Tianwei Zhang Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu, Kangyi Wu, Pengna Li, Jiayi Wu, Sanping Zhou, Jingmin Xin MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, Cordelia Schmid, Tobias Weyand MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
Vittorio Pipoli, Alessia Saporita, Federico Bolelli, Marcella Cornia, Lorenzo Baraldi, Costantino Grana, Rita Cucchiara, Elisa Ficarra MistSense: Versatile Online Detection of Procedural and Execution Mistakes
Constantin Patsch, Yuankai Wu, Marsil Zakour, Driton Salihu, Eckehard Steinbach Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan, Stewart Worrall, Julie Stephany Berrio Perez Mixture of Experts Guided by Gaussian Splatters Matters: A New Approach to Weakly-Supervised Video Anomaly Detection
Giacomo D'Amicantonio, Snehashis Majhi, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond, Egor Bondarev Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
Sitong Wu, Haoru Tan, Yukang Chen, Shaofeng Zhang, Jingyao Li, Bei Yu, Xiaojuan Qi, Jiaya Jia MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding, Shenxi Wu, Xiangyu Zhao, Yuhang Zang, Haodong Duan, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Dahua Lin, Jiaqi Wang MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch MMAD: Multi-Label Micro-Action Detection in Videos
Kun Li, Pengyu Liu, Dan Guo, Fei Wang, Zhiliang Wu, Hehe Fan, Meng Wang MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu, Haixiao Yue, Shanshan Liu, Gang Zhang MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, YuXin Song, Wenhao Wu, Dacheng Tao Mobile Video Diffusion
Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma, Yexin Liu, Yaofu Liu, Xianfeng Wu, Mingzhe Zheng, Zihao Wang, Ser-Nam Lim, Harry Yang Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio, Marcella Cornia, Giuseppe Boccignone, Rita Cucchiara Modeling Saliency Dataset Bias
Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge Moderating the Generalization of Score-Based Generative Model
Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong MoFRR: Mixture of Diffusion Models for Face Retouching Restoration
Jiaxin Liu, Qichao Ying, Zhenxing Qian, Sheng Li, Runqi Zhang, Jian Liu, Xinpeng Zhang MolParser: End-to-End Visual Recognition of Molecule Structures in the Wild
Xi Fang, Jiankun Wang, Xiaochen Cai, Shangqian Chen, Shuwen Yang, Haoyi Tao, Nan Wang, Lin Yao, Linfeng Zhang, Guolin Ke MoMa-Kitchen: A 100k+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
Pingrui Zhang, Xianqiang Gao, Yuhan Wu, Kehui Liu, Dong Wang, Zhigang Wang, Bin Zhao, Yan Ding, Xuelong Li MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
Jiahui Lei, Kyle Genova, George Kopanas, Noah Snavely, Leonidas Guibas Moment Quantization for Video Temporal Grounding
Xiaolong Sun, Le Wang, Sanping Zhou, Liushuai Shi, Kun Xia, Mengnan Liu, Yabing Wang, Gang Hua Monocular Facial Appearance Capture in the Wild
Yingyan Xu, Kate Gadola, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion
Zihan Wang, Jeff Tan, Tarasha Khurana, Neehar Peri, Deva Ramanan MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network
Jianfei Jiang, Qiankun Liu, Haochen Yu, Hongyuan Liu, Liyong Wang, Jiansheng Chen, Huimin Ma MonSTeR: A Unified Model for Motion, Scene, Text Retrieval
Luca Collorone, Matteo Gioia, Massimiliano Pappa, Paolo Leoni, Giovanni Ficarra, Or Litany, Indro Spinelli, Fabio Galasso Morph: A Motion-Free Physics Optimization Framework for Human Motion Generation
Zhuo Li, Mingshuang Luo, Ruibing Hou, Xin Zhao, Hao Liu, Hong Chang, Zimo Liu, Chen Li MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning
Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion, Efstratios Gavves, Cees G. M. Snoek, Yuki M Asano Motion-2-to-3: Leveraging 2D Motion Data for 3D Motion Generations
Ruoxi Guo, Huaijin Pi, Zehong Shen, Qing Shuai, Zechen Hu, Zhumei Wang, Yajiao Dong, Ruizhen Hu, Taku Komura, Sida Peng, Xiaowei Zhou MotionCtrl: A Real-Time Controllable Vision-Language-Motion Model
Bin Cao, Sipeng Zheng, Ye Wang, Lujie Xia, Qianshan Wei, Qin Jin, Jing Liu, Zongqing Lu MotionFollower: Editing Video Motion via Score-Guided Diffusion
Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang MotionStreamer: Streaming Motion Generation via Diffusion-Based Autoregressive Model in Causal Latent Space
Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
Ziyu Zhu, Xilin Wang, Yixuan Li, Zhuofan Zhang, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Wei Liang, Qian Yu, Zhidong Deng, Siyuan Huang, Qing Li MSQ: Memory-Efficient Bit Sparsification Quantization
Seokho Han, Seoyeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction
Yaopeng Lou, Liao Shen, Tianqi Liu, Jiaqi Li, Zihao Huang, Huiqiang Sun, Zhiguo Cao Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim Multi-Identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Yuwei Guo, Dahua Lin, Tianfan Xue, Bo Dai Multi-Modal Multi-Platform Person Re-Identification: Benchmark and Method
Ruiyang Ha, Songyi Jiang, Bin Li, Bikang Pan, Yihang Zhu, Junjie Zhang, Xiatian Zhu, Shaogang Gong, Jingya Wang Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework
Rohan Sharma, Changyou Chen, Feng-Ju Chang, Seongjun Yun, Xiaohu Xie, Rui Meng, Dehong Xu, Alejandro Mottini, Qingjun Cui Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu, Zijie Xin, Yuhan Fu, Ruixiang Zhao, Bangxiang Lan, Xirong Li Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, Yanyun Qu Multi-Turn Consistent Image Editing
Zijun Zhou, Yingying Deng, Xiangyu He, Weiming Dong, Fan Tang Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic, Siyuan Li, Irem Demir, Emircan Gündoğdu, Lei Ke, Sergey Prokudin, Marc Pollefeys, Siyu Tang Multi-View Gaze Target Estimation
Qiaomu Miao, Vivek Raju Golani, Jingyi Xu, Progga Paromita Dutta, Minh Hoai, Dimitris Samaras Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu, Susang Kim, Kisu Lee, Taekyoung Kwon, Won-Yong Shin, Ha Young Kim Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
Tim Elsner, Paula Usinger, Julius Nehring-Wirxel, Gregor Kobsik, Victor Czech, Yanjiang He, Isaak Lim, Leif Kobbelt Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation
Shengqi Liu, Yuhao Cheng, Zhuo Chen, Xingyu Ren, Wenhan Zhu, Lincheng Li, Mengxiao Bi, Xiaokang Yang, Yichao Yan Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu, Branislav Kveton, Yufan Zhou, Jiuxiang Gu, Jian Chen, Changyou Chen Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla, Junyong Lee, Beixuan Yang, Mahmoud Afifi, Michael S. Brown MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang, Yechan Hwang, Byungsoo Ko, Han-Gyu Kim, Dongyu Yao, Xuankun Rong, Eojin Joo, Seung-Ho Han, Bowon Ko, Ho-Jin Choi Music Grounding by Short Video
Zijie Xin, Minquan Wang, Jingyu Liu, Quan Chen, Ye Ma, Peng Jiang, Xirong Li Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
Xiaojie Li, Ronghui Li, Shukai Fang, Shuzhao Xie, Xiaoyang Guo, Jiaqing Zhou, Junkun Peng, Zhi Wang MV-Adapter: Multi-View Consistent Image Generation Made Easy
Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, Lu Sheng NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations
Rongqing Li, Changsheng Li, Ruilin Lv, Yuhang Li, Yang Gao, Xiaolu Zhang, Jun Zhou Nautilus: Locality-Aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang, Xuanyu Yi, Haohan Weng, Qingshan Xu, Xiaokang Wei, Xianghui Yang, Chunchao Guo, Long Chen, Hanwang Zhang NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
Zhixi Cai, Fucai Ke, Simindokht Jahangard, Maria Garcia de la Banda, Reza Haffari, Peter J. Stuckey, Hamid Rezatofighi Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He, Yuanyu He, Shaoxuan He, Feng Chen, Hong Zhou, Kaipeng Zhang, Bohan Zhuang NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Shuangkang Fang, I-Chao Shen, Takeo Igarashi, Yufeng Wang, ZeSheng Wang, Yi Yang, Wenrui Ding, Shuchang Zhou NeuFrameQ: Neural Frame Fields for Scalable and Generalizable Anisotropic Quadrangulation
Ying-Tian Liu, Jiajun Li, Yu-Tao Liu, Xin Yu, Yuan-Chen Guo, Yan-Pei Cao, Ding Liang, Ariel Shamir, Song-Hai Zhang Neural Compression for 3D Geometry Sets
Siyu Ren, Junhui Hou, Weiyao Lin, Wenping Wang Neural Shell Texture Splatting: More Details and Fewer Primitives
Xin Zhang, Anpei Chen, Jincheng Xiong, Pinxuan Dai, Yujun Shen, Weiwei Xu NeuralSVG: An Implicit Representation for Text-to-Vector Generation
Sagi Polaczek, Yuval Alaluf, Elad Richardson, Yael Vinker, Daniel Cohen-Or Neuroverse3D: Developing In-Context Learning Universal Model for Neuroimaging in 3D
Jiesi Hu, Hanyang Peng, Yanwu Yang, Xutao Guo, Yang Shang, Pengcheng Shi, Chenfei Ye, Ting Ma NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta, Shanthika Naik, Preet Savalia, Sujay Kumar Ingle, Avinash Sharma Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising
Xiangbin Wei, Yuanfeng Wang, Ao Xu, Lingyu Zhu, Dongyong Sun, Keren Li, Yang Li, Qi Qin NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
Tianyi Wang, Shuaicheng Niu, Harry Cheng, Xiao Zhang, Yinglong Wang O-MaMa: Learning Object Mask Matching Between Egocentric and Exocentric Views
Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-Cameo, Alejandro Perez-Yus, Ruben Martinez-Cantin, Jose J. Guerrero Object-Centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Weidi Xie, Stratis Gavves Object-Level Correlation for Few-Shot Segmentation
Chunlin Wen, Yu Zhang, Jie Fan, Hongyuan Zhu, Xiu-Shen Wei, Yijun Wang, Zhiqiang Kou, Shuzhou Sun ObjectGS: Object-Aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
Ruijie Zhu, Mulin Yu, Linning Xu, Lihan Jiang, Yixuan Li, Tianzhu Zhang, Jiangmiao Pang, Bo Dai ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Yuqian Fu, Runze Wang, Bin Ren, Guolei Sun, Biao Gong, Yanwei Fu, Danda Pani Paudel, Xuanjing Huang, Luc Van Gool OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
Shiyong Liu, Xiao Tang, Zhihao Li, Yingfan He, Chongjie Ye, Jianzhuang Liu, Binxiao Huang, Shunbo Zhou, Xiaofei Wu Occlusion-Robust Stylization for Drawing-Based 3D Animation
Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee, Ji Woo Hong, Chang D. Yoo Occupancy Learning with Spatiotemporal Memory
Ziyang Leng, Jiawei Yang, Wenlong Yi, Bolei Zhou OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Junyuan Zhang, Qintong Zhang, Bin Wang, Linke Ouyang, Zichen Wen, Ying Li, Ka-Ho Chow, Conghui He, Wentao Zhang OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving
Kota Shimomura, Masaki Nambata, Atsuya Ishikawa, Ryota Mimura, Koki Inoue, Takayoshi Yamashita, Takayuki Kawabuchi ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique OmniHuman-1: Rethinking the Scaling-up of One-Stage Conditioned Human Animation Models
Gaojie Lin, Jianwen Jiang, Jiaqi Yang, Zerong Zheng, Chao Liang, Yuan Zhang, Jingtuo Liu OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong, Xu Zheng, Chenfei Liao, Yuanhuiyi Lyu, Jialei Chen, Shengyang Wu, Linfeng Zhang, Xuming Hu OmniVTON: Training-Free Universal Virtual Try-on
Zhaotong Yang, Yuhui Li, Shengfeng He, Xinzhe Li, Yangyang Xu, Junyu Dong, Yong Du On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini, Yiming Wang, Paolo Rota, Elisa Ricci On the Generalization of Representation Uncertainty in Earth Observation
Spyros Kondylatos, Nikolaos Ioannis Bountos, Dimitrios Michail, Xiao Xiang Zhu, Gustau Camps-Valls, Ioannis Papoutsis On the Robustness Tradeoff in Fine-Tuning
Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Blaine Hoak, Yohan Beugin, Eric Pauley, Patrick McDaniel One Last Attention for Your Vision-Language Model
Liang Chen, Ghazi Shazan Ahmad, Tianjun Yao, Lingqiao Liu, Zhiqiang Shen One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-Object Trajectory
Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi, Ziqi Gao, Vishnu Iyengar, Norimasa Kobori, Quan Kong, Ranjay Krishna Online Generic Event Boundary Detection
Hyungrok Jung, Daneul Kim, Seunggyun Lim, Jeany Son, Jonghyun Choi Online Language Splatting
Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, Liu Ren Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Yiqing Shen, Bohan Liu, Chenjia Li, Lalithkumar Seenivasan, Mathias Unberath ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan, Ce Zhang, Silong Yong, Martin Q. Ma, Simon Stepputtis, Louis-Philippe Morency, Deva Ramanan, Katia Sycara, Yaqi Xie Open-Set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang, Yan Xia, Shulei Wang, Hanting Wang, Minghui Fang, Shengpeng Ji, Sashuai Zhou, Tao Jin, Zhou Zhao Open-Vocabulary Octree-Graph for 3D Scene Understanding
Zhigang Wang, Yifei Su, Chenhui Li, Dong Wang, Yan Huang, Xuelong Li, Bin Zhao OpenM3D: Open Vocabulary Multi-View Indoor 3D Object Detection Without Human Annotations
Peng-Hao Hsu, Ke Zhang, Fu-En Wang, Tao Tu, Ming-Feng Li, Yu-Lun Liu, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo OpenRSD: Towards Open-Prompts for Object Detection in Remote Sensing Images
Ziyue Huang, Yongchao Feng, Ziqi Liu, Shuai Yang, Qingjie Liu, Yunhong Wang OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun Yuan, Yaling Shen, Feilong Tang, Xiaohao Xu, Lin Zhou, Wei Li, Ying Chen, Zhongxing Xu, Zelin Peng, Siyuan Yan, Vinkle Srivastav, Diping Song, Tianbin Li, Danli Shi, Jin Ye, Nicolas Padoy, Nassir Navab, Junjun He, Zongyuan Ge OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, Seven Shu, Yunsheng Wu, Yongge Liu, Rongrong Ji OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
Jinhong Wang, Shuo Tong, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Hongxia Xu, Danny Z. Chen, Jintai Chen, Jian Wu ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, Xiang Bai OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding
Tianrun Xu, Guanyu Chen, Ye Li, Yuxin Xi, Zeyu Mu, Ruichen Wang, Tianren Zhang, Haichuan Gao, Feng Chen Ouroboros: Single-Step Diffusion Models for Cycle-Consistent Forward and Inverse Rendering
Shanlin Sun, Yifan Wang, Hanwen Zhang, Yifeng Xiong, Qin Ren, Ruogu Fang, Xiaohui Xie, Chenyu You OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu, Souvik Kundu, Tushar Krishna OVG-HQ: Online Video Grounding with Hybrid-Modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai, Minh Hieu Phan, Yanjie Dong, Wei Wang, Qi Chen, Xiping Hu P-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
Jun Zhang, Desen Meng, Zhengming Zhang, Zhenpeng Huang, Tao Wu, Limin Wang PanSt3R: Multi-View Consistent Panoptic Segmentation
Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jerome Revaud, Gabriela Csurka Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher Metzler PartField: Learning 3D Feature Fields for Part Segmentation and Beyond
Minghua Liu, Mikaela Angelina Uy, Donglai Xiang, Hao Su, Sanja Fidler, Nicholas Sharp, Jun Gao Passing the Driving Knowledge Test
Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
Seunggwan Lee, Hwanhee Jung, Byoungsoo Koh, Qixing Huang, Sang Ho Yoon, Sangpil Kim PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution
Yong Liu, Hang Dong, Jinshan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai, Yunjie Tian, Samuel Border, Nan Xi, Pinaki Sarder, Junsong Yuan, David Doermann, Xuan Gong PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Fatemeh Ghezloo, Mehmet Saygin Seyfioglu, Rustin Soraki, Wisdom O. Ikezogwo, Beibin Li, Tejoram Vivekanandan, Joann G. Elmore, Ranjay Krishna, Linda Shapiro Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
Liang Xu, Chengqun Yang, Zili Lin, Fei Xu, Yifan Liu, Congsheng Xu, Yiyi Zhang, Jie Qin, Xingdong Sheng, Yunhui Liu, Xin Jin, Yichao Yan, Wenjun Zeng, Xiaokang Yang Performing Defocus Deblurring by Modeling Its Formation Process
Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani, Jun Liu, De Wen Soh PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model
Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu Personalized Federated Learning Under Local Supervision
Qiqi Liu, Jiaqiang Li, Yuchen Liu, Yaochu Jin, Lingjuan Lyu, Xiaohu Wu, Han Yu PersonalVideo: High ID-Fidelity Video Customization Without Dynamic and Semantic Degradation
Hengjia Li, Haonan Qiu, Shiwei Zhang, Xiang Wang, Yujie Wei, Zekun Li, Yingya Zhang, Boxi Wu, Deng Cai Perspective-Aware 3D Gaussian Inpainting with Multi-View Consistency
Yuxin Cheng, Binxiao Huang, Taiqiang Wu, Wenyong Zhou, Chenchen Ding, Zhengwu Liu, Graziano Chesi, Ngai Wong Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee, Jihyeon Je, Chanho Park, Mikaela Angelina Uy, Leonidas Guibas, Minhyuk Sung Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu, Hong-Xia Xie, Hong-Han Shuai, Wen-Huang Cheng Perspective-Invariant 3D Object Detection
Ao Liang, Lingdong Kong, Dongyue Lu, Youquan Liu, Jian Fang, Huaici Zhao, Wei Tsang Ooi Ph-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data
Xidan Zhang, Yihan Zhuang, Qian Guo, Haodong Yang, Xuelin Qian, Gong Cheng, Junwei Han, Zhongling Huang Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Lijie Liu, Tianxiang Ma, Bingchuan Li, Zhuowei Chen, Jiawei Liu, Gen Li, Siyu Zhou, Qian He, Xinglong Wu PHD: Personalized 3D Human Body Fitting with Point Diffusion
Hsuan-I Ho, Chen Guo, Po-Chen Wu, Ivan Shugurov, Chengcheng Tang, Abhay Mittal, Sizhe An, Manuel Kaufmann, Linguang Zhang Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong, Amir Hosein Khasahmadi, Rahul G. Krishnan Pinco: Position-Induced Consistent Adapter for Diffusion Transformer in Foreground-Conditioned Inpainting
Guangben Lu, Yuzhen Du, Yizhe Tang, Zhimin Sun, Ran Yi, Yifan Qi, Tianyi Wang, Lizhuang Ma, Fangyuan Zou PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel Brostow, Sara Vicente, Guillermo Garcia-Hernando PlaneRAS: Learning Planar Primitives for 3D Plane Recovery
Fang Zhang, Wenzhao Zheng, Linqing Zhao, Zelan Zhu, Jiwen Lu, Xiuzhuang Zhou PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He, Bo Cheng, Yuhang Ma, Qingxiang Jia, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Liebucha Wu, Dawei Leng, Yuhui Yin Plug-in Feedback Self-Adaptive Attention in CLIP for Training-Free Open-Vocabulary Segmentation
Zhixiang Chi, Yanan Wu, Li Gu, Huan Liu, Ziqiang Wang, Yang Zhang, Yang Wang, Konstantinos Plataniotis PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models
Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Enci Liu, Zhuoyi Shang, Xiangyang Ji, Wu Liu Point Cloud Self-Supervised Learning via 3D to Multi-View Masked Learner
Zhimin Chen, Xuewei Chen, Xiao Guo, Yingwei Li, Longlong Jing, Liang Yang, Bing Li PointGAC: Geometric-Aware Codebook for Masked Point Modeling
Abiao Li, Chenlei Lv, Yuming Fang, Yifan Zuo, Jian Zhang, Guofeng Mei PolarAnything: Diffusion-Based Polarimetric Image Synthesis
Kailong Zhang, Youwei Lyu, Heng Guo, Si Li, Zhanyu Ma, Boxin Shi PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction
Yufei Han, Bowen Tie, Heng Guo, Youwei Lyu, Si Li, Boxin Shi, Yunpeng Jia, Zhanyu Ma POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction
Songyan Zhang, Yongtao Ge, Jinyuan Tian, Guangkai Xu, Hao Chen, Chen Lv, Chunhua Shen PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
ChangHee Yang, Hyeonseop Song, Seokhun Choi, Seungwoo Lee, Jaechul Kim, Hoseok Do Preacher: Paper-to-Video Agentic System
Jingwei Liu, Ling Yang, Hao Luo, Fan Wang, Hongyan Li, Mengdi Wang Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang, Chao Wen, Haoyu Guo, Sida Peng, Minghan Qin, Hujun Bao, Xiaowei Zhou, Ruizhen Hu Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding
Mingxuan Wu, Huang Huang, Justin Kerr, Chung Min Kim, Anthony Zhang, Brent Yi, Angjoo Kanazawa Preserve Anything: Controllable Image Synthesis with Object Preservation
Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava, Gaurav Sharma Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
Yan Zhang, Yao Feng, Alpár Cseke, Nitin Saini, Nathan Bajandas, Nicolas Heron, Michael J. Black Princeton365: A Diverse Dataset with Accurate Camera Pose
Karhan Kayan, Stamatis Alexandropoulos, Rishabh Jain, Yiming Zuo, Erich Liang, Jia Deng Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao, Gen Li, Shreyank N Gowda, Robert B. Fisher, Jonathan Huang, Anurag Arnab, Laura Sevilla-Lara Prior-Aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation
Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Xingyu Liu, Menghao Zhang, Lei Zhang, Jing Wang, Jianxin Liao Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, Stephan Günnemann PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors
Kangan Qian, Jinyu Miao, Xinyu Jiao, Ziang Luo, Zheng Fu, Yining Shi, Yunlong Wang, Kun Jiang, Diange Yang PRM: Photometric Stereo Based Large Reconstruction Model
Wenhang Ge, Jiantao Lin, Guibao Shen, Jiawei Feng, Tao Hu, Xinli Xu, Ying-Cong Chen PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation
Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, Yiu-Ming Cheung Processing and Acquisition Traces in Visual Encoders: What Does CLIP Know About Your Camera?
Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos, Yuta Nakashima, Giorgos Tolias, Noa Garcia ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users
Xiangyu Yin, Boyuan Yang, Weichen Liu, Qiyao Xue, Abrar Alamri, Goeran Fiedler, Wei Gao Progressive Test Time Energy Adaptation for Medical Image Segmentation
Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park, Daniel H. Pak, Anne-Marie Rickmann, Lawrence H. Staib, James S. Duncan, Alex Wong PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement
Tewodros W. Ayalew, Xiao Zhang, Kevin Yuanbo Wu, Tianchong Jiang, Michael Maire, Matthew R. Walter ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-Based Process Judges
Jiaxin Ai, Pengfei Zhou, Zhaopan Xu, Ming Li, Fanrui Zhang, Zizhen Li, Jianwen Sun, Yukang Feng, Baojin Huang, Zhongyuan Wang, Kaipeng Zhang PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning
M. Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk Prompt-a-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji, Jiacheng Zhang, Jie Wu, Shilong Zhang, Shoufa Chen, Chongjian Ge, Peize Sun, Weifeng Chen, Wenqi Shao, Xuefeng Xiao, Weilin Huang, Ping Luo PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai, Wenxuan Cheng, Jiedong Zhuang, Jiang-Jiang Liu, Hongshen Zhao, Zhenhua Feng, Wankou Yang Prototype Guided Backdoor Defense via Activation Space Manipulation
Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini, Avani Gupta, P J Narayanan Prototypes Are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Minho Shim, Jae-Pil Heo Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction
Yanwen Fang, Wenqi Jia, Xu Cao, Peng-Tao Jiang, Guodong Li, Jintai Chen PseudoMapTrainer: Learning Online Mapping Without HD Maps
Christian Löwens, Thorben Funke, Jingchao Xie, Alexandru Paul Condurache PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation
Rongyao Fang, Chengqi Duan, Kun Wang, Hao Li, Linjiang Huang, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Hongsheng Li, Xihui Liu Punching Bag vs. Punching Person: Motion Transferability in Videos
Raiyaan Abdullah, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh Rawat Purge-Gate: Backpropagation-Free Test-Time Adaptation for Point Clouds Classification via Token Purging
Moslem Yazdanpanah, Ali Bahri, Mehrdad Noori, Sahar Dastani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Ismail Ben Ayed, Christian Desrosiers PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi, Weilong Yan, Gang Xu, Yumeng Li, Yucheng Chen, Zhenxi Li, Fei Yu, Ming Li, Si Yong Yeo Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization
Lanning Zhang, Ying Zhou, Fei Gao, Ziyun Li, Maoying Qiao, Jinlan Xu, Nannan Wang QK-Edit: Revisiting Attention-Based Injection in MM-DiT for Image and Video Editing
Tiancheng Shen, Zilong Huang, Xiangtai Li, Zhijie Lin, Jiyang Liu, Yitong Wang, Jiashi Feng, Ming-Hsuan Yang, Jun Hao Liew QR-LoRA: Efficient and Disentangled Fine-Tuning via QR Decomposition for Customized Generation
Jiahui Yang, Yongjia Ma, Donglin Di, Jianxun Cui, Hao Li, Wei Chen, Yan Xie, Xun Yang, Wangmeng Zuo Quanta Neural Networks: From Photons to Perception
Varun Sundar, Tianyi Zhang, Sacha Jungerman, Mohit Gupta R1-Onevision: Advancing Generalized Multimodal Reasoning Through Cross-Modal Formalization
Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen RadGPT: Constructing 3D Image-Text Tumor Datasets
Pedro R.A.S. Bassi, Mehmet Can Yavuz, Ibrahim Ethem Hamamci, Sezgin Er, Xiaoxi Chen, Wenxuan Li, Bjoern Menze, Sergio Decherchi, Andrea Cavalli, Kang Wang, Yang Yang, Alan Yuille, Zongwei Zhou Radiant Foam: Real-Time Differentiable Ray Tracing
Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi RAGD: Regional-Aware Diffusion Model for Text-to-Image Generation
Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
Yuhan Li, Xianfeng Tan, Wenxiang Shang, Yubo Wu, Jian Wang, Xuanhong Chen, Yi Zhang, Hangcheng Zhu, Bingbing Ni RAGNet: Large-Scale Reasoning-Based Affordance Segmentation Benchmark Towards General Grasping
Dongming Wu, Yanping Fu, Saike Huang, Yingfei Liu, Fan Jia, Nian Liu, Feng Dai, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jianbing Shen RALoc: Enhancing Outdoor LiDAR Localization via Rotation Awareness
Yuyang Yang, Wen Li, Sheng Ao, Qingshan Xu, Shangshu Yu, Yu Guo, Yin Zhou, Siqi Shen, Cheng Wang Randomized Autoregressive Visual Generation
Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen RANKCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zhili Feng, Zenghui Ding, Yining Sun RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Zixin Wang, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
Junwen Huang, Shishir Reddy Vutukur, Peter KT Yu, Nassir Navab, Slobodan Ilic, Benjamin Busam RayZer: A Self-Supervised Large View Synthesis Model
Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, Georgios Pavlakos RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
Teng Li, Guangcong Zheng, Rui Jiang, Shuigen Zhan, Tao Wu, Yehao Lu, Yining Lin, Chuanyun Deng, Yepan Xiong, Min Chen, Lin Cheng, Xi Li ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction
Adeela Islam, Stefano Fiorini, Stuart James, Pietro Morerio, Alessio Del Bue ReCamMaster: Camera-Controlled Generative Rendering from a Single Video
Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, Di Zhang Recognizing Actions from Robotic View for Natural Human-Robot Interaction
Ziyi Wang, Peiming Li, Hong Liu, Zhichao Deng, Can Wang, Jun Liu, Junsong Yuan, Mengyuan Liu Recovering Parametric Scenes from Very Few Time-of-Flight Pixels
Carter Sifferman, Yiquan Li, Yiming Li, Fangzhou Mu, Michael Gleicher, Mohit Gupta, Yin Li REDUCIO! Generating 1k Video Within 16 Seconds Using Extremely Compressed Motion Latents
Rui Tian, Qi Dai, Jianmin Bao, Kai Qiu, Yifan Yang, Chong Luo, Zuxuan Wu, Yu-Gang Jiang Refer to Any Segmentation Mask Group with Vision-Language Prompts
Shengcao Cao, Zijun Wei, Jason Kuen, Kangning Liu, Lingzhi Zhang, Jiuxiang Gu, HyunJoon Jung, Liang-Yan Gui, Yu-Xiong Wang ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan, Jianguo Zhang, Wei-Shi Zheng, Jian-Fang Hu ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert Referring Expression Comprehension for Small Objects
Kanoko Goto, Takumi Hirose, Mahiro Ukai, Shuhei Kurita, Nakamasa Inoue Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng, Tianhe Ren, Yuda Xiong, Yihao Chen, Liu Qin, Lei Zhang Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Arsh Koneru, Yusuke Kato, Kazuki Kozuka, Aditya Grover REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang, Long Mai, Aniruddha Mahapatra, David Bourgin, Yicong Hong, Jonah Casebeer, Feng Liu, Yun Fu Region-Based Cluster Discrimination for Visual Representation Learning
Yin Xie, Kaicheng Yang, Xiang An, Kun Wu, Yongle Zhao, Weimo Deng, Zimin Ran, Yumeng Wang, Ziyong Feng, Roy Miles, Ismail Elezi, Jiankang Deng Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy
Yaxin Xiao, Qingqing Ye, Li Hu, Huadi Zheng, Haibo Hu, Zi Liang, Haoyang Li, Yijie Jiao Removing Cost Volumes from Optical Flow Estimators
Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li RePoseD: Efficient Relative Pose Estimation with Known Depth Information
Yaqing Ding, Viktor Kocur, Václav Vávra, Zuzana Berger Haladová, Jian Yang, Torsten Sattler, Zuzana Kukelova Representation Shift: Unifying Token Compression with FlashAttention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko, Eunseo Kim, Jihyung Kil, Hyunwoo J. Kim Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang, Kai Li, Chengjiang Long, Christian Häne, Peihong Guo, Scott Delp, Ehsan Adeli, Li Fei-Fei RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters
Xiaolin Liu, Tianyi Zhou, Hongbo Kang, Jian Ma, Ziwen Wang, Jing Huang, Wenguo Weng, Yu-Kun Lai, Kun Li ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan, Fabian Caba Heilbron, Bernard Ghanem, Josef Sivic, Bryan Russell Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He, Ling-An Zeng, Yi-Lin Wei, Dandan Zhang, Wei-Shi Zheng Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lv, Tianlin Pan, Chenyang Si, Zhaoxi Chen, Wangmeng Zuo, Ziwei Liu, Kwan-Yee K. Wong Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
Zhangjun Zhou, Yiping Li, Chunlin Zhong, Jianuo Huang, Jialun Pei, Hua Li, He Tang Rethinking Layered Graphic Design Generation with a Top-Down Approach
Jingye Chen, Zhaowen Wang, Nanxuan Zhao, Li Zhang, Difan Liu, Jimei Yang, Qifeng Chen ReTracker: Exploring Image Matching for Robust Online Any Point Tracking
Dongli Tan, Xingyi He, Sida Peng, Yiqing Gong, Xing Zhu, Jiaming Sun, Ruizhen Hu, Yujun Shen, Hujun Bao, Xiaowei Zhou Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
Dale Decatur, Thibault Groueix, Wang Yifan, Rana Hanocka, Vladimir Kim, Matheus Gadelha Reverse Convolution and Its Applications to Image Restoration
Xuhong Huang, Shiqi Liu, Kai Zhang, Ying Tai, Jian Yang, Hui Zeng, Lei Zhang Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights
Junhao Zheng, Jiahao Sun, Chenhao Lin, Zhengyu Zhao, Chen Ma, Chong Zhang, Cong Wang, Qian Wang, Chao Shen Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano-Lozano, Aditya Arora, Luis Herranz, Konstantinos G. Derpanis, Michael S. Brown, Javier Vazquez-Corral Revisiting Point Cloud Completion: Are We Ready for the Real-World?
Stuti Pathak, Prashant Kumar, Dheeraj Baiju, Nicholus Mboga, Gunther Steenackers, Rudi Penne RI3D: Few-Shot Gaussian Splatting with Repair and Inpainting Diffusion Priors
Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, Nima Khademi Kalantari RnGCam: High-Speed Video from Rolling & Global Shutter Measurements
Kevin Tandi, Xiang Dai, Chinmay Talegaonkar, Gal Mishne, Nick Antipa ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh, Shen Zheng, Robert Tamburo, Khiem Vuong, Juan Alvarez-Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa G. Narasimhan RobAVA: A Large-Scale Dataset and Baseline Towards Video Based Robotic Arm Action Understanding
Baoli Sun, Ning Wang, Xinzhu Ma, Anqi Zou, Yihang Lu, Chuixuan Fan, Zhihui Wang, Kun Lu, Zhiyong Wang RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin, Li Kang, Xiufeng Song, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai RoboPearls: Editable Video Simulation for Robot Manipulation
Tang Tao, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, Xia Zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang, Chengjian Feng, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang, Lin Ma RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
Feng Yan, Fanfan Liu, Yiyang Huang, Zechao Guan, Liming Zheng, Yufeng Zhong, Chengjian Feng, Lin Ma RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case
Baihui Xiao, Chengjian Feng, Zhijian Huang, Feng Yan, Yujie Zhong, Lin Ma Robust 3D Object Detection Using Probabilistic Point Clouds from Single-Photon LiDARs
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin, Andreas Velten, Yin Li, Mohit Gupta Robust Low-Light Scene Restoration via Illumination Transition
Ze Li, Feng Zhang, Xiatian Zhu, Meng Zhang, Yanghong Zhou, P. Y. Mok Robustifying Zero-Shot Vision Language Models by Subspaces Alignment
Junhao Dong, Piotr Koniusz, Liaoyuan Feng, Yifei Zhang, Hao Zhu, Weiming Liu, Xinghua Qu, Yew-Soon Ong RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS
Chuanyu Fu, Yuqi Zhang, Kunbin Yao, Guanying Chen, Yuan Xiong, Chuan Huang, Shuguang Cui, Xiaochun Cao RoCo-Sim: Enhancing Roadside Collaborative Perception Through Foreground Simulation
Yuwen Du, Anning Hu, Zichen Chao, Yifan Lu, Junhao Ge, Genjia Liu, Weitao Wu, Lanjun Wang, Siheng Chen RomanTex: Decoupling 3D-Aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
Yifei Feng, Mingxin Yang, Shuhui Yang, Sheng Zhang, Jiaao Yu, Zibo Zhao, Yuhong Liu, Jie Jiang, Chunchao Guo RoMo: Robust Motion Segmentation Improves Structure from Motion
Lily Goli, Sara Sabour, Mark Matthews, Marcus A. Brubaker, Dmitry Lagun, Alec Jacobson, David J. Fleet, Saurabh Saxena, Andrea Tagliasacchi Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu, Peijin Wang, Hanbo Bi, Boyuan Tong, Zhaozhi Wang, Wenhui Diao, Hao Chang, Yingchao Feng, Ziqi Zhang, Yaowei Wang, Qixiang Ye, Kun Fu, Xian Sun RTMap: Real-Time Recursive Mapping with Change Detection and Localization
Yuheng Du, Sheng Yang, Lingxuan Wang, Zhenghua Hou, Chengying Cai, Zhitao Tan, Mingxia Chen, Shi-Sheng Huang, Qiang Li S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
Guangting Zheng, Jiajun Deng, Xiaomeng Chu, Yu Yuan, Houqiang Li, Yanyong Zhang S4M: Boosting Semi-Supervised Instance Segmentation with SAM
Heeji Yoon, Heeseong Shin, Eunbeen Hong, Hyunwook Choi, Hansang Cho, Daun Jeong, Seungryong Kim SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
Chen Chen, Zhirui Wang, Taowei Sheng, Yi Jiang, Yundu Li, Peirui Cheng, Luning Zhang, Kaiqiang Chen, Yanfeng Hu, Xue Yang, Xian Sun SAC-GNC: SAmple Consensus for Adaptive Graduated Non-Convexity
Valter Piedade, Chitturi Sidhartha, José Gaspar, Venu Madhav Govindu, Pedro Miraldo SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos, Panagiotis C. Petrantonakis Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, Jungwook Choi SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang SAM4D: Segment Anything in Camera and LiDAR Streams
Jianyun Xu, Song Wang, Ziqian Ni, Chunyong Hu, Sheng Yang, Jianke Zhu, Qiang Li SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, Enze Xie SAS: Segment Any 3D Scene with Integrated 2D Priors
Zhuoyuan Li, Jiahao Lu, Jiacheng Deng, Hanzhi Chang, Lifan Wu, Yanzhe Liang, Tianzhu Zhang Scalable Image Tokenization with Index Backpropagation Quantization
Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, Jian Ren, Anil Kag Scaling 3D Compositional Models for Robust Classification and Pose Estimation
Xiaoding Yuan, Guofeng Zhang, Prakhar Kaushik, Artur Jesslen, Adam Kortylewski, Alan Yuille Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang Scaling Language-Free Visual Representation Learning
David Fan, Shengbang Tong, Jiachen Zhu, Koustuv Sinha, Zhuang Liu, Xinlei Chen, Michael Rabbat, Nicolas Ballas, Yann LeCun, Amir Bar, Saining Xie Scaling Laws for Native Multimodal Models
Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa, Matthieu Cord, Joshua Susskind, Alaaeldin El-Nouby Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data
Nithin Gopalakrishnan Nair, Srinivas Kaza, Xuan Luo, Vishal M. Patel, Stephen Lombardi, Jungyeon Park Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu, Hao Chen, Wenxuan Li, Zekun Jiang, Ziyan Huang, Yuxuan Zhao, Dexin Yu, Junjun He, Yefeng Zheng, Ling Shao, Alan Yuille, Zongwei Zhou Scene Coordinate Reconstruction Priors
Wenjing Bian, Axel Barroso-Laguna, Tommaso Cavallari, Victor Adrian Prisacariu, Eric Brachmann Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Di Zhang, Pengfei Wan, Guangyong Chen, Yijun Li, Ying-Cong Chen SceneSplat: Gaussian Splatting-Based Scene Understanding with Vision-Language Pretraining
Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
Pingchuan Ma, Xiaopei Yang, Yusong Li, Ming Gui, Felix Krause, Johannes Schusterbauer, Björn Ommer SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, Andrew Zisserman Scoring, Remember, and Reference: Catching Camouflaged Objects in Videos
Yu'ang Feng, Shuyong Gao, Fuzhen Yan, Yicheng Song, Lingyi Hong, Junjie Hu, Wenqiang Zhang SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi, Theo Gevers, Sai Kumar Dwivedi, Dimitrios Tzionas SDMatte: Grafting Diffusion Models for Interactive Matting
Longfei Huang, Yu Liang, Hao Zhang, Jinwei Chen, Wei Dong, Lunde Chen, Wanyu Liu, Bo Li, Peng-Tao Jiang SEAL: Semantic Aware Image Watermarking
Kasra Arabi, R. Teal Witter, Chinmay Hegde, Niv Cohen Secure On-Device Video OOD Detection Without Backpropagation
Shawn Li, Peilin Cai, Yuxiao Zhou, Zhiyu Ni, Renjie Liang, You Qin, Yi Nian, Zhengzhong Tu, Xiyang Hu, Yue Zhao Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Ta Duc Huy, Duy Anh Huynh, Yutong Xie, Yuankai Qi, Qi Chen, Phi Le Nguyen, Sen Kim Tran, Son Lam Phung, Anton van den Hengel, Zhibin Liao, Minh-Son To, Johan W. Verjans, Vu Minh Hieu Phan SEGA: A Stepwise Evolution Paradigm for Content-Aware Layout Generation with Design Prior
Haoran Wang, Bo Zhao, Jinghui Wang, Hanzhang Wang, Huan Yang, Wei Ji, Hao Liu, Xinyan Xiao SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images
Yichi Zhang, Le Xue, Wenbo Zhang, Lanlan Li, Yuchen Liu, Chen Jiang, Yuan Cheng, Yuan Qi Self-Calibrating Gaussian Splatting for Large Field-of-View Reconstruction
Youming Deng, Wenqi Xian, Guandao Yang, Leonidas Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec Self-Supervised Sparse Sensor Fusion for Long Range Perception
Edoardo Palladin, Samuel Brucker, Filippo Ghilotti, Praveen Narayanan, Mario Bijelic, Felix Heide Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Wanchen Sui, Shen Li, Yong Li, Fei Chao, Rongrong Ji Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Dubing Chen, Huan Zheng, Yucheng Zhou, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen Semantic Versus Identity: A Divide-and-Conquer Approach Towards Adjustable Medical Image De-Identification
Yuan Tian, Shuo Wang, Rongzhao Zhang, Zijian Chen, Yankai Jiang, Chunyi Li, Xiangyang Zhu, Fang Yan, Qiang Hu, XiaoSong Wang, Guangtao Zhai Semi-Supervised Concept Bottleneck Models
Lijie Hu, Tianhao Huang, Huanyi Xie, Xilin Gong, Chenyang Ren, Zhengyu Hu, Lu Yu, Ping Ma, Di Wang SemTalk: Holistic Co-Speech Motion Generation with Frame-Level Semantic Emphasis
Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Ziqiang Dang, Jianqiang Ren, Liefeng Bo, Zhigang Tu SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions
Mengwei Xie, Shuang Zeng, Xinyuan Chang, Xinran Liu, Zheng Pan, Mu Xu, Xing Wei SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting
Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib, Emeline Got, Rafael M. O. Cruz, Marc-André Carbonneau Serialization Based Point Cloud Oversegmentation
Chenghui Lu, Jianlong Kwan, Dilong Li, Ziyi Chen, Haiyan Guan SFUOD: Source-Free Unknown Object Detection
Keon-Hee Park, Seun-An Choe, Gyeong-Moon Park SGAD: Semantic and Geometric-Aware Descriptor for Local Feature Matching
Xiangzeng Liu, Chi Wang, Guanglu Shi, Xiaodong Zhang, Qiguang Miao, Miao Fan Shape of Motion: 4D Reconstruction from a Single Video
Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, Angjoo Kanazawa SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, Weidi Xie, Andrew Zisserman SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator
Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas, Jiankang Deng, Stefanos Zafeiriou Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang, Zhengxuan Wei, Yuchen Zhu, Cheng Shi, Guanbin Li, Liang Lin, Sibei Yang SiM3D: Single-Instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella, Matteo Ragaglia, Alessandro Oliva, Giuseppe Lisanti, Luigi Di Stefano SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Xianfu Cheng, Wei Zhang, Shiwei Zhang, Jian Yang, Xiangyuan Guan, Xianjie Wu, Xiang Li, Ge Zhang, Jiaheng Liu, Yuying Mai, Yutao Zeng, Zhoufutu Wen, Ke Jin, Baorui Wang, Weixiao Zhou, Yunhong Lu, Hangyuan Ji, Tongliang Li, Wenhao Huang, Zhoujun Li SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang, Liang Pan, Zhiyang Dou, Jidong Mei, Zhouyingcheng Liao, Yuke Lou, Yifan Wu, Lei Yang, Jingbo Wang, Taku Komura SITE: Towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu, Jianwei Yang, Zhengyuan Yang, Lijuan Wang, Andrey Kolobov, Jianfeng Gao, Boqing Gong SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
Chen-Yi Lu, Md Mehrab Tanjim, Ishita Dasgupta, Somdeb Sarkhel, Gang Wu, Saayan Mitra, Somali Chaterji SkySense V2: A Unified Foundation Model for Multi-Modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu, Lei Yu, Lei Liang, Yansheng Li, Jingdong Chen SL2A-INR: Single-Layer Learnable Activation for Implicit Neural Representation
Reza Rezaeian, Moein Heidari, Reza Azad, Dorit Merhof, Hamid Soltanian-Zadeh, Ilker Hacihaliloglu SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin SMGDiff: Soccer Motion Generation Using Diffusion Probabilistic Models
Hongdi Yang, Chengyang Li, Zhenxuan Wu, Gaozheng Li, Jingya Wang, Jingyi Yu, Zhuo Su, Lan Xu SmolDocling: An Ultra-Compact Vision-Language Model for End-to-End Multi-Modal Document Conversion
Ahmed Nassar, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Peter W. J. Staar SMSTracker: Tri-Path Score Mask Sigma Fusion for Multi-Modal Tracking
Sixian Chan, Zedong Li, Wenhao Li, Shijian Lu, Chunhua Shen, Xiaoqin Zhang Social Debiasing for Fair Multi-Modal LLMs
Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Weili Guan, Liqiang Nie Soft Local Completeness: Rethinking Completeness in XAI
Ziv Weiss Haddad, Oren Barkan, Yehonatan Elisha, Noam Koenigstein SP2T: Sparse Proxy Attention for Dual-Stream Point Transformer
Jiaxu Wan, Hong Zhang, Ziqi He, Yangyan Deng, Qishu Wang, Ding Yuan, Yifan Yang Sparfels: Fast Reconstruction from Sparse Unposed Imagery
Shubhendu Jena, Amine Ouasfi, Mae Younes, Adnane Boukhayma SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki, Junxian Guo, Jiaming Tang, Shang Yang, Yukang Chen, Konstantinos N. Plataniotis, Yao Lu, Song Han, Zhijian Liu Spatial Preference Rewarding for MLLMs Spatial Understanding
Han Qiu, Peng Gao, Lewei Lu, Xiaoqin Zhang, Ling Shao, Shijian Lu Spatial-Temporal Aware Visuomotor Diffusion Policy Learning
Zhenyang Liu, Yikai Wang, Kuanning Wang, Longfei Liang, Xiangyang Xue, Yanwei Fu Spatial-Temporal Forgery Trace Based Forgery Image Identification
Yilin Wang, Zunlei Feng, Jiachi Wang, Hengrui Lou, Binjia Zhou, Jie Lei, Mingli Song, Yijun Bei Spatially-Varying Autofocus
Yingsi Qin, Aswin C. Sankaranarayanan, Matthew O'Toole SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images
Yu Sheng, Jiajun Deng, Xinran Zhang, Yu Zhang, Bei Hua, Yanyong Zhang, Jianmin Ji SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion
Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection
Shunjie Yuan, Xinghua Li, Xuelin Cao, Haiyan Zhang, Mengyao Zhu, Robert H. Deng Spectral Image Tokenizer
Carlos Esteves, Mohammed Suhail, Ameesh Makadia Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
Lilika Makabe, Hiroaki Santo, Fumio Okura, Michael S. Brown, Yasuyuki Matsushita SpectralAR: Spectral Autoregressive Visual Generation
Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu, Zizheng Zhu, Junfeng Tang, Zhaofei Yu, Yaochu Jin SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou, Bernhard Kainz, Stefanos Zafeiriou SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad, Vitor Campagnolo Guizilini, Rares Andrei Ambrus, Greg Shakhnarovich, Matthew R. Walter Splat-Based 3D Scene Reconstruction with Extreme Motion-Blur
Hyeonjoong Jang, Dongyoung Choi, Donggun Kim, Woohyun Kang, Min H. Kim Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping
Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti, Giorgio Grisetti, Martin R. Oswald SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai, Songyou Peng, Kyle Genova, Leonidas Guibas, Thomas Funkhouser SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li, Juncan Deng, Chengxuan Wang, Kedong Xu, Rongtao Deng, Hong Gu, Haibin Shen, Kejie Huang St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
Haiwen Feng, Junyi Zhang, Qianqian Wang, Yufei Ye, Pengcheng Yu, Michael J. Black, Trevor Darrell, Angjoo Kanazawa Stable Diffusion Models Are Secretly Good at Visual In-Context Learning
Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara, Ali Shafahi, Amin Ghiasi, Charan Prakash, Reza Ardekani Stable Score Distillation
Haiming Zhu, Yangyang Xu, Chenshu Xu, Tingrui Shen, Wenxi Liu, Yong Du, Jun Yu, Shengfeng He Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Jensen Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth
Zheng Zhang, Lihe Yang, Tianyu Yang, Chaohui Yu, Xiaoyang Guo, Yixing Lao, Hengshuang Zhao Staining and Locking Computer Vision Models Without Retraining
Oliver J. Sutton, Qinghua Zhou, George Leete, Alexander N. Gorban, Ivan Y. Tyukin STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-Wise Gradient Alignment
Qingqian Yang, Peishen Yan, Xiaoyu Wu, Jiaru Zhang, Tao Song, Yang Hua, Hao Wang, Liangliang Wang, Haibing Guan SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering
Byeongjun Park, Hyojun Go, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai, Jiahao Wang, Yitong Wang, Yansong Tang Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing, Weixun Luo, Ye Mao, Krystian Mikolajczyk STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yun Li, Yiming Zhang, Tao Lin, Xiangrui Liu, Wenxiao Cai, Zheng Liu, Bo Zhao STIV: Scalable Text and Image Conditioned Video Generation
Zongyu Lin, Wei Liu, Chen Chen, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang, Cha Chen, Yiran Fei, Lezhi Li, Yinfei Yang, Yizhou Sun, Kai-Wei Chang Stochastic Interpolants for Revealing Stylistic Flows Across the History of Art
Pingchuan Ma, Ming Gui, Johannes Schusterbauer, Xiaopei Yang, Olga Grebenkova, Vincent Tao Hu, Björn Ommer StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer Streaming VideoLLMs for Real-Time Procedural Video Understanding
Dibyadip Chatterjee, Edoardo Remelli, Yale Song, Bugra Tekin, Abhay Mittal, Bharat Bhatnagar, Necati Cihan Camgoz, Shreyas Hampali, Eric Sauser, Shugao Ma, Angela Yao, Fadime Sener Street Gaussians Without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang, Xingyu Liu, Haili Yuan, Yanyan Li, Xiangyang Ji, Gim Hee Lee Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
Guanyi Qin, Ziyue Wang, Daiyun Shen, Haofeng Liu, Hantao Zhou, Junde Wu, Runze Hu, Yueming Jin Structured Policy Optimization: Enhance Large Vision-Language Model via Self-Referenced Dialogue
Guohao Sun, Can Qin, Yihao Feng, Zeyuan Chen, Ran Xu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao StyleMotif: Multi-Modal Motion Stylization Using Style-Content Cross Fusion
Ziyu Guo, Young Yoon Lee, Joseph Liu, Yizhak Ben-Shabat, Victor Zordan, Mubbasir Kapadia StyleSRN: Scene Text Image Super-Resolution with Text Style Embedding
Shengrong Yuan, Runmin Wang, Ke Hao, Xuqi Ma, Changxin Gao, Li Liu, Nong Sang Stylized-Face: A Million-Level Stylized Face Dataset for Face Recognition
Zhengyuan Peng, Jianqing Xu, Yuge Huang, Jinkun Hao, Shouhong Ding, Zhizhong Zhang, Xin Tan, Lizhuang Ma SummDiff: Generative Modeling of Video Summarization with Diffusion
Kwanseok Kim, Jaehoon Hahm, Sumin Kim, Jinhwan Sul, Byunghak Kim, Joonseok Lee Super Resolved Imaging with Adaptive Optics
Robin Swanson, Esther Y. H. Lin, Masen Lamb, Suresh Sivanandam, Kiriakos N. Kutulakos Supercharged One-Step Text-to-Image Diffusion Models with Negative Prompts
Viet Nguyen, Anh Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran SuperDec: 3D Scene Decomposition with Superquadric Primitives
Elisabetta Fedele, Boyang Sun, Leonidas Guibas, Marc Pollefeys, Francis Engelmann Superpowering Open-Vocabulary Object Detectors for X-Ray Vision
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu, Feng Xue, Daniel Cores, Nicu Sebe, Manuel Mucientes, Elisa Ricci Supervised Exploratory Learning for Long-Tailed Visual Recognition
Zhongquan Jian, Yanhao Chen, Yancheng Wang, Junfeng Yao, Meihong Wang, Qingqiang Wu SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Andreas Engelhardt, Mark Boss, Vikram Voleti, Chun-Han Yao, Hendrik P. A. Lensch, Varun Jampani SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization
Zhentao Tan, Ben Xue, Jian Jia, Junhao Wang, Wencai Ye, Shaoyun Shi, Mingjie Sun, Wenjin Wu, Quan Chen, Peng Jiang Synchronization of Multiple Videos
Avihai Naaman, Ron Shapira Weber, Oren Freifeld SynCity: Training-Free Generation of 3D Worlds
Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang, Luanyuan Dai, Qika Lin, Yunfeng Diao, Guangyin Jin, Yufei Guo, Jing Zhang, Xiaoshuai Hao SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
Xilin He, Cheng Luo, Xiaole Xian, Bing Li, Muhammad Haris Khan, Zongyuan Ge, Weicheng Xie, Siyang Song, Linlin Shen, Bernard Ghanem, Xiangyu Yue Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen, Bo Lin, Yao Tang, Weifeng Ge, Wenqiang Zhang Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky, Vladislav Shakhrai, Di Liu, Peiye Zhuang, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee, James Davis, Jian Wang TACO: Taming Diffusion for In-the-Wild Video Amodal Completion
Ruijie Lu, Yixin Chen, Yu Liu, Jiaxiang Tang, Junfeng Ni, Diwen Wan, Gang Zeng, Siyuan Huang TAD-E2E: A Large-Scale End-to-End Autonomous Driving Dataset
Chang Liu, Mingxu Zhu, Zheyuan Zhang, Linna Song, Xiao Zhao, Qingliang Luo, Qi Wang, Chufan Guo, Kuifeng Su Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti, Lorenzo Bianchi, Nicola Messina, Fabio Carrara, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Rita Cucchiara TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus, Carl Doersch, Yi Yang, Skanda Koppula, Viorica Patraucean, Xu Owen He, Ignacio Rocco, Mehdi S. M. Sajjadi, Sarath Chandar, Ross Goroshin TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang, Yutong Liu, Yangguang Li, Renrui Zhang, Yufei Liu, Kai Wang, Wanli Ouyang, Zhiwei Xiong, Peng Gao, Qibin Hou, Ming-Ming Cheng TARS: Traffic-Aware Radar Scene Flow Estimation
Jialong Wu, Marco Braun, Dominic Spata, Matthias Rottmann Task Vector Quantization for Memory-Efficient Model Merging
Youngeun Kim, Seunghwan Lee, Aecheon Jung, Bogon Ryu, Sungeun Hong TAViS: Text-Bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Junwei Han TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath, Anuj Karpatne, Wei-Lun Chao, Cheng Zhang Teaching VLMs to Localize Specific Objects from In-Context Examples
Sivan Doveh, Nimrod Shabtay, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, James Glass, Assaf Arbelle, Shimon Ullman, M. Jehanzeb Mirza TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu, Guo-Hua Wang, Xiaohao Chen, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang Tensor-Aggregated LoRA in Federated Fine-Tuning
Zhixuan Li, Binqian Xu, Xiangbo Shu, Jiachao Zhang, Yazhou Yao, Guo-Sen Xie, Jinhui Tang TeRA: Rethinking Text-Guided Realistic 3D Avatar Generation
Yanwen Wang, Yiyu Zhuang, Jiawei Zhang, Li Wang, Yifei Zeng, Xun Cao, Xinxin Zuo, Hao Zhu TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, Nicolas Longépé Test-Time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin, Jie Liu, Hui Liu, Victor Ho Fun Lee, Hong Yan, Haoliang Li Test-Time Prompt Tuning for Zero-Shot Depth Completion
Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park, Hae-Gon Jeon Test-Time Retrieval-Augmented Adaptation for Vision-Language Models
Xinqi Fan, Xueli Chen, Luoxiao Yang, Chuin Hong Yap, Rizwan Qureshi, Qi Dou, Moi Hoon Yap, Mubarak Shah Text-Guided Visual Prompt DINO for Generic Segmentation
Yuchen Guan, Chong Sun, Canmiao Fu, Zhipeng Huang, Chun Yuan, Chen Li Text-to-Any-Skeleton Motion Generation Without Retargeting
Qingyuan Liu, Ke Lv, Kun Dong, Jian Xue, Zehai Niu, Jinbao Wang Text2Outfit: Controllable Outfit Generation with Multimodal Language Models
Yuanhao Zhai, Yen-Liang Lin, Minxu Peng, Larry S. Davis, Ashwin Chandramouli, Junsong Yuan, David Doermann The Source Image Is the Best Attention for Infrared and Visible Image Fusion
Song Wang, Xie Han, Liqun Kuang, Boying Wang, Zhongyu Chen, Zherui Qiao, Fan Yang, Xiaoxia Liu, Bingyu Zhang, Zhixun Wang TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi, Eddy Ilg, Margret Keuper, Hideki Tanaka, Masao Utiyama, Raj Dabre, Steffen Eger, Simone Ponzetto Tile-Wise vs. Image-Wise: Random-Tile Loss and Training Paradigm for Gaussian Splatting
Xiaoyu Zhang, Weihong Pan, Xiaojun Xiang, Hongjia Zhai, Liyang Zhou, Hanqing Jiang, Guofeng Zhang Time-Aware Auto White Balance in Mobile Photography
Mahmoud Afifi, Luxi Zhao, Abhijith Punnappurath, Mohamed A. Abdelsalam, Ran Zhang, Michael S. Brown TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, Yuki Mitsufuji TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta, Anirban Roy, Rama Chellappa, Nathaniel D. Bastian, Alvaro Velasquez, Susmit Jha Token Activation Map to Visually Explain Multimodal LLMs
Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, Xiaomeng Li Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal
Yitong Jiang, Jinwei Gu, Tianfan Xue, Ka Chun Cheung, Pavlo Molchanov, Hongxu Yin, Sifei Liu TokensGen: Harnessing Condensed Tokens for Long Video Generation
Wenqi Ouyang, Zeqi Xiao, Danni Yang, Yifan Zhou, Shuai Yang, Lei Yang, Jianlou Si, Xingang Pan TokenUnify: Scaling up Autoregressive Pretraining for Neuron Segmentation
Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation
Jiale Zhou, Wenhan Wang, Shikun Li, Xiaolei Qu, Xin Guo, Yizhong Liu, Wenzhong Tang, Xun Lin, Yefeng Zheng TorchAdapt: Towards Light-Agnostic Real-Time Visual Perception
Khurram Azeem Hashmi, Karthik Palyakere Suresh, Didier Stricker, Muhammad Zeshan Afzal Toward Material-Agnostic System Identification from Videos
Yizhou Zhao, Haoyu Chen, Chunjiang Liu, Zhenyang Li, Charles Herrmann, Junhwa Hur, Yinxiao Li, Ming-Hsuan Yang, Bhiksha Raj, Min Xu Towards a 3D Transfer-Based Black-Box Attack via Critical Feature Guidance
Shuchao Pang, Zhenghan Chen, Shen Zhang, Liming Lu, Siyuan Liang, Anan Du, Yongbin Zhou Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang, Zhitong Xiong, Chenying Liu, Adam J. Stewart, Thomas Dujardin, Nikolaos Ioannis Bountos, Angelos Zavras, Franziska Gerken, Ioannis Papoutsis, Laura Leal-Taixé, Xiao Xiang Zhu Towards a Universal 3D Medical Multi-Modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan, Xi Yang, Tan Pan, Tianyi Liu, Chen Jiang, Xin Guo, Qiufeng Wang, Anh Nguyen, Yuan Qi, Kaizhu Huang, Yuan Cheng Towards Annotation-Free Evaluation: KPAScore for Human Keypoint Detection
Xiaoxiao Wang, Chunxiao Li, Peng Sun, Boming Miao, Yunjian Zhang, Yao Zhu Towards Efficient General Feature Prediction in Masked Skeleton Modeling
Shengkai Sun, Zefan Zhang, Jianfeng Dong, Zhiyong Cheng, Xiaojun Chang, Meng Wang Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
Yifan Zhan, Qingtian Zhu, Muyao Niu, Mingze Ma, Jiancheng Zhao, Zhihang Zhong, Xiao Sun, Yu Qiao, Yinqiang Zheng Towards Foundational Models for Single-Chip Radar
Tianshu Huang, Akarsh Prabhakara, Chuhan Chen, Jay Karhade, Deva Ramanan, Matthew O'Toole, Anthony Rowe Towards Higher Effective Rank in Parameter-Efficient Fine-Tuning Using Khatri-Rao Product
Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Anton van den Hengel, Ehsan Abbasnejad Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory
Daixun Li, Yusi Zhang, Mingxiang Cao, Donglai Liu, Weiying Xie, Tianlin Hui, Lunkai Lin, Zhiqiang Xie, Yunsong Li Towards Robustness of Person Search Against Corruptions
Woojung Son, Yoonki Cho, Guoyuan An, Chanmi Lee, Sung-Eui Yoon Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai, Carlo Masone, C V Jawahar Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting
Xingyu Miao, Haoran Duan, Quanhao Qian, Jiuniu Wang, Yang Long, Ling Shao, Deli Zhao, Ran Xu, Gongjie Zhang TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo, Haoran Yang, Yi Xin, Mingyang Yi, Guangyang Wu, Guangtao Zhai, Xiaohong Liu Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing
Hongyu Shen, Junfeng Ni, Yixin Chen, Weishuo Li, Mingtao Pei, Siyuan Huang Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu, Xudong Yang, Xiaobo Zhang, Peng Zhang Trade-Offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan, Yuli Zhang, Donghao Zhou, Xiaofei Chen, Shi Qiu, Jiaqi Liu, Guoyang Xie, Zhichao Lu TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
Yan Xia, Yunxiang Lu, Rui Song, Oussema Dhaouadi, João F. Henriques, Daniel Cremers Training-Free and Adaptive Sparse Attention for Efficient Long Video Generation
Yifei Xia, Suhan Ling, Fangcheng Fu, Yujie Wang, Huixia Li, Xuefeng Xiao, Bin Cui Training-Free Class Purification for Open-Vocabulary Semantic Segmentation
Qi Chen, Lingxiao Yang, Yun Chen, Nailong Zhao, Jianhuang Lai, Jie Shao, Xiaohua Xie Training-Free Generation of Temporally Consistent Rewards from VLMs
Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang Training-Free Geometric Image Editing on Diffusion Models
Hanshen Zhu, Zhen Zhu, Kaile Zhang, Yiming Gong, Yuliang Liu, Xiang Bai Training-Free Industrial Defect Generation with Diffusion Models
Ruyi Xu, Yen-Tzu Chiu, Tai-I Chen, Oscar Chew, Yung-Yu Chuang, Wen-Huang Cheng Training-Free Personalization via Retrieval and Reasoning on Fingerprints
Deepayan Das, Davide Talon, Yiming Wang, Massimiliano Mancini, Elisa Ricci Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang, Lanqing Guo, Zhihao Li, Jiaxing Huang, Pichao Wang, Bihan Wen, Jian Wang TransiT: Transient Transformer for Non-Line-of-Sight Videography
Ruiqian Li, Siyuan Shen, Suan Xia, Ziheng Wang, Xingyue Peng, Chengxuan Song, Yingsheng Zhu, Tao Wu, Shiying Li, Jingyi Yu Transparent Vision: A Theory of Hierarchical Invariant Representations
Shuren Qi, Yushu Zhang, Chao Wang, Zhihua Xia, Xiaochun Cao, Fenglei Fan TREAD: Token Routing for Efficient Architecture-Agnostic Diffusion Training
Felix Krause, Timy Phan, Ming Gui, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer Tree Skeletonization from 3D Point Clouds by Denoising Diffusion
Elias Ariel Marks, Lucas Nunes, Federico Magistri, Matteo Sodano, Rodrigo Marcuzzi, Lars Zimmermann, Jens Behley, Cyrill Stachniss Trial-Oriented Visual Rearrangement
Yuyi Liu, Xinhang Song, Tianliang Qi, Shuqiang Jiang TRNAS: A Training-Free Robust Neural Architecture Search
Yeming Yang, Qingling Zhu, Jianping Luo, Ka-Chun Wong, Qiuzhen Lin, Jianqiang Li Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition
Pulkit Kumar, Shuaiyi Huang, Matthew Walmer, Sai Saketh Rambhatla, Abhinav Shrivastava Trust but Verify: Programmatic VLM Evaluation in the Wild
Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu Tune-Your-Style: Intensity-Tunable 3D Style Transfer with Gaussian Splatting
Yian Zhao, Rushi Ye, Ruochong Zheng, Zesen Cheng, Chaoran Feng, Jiashu Yang, Pengchong Qiao, Chang Liu, Jie Chen Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis
Jingjing Ren, Wenbo Li, Zhongdao Wang, Haoze Sun, Bangzhen Liu, Haoyu Chen, Jiaqi Xu, Aoxue Li, Shifeng Zhang, Bin Shao, Yong Guo, Lei Zhu TurboReg: TurboClique for Robust and Efficient Point Cloud Registration
Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao, Kaixin Wang, Kuang Cao, Ji Wu, Jiayuan Li TurboVSR: Fantastic Video Upscalers and Where to Find Them
Zhongdao Wang, Guodongfang Zhao, Jingjing Ren, Bailan Feng, Shifeng Zhang, Wenbo Li TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning
Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma, Yuki M. Asano, Martin R. Oswald, Cees G. M. Snoek U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu, Zhao Yang, Yumeng Zhang, Jiang-Jiang Liu, Haibao Yu, Xiaoqing Ye, Yuan Wang, Shirui Li, Xun Sun, Ji Wan, Jun Wang UAVScenes: A Multi-Modal Dataset for UAVs
Sijie Wang, Siqi Li, Yawei Zhang, Shangshu Yu, Shenghai Yuan, Rui She, Quanjiang Guo, JinXuan Zheng, Ong Kang Howe, Leonrich Chandra, Shrivarshann Srijeyan, Aditya Sivadas, Toshan Aggarwal, Heyuan Liu, Hongming Zhang, Chujie Chen, Junyu Jiang, Lihua Xie, Wee Peng Tay UDC-VIT: A Real-World Video Dataset for Under-Display Cameras
Kyusu Ahn, JiSoo Kim, Sangik Lee, HyunGyu Lee, Byeonghyun Ko, Chanwoo Park, Jaejin Lee UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
Harsh Agrawal, Eldon Schoop, Xinlei Pan, Anuj Mahajan, Ari Seff, Di Feng, Ruijia Cheng, Andres Romero Mier Y Teran, Esteban Gomez, Abhishek Sundararajan, Forrest Huang, Amanda Swearngin, Mohana Prasad Sathya Moorthy, Jeff Nichols, Alexander Toshev UIPro: Unleashing Superior Interaction Capability for GUI Agents
Hongxin Li, Jingran Su, Jingfan Chen, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
Jianhui Zhang, Shen Cheng, Qirui Sun, Jia Liu, Wang Luyang, Chaoyu Feng, Chen Fang, Lei Lei, Jue Wang, Shuaicheng Liu Unbiased Missing-Modality Multimodal Learning
Ruiting Dai, Chenxi Li, Yandong Yan, Lisi Mo, Ke Qin, Tao He Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
Yunheng Li, Yuxuan Li, Quan-Sheng Zeng, Wenhai Wang, Qibin Hou, Ming-Ming Cheng Uncalibrated Structure from Motion on a Sphere
Jonathan Ventura, Viktor Larsson, Fredrik Kahl Uncertainty-Aware Diffusion-Guided Refinement of 3D Scenes
Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy-Chowdhury Understanding Co-Speech Gestures In-the-Wild
Sindhu B Hegde, K R Prajwal, Taein Kwon, Andrew Zisserman Understanding Museum Exhibits Using Vision-Language Reasoning
Ada-Astrid Balauca, Sanjana Garai, Stefan Balauca, Rasesh Udayakumar Shetty, Naitik Agrawal, Dhwanil Subhashbhai Shah, Yuqian Fu, Xi Wang, Kristina Toutanova, Danda Pani Paudel, Luc Van Gool Understanding Personal Concept in Open-Vocabulary Semantic Segmentation
Sunghyun Park, Jungsoo Lee, Shubhankar Borse, Munawar Hayat, Sungha Choi, Kyuwoong Hwang, Fatih Porikli UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
Haoxuan Wang, Jinlong Peng, Qingdong He, Hao Yang, Ying Jin, Jiafu Wu, Xiaobin Hu, Yanjie Pan, Zhenye Gan, Mingmin Chi, Bo Peng, Yabiao Wang UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi, Kazuki Kozuka, Juan Carlos Niebles, Ehsan Adeli Unified Adversarial Augmentation for Improving Palmprint Recognition
Jianlong Jin, Chenglong Zhao, Ruixin Zhang, Sheng Shang, Yang Zhao, Jun Wang, Jingyun Zhang, Shouhong Ding, Wei Jia, Yunsheng Wu Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu Unified Open-World Segmentation with Multi-Modal Prompts
Yang Liu, Yufei Yin, Chenchen Jing, Muzhi Zhu, Hao Chen, Yuling Xi, Bo Feng, Hao Wang, Shiyu Li, Chunhua Shen Unified Video Generation via Next-Set Prediction in Continuous Domain
Zhanzhou Feng, Qingpei Guo, Xinyu Xiao, Ruihan Xu, Ming Yang, Shiliang Zhang UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li, Zhipeng Jin, Xiawei Li, SiNan Du, Wen Tao, Shuanglong Li, Yi Yang, Chun Yuan, Liu Lin UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li UniRes: Universal Image Restoration for Complex Degradations
Mo Zhou, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin
Fangyikang Wang, Hubery Yin, Lei Qian, Yinan Li, Shaobin Zhuang, Huminhao Zhu, Yilin Zhang, Yanlong Tang, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qingxiang Lin, Jingwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation
Yihong Cao, Jiaming Zhang, Xu Zheng, Hao Shi, Kunyu Peng, Hang Liu, Kailun Yang, Hui Zhang Unlocking the Potential of Diffusion Priors in Blind Face Restoration
Yunqi Miao, Zhiyu Qu, Mingqi Gao, Changrui Chen, Jifei Song, Jungong Han, Jiankang Deng UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Fabian Perez, Sara Rojas, Carlos Hinojosa, Hoover Rueda-Chacón, Bernard Ghanem Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge, Zuhong Liu, Longteng Fan, Yifan Jiang, Jiaqi Su, Yiming Li, Zhejun Zhang, Siheng Chen UnrealZoo: Enriching Photo-Realistic Virtual Worlds for Embodied AI
Fangwei Zhong, Kui Wu, Churan Wang, Hao Chen, Hai Ci, Zhoujun Li, Yizhou Wang Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint
Wentian Cai, Weizhao Weng, Zihao Huang, Yandan Chen, Siquan Huang, Ping Gao, Victor C. M. Leung, Ying Gao UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan, Hong Liu, Xiangming Meng, Junsong Yuan, Mengyuan Liu V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z. Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, Xin Xia, Zhiyu Huang, Bolei Zhou, Jiaqi Ma VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang, Zhen Han, Chaojie Mao, Jingfeng Zhang, Yulin Pan, Yu Liu VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu VCA: Video Curious Agent for Long Video Understanding
Zeyuan Yang, Delin Chen, Xueyang Yu, Maohao Shen, Chuang Gan VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
Shoubin Yu, Difan Liu, Ziqiao Ma, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal Verbalized Representation Learning for Interpretable Few-Shot Generalization
Cheng-Fu Yang, Da Yin, Wenbo Hu, Heng Ji, Nanyun Peng, Bolei Zhou, Kai-Wei Chang VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang, Yawar Siddiqui, Armen Avetisyan, Chris Xie, Jakob Engel, Henry Howard-Jenkins VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur, Gorkem Durak, Ulas Bagci Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, Siyuan Yang, Zihao Shao, Xinghao Jiang, Boon Poh Ng, Meng Hwa Er, Alex Kot Video Color Grading via Look-up Table Generation
Seunghyun Shin, Dongmin Shin, Jisu Shin, Hae-Gon Jeon, Joon-Young Lee Video Individual Counting for Moving Drones
Yaowu Fan, Jia Wan, Tao Han, Antoni B. Chan, Andy J. Ma Video Motion Graphs
Haiyang Liu, Zhan Xu, Fa-Ting Hong, Hsin-Ping Huang, Yi Zhou, Yang Zhou Video-T1: Test-Time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai, Kaiyan Zhang, Xiaohang Zhan, Yueqi Duan VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng, Hongyi Pan, Ulas Bagci, Boqing Gong VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Yang Zhao, Shanchuan Lin, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang, Sipeng Zheng, Hao Luo, Zihao Yue, Zongqing Lu VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
Yue Qiu, Yanjun Sun, Takuma Yagi, Shusaku Egami, Natsuki Miyata, Ken Fukuda, Kensho Hara, Ryusuke Sagawa VideoVAE+: Large Motion Video Autoencoding with Cross-Modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai, Xuemiao Xu, Huaidong Zhang, Shengfeng He ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Hengshuang Zhao ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez, Paul Couairon, Clément Rambour, Raphael Fournier-Sniehotta, Ismail Ben Ayed, Jose Dolz, Nicolas Thome ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang, Kedong Xu, Hong Gu, Kejie Huang VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng, Zhi Ouyang, Jingke Meng, Wei-Shi Zheng Vision-Language Models Can't See the Obvious
Ngoc Dung Huynh, Phuc H Le-Khac, Wamiq Reyaz Para, Ankit Singh, Sanath Narayan VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Shaojie Zhu, Chengxiang Zhuo, Bing Li, Ye Liu, Zang Li, Ying Shan, Weiming Hu ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li, Yi-Xing Peng, Kun-Yu Lin, Xihan Wei, Jian-Fang Hu, Xiaohua Xie, Wei-Shi Zheng Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Songyou Peng, Kyle Genova, Gordon Wetzstein, Noah Snavely, Leonidas Guibas, Thomas Funkhouser Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse, Yicong Li, Arjun Akula, Angela Yao Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor R. Medeiros, Atif Belal, Srikanth Muralidharan, Eric Granger, Marco Pedersoli Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao, Yepeng Tang, Chunjie Zhang, Xiaolong Zheng, Chao Liang, Yunchao Wei, Yao Zhao Visual Test-Time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson, Honglak Lee Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Yan Xu Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, Ming-Ming Cheng VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang, Xipeng Qiu VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou, Alexander Vilesov, Xuehai He, Ziyu Wan, Shuwang Zhang, Aditya Nagachandra, Di Chang, Dongdong Chen, Xin Eric Wang, Achuta Kadambi VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong, Yitong Li, Weihuang Chen, Chen Min, Yizhe Li, Zhiqiang Gao, Haoyang Li, Zhongyu Guo, Hongbin Sun VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan, Wenzhen Yuan, Xian Gao, Ye Guo, Daoxin Zhang, Zhe Xu, Yao Hu, Ting Liu, Yuzhuo Fu VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu, Hangyu Li, Xiaokun Feng, Cundian Yang, Aiming Hao, Jiashu Zhu, Jiahong Wu, Xiangxiang Chu VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation Under Real Occlusions
Yash Garg, Saketh Bachu, Arindam Dutta, Rohit Lal, Sarosij Bose, Calvin-Khang Ta, M. Salman Asif, Amit Roy-Chowdhury VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath VoluMe - Authentic 3D Video Calls from Live Gaussian Splat Prediction
Martin de La Gorce, Charlie Hewitt, Tibor Takács, Robert Gerdisch, Zafiirah Hosenie, Givi Meishvili, Marek Kowalski, Thomas J. Cashman, Antonio Criminisi VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang Voyaging into Perpetual Dynamic Scenes from a Single View
Fengrui Tian, Tianjiao Ding, Jinqi Luo, Hancheng Min, Rene Vidal VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
Jiale Cheng, Ruiliang Lyu, Xiaotao Gu, Xiao Liu, Jiazheng Xu, Yida Lu, Jiayan Teng, Zhuoyi Yang, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition
Shuting Dong, Mingzhi Chen, Feng Lu, Hao Yu, Guanghao Li, Zhe Wu, Ming Tang, Chun Yuan VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias, Jiankang Deng, Hang Xu, Chao Ma WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu, Jiapei Zhang, Ying Deng, Zexi Jia, Peixiang Luo, Xiaoyue Duan, Jie Zhou, Jinchao Zhang WarpHE4D: Dense 4D Head Map Toward Full Head Reconstruction
Jongseob Yun, Yong-Hoon Kwon, Min-Gyu Park, Ju-Mi Kang, Min-Ho Lee, Inho Chang, Ju Hong Yoon, Kuk-Jin Yoon Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks
Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala, Congcong Wen, Anthony Tzes, Yi Fang WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection
Haodong Zhu, Wenhao Dong, Linlin Yang, Hong Li, Yuguang Yang, Yangyang Ren, Qingcheng Zhu, Zichao Feng, Changbai Li, Shaohui Lin, Runqi Wang, Xiaoyan Luo, Baochang Zhang Weakly-Supervised Learning of Dense Functional Correspondences
Stefan Stojanov, Linan Zhao, Yunzhi Zhang, Daniel L. K. Yamins, Jiajun Wu Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi, Davide Bucciarelli, Federico Betti, Marcella Cornia, Lorenzo Baraldi, Nicu Sebe, Rita Cucchiara What if: Understanding Motion Through Sparse Interactions
Stefan Andreas Baumann, Nick Stracke, Timy Phan, Björn Ommer What You Have Is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan, Jiawei Shao, Eduard Zamfir, Ruanjun Li, Zhaochong An, Chao Ma, Danda Paudel, Luc Van Gool, Radu Timofte, Zongwei Wu When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection
Bo-Lun Huang, Zi-Xiang Ni, Feng-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng When and Where Do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu, Nupur Kapur, Adams Wai-Kin Kong When Schrödinger Bridge Meets Real-World Image Dehazing with Unpaired Training
Yunwei Lan, Zhigao Cui, Xin Luo, Chang Liu, Nian Wang, Menglin Zhang, Yanzhao Su, Dong Liu Where Am I? Cross-View Geo-Localization with Natural Language Descriptions
Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, Weijia Li Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao, Yueyao Lin, Linkai Liu, Zipeng Guo, Hao Fei, Xiaobo Xia, Chao Gou Who Is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu, Junjie Fei, Xiaoqian Shen, Liangbing Zhao, Chun-Mei Feng, Mohamed Elhoseiny WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha, Grant Van Horn, Subhransu Maji WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park, Hyun Jae Oh, Mincheol Park, Hyunchan Moon, Minsik Kim, Suhyun Kim, Myung Kuk Yoon, Won Woo Ro WIPES: Wavelet-Based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu, Di Kang, Linchao Bao, Xun Cao, Zhan Ma WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu, Yin Yang, Charles Herrmann, Gordon Wetzstein, Jiajun Wu WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei World4Drive: End-to-End Autonomous Driving via Intention-Aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, XianPeng Lang, Dongbin Zhao WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang, Xinheng Lyu, Wenting Chen, Meidan Ding, Jipeng Zhang, Xiangjian He, Song Wu, Xiaohan Xing, Sen Yang, Xiyue Wang, Linlin Shen X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song, You Xie, Chenxu Zhang, Xin Chen, Chao Wang, Di Chang, Linjie Luo X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun, Ziyang Chu, Pan Zhang, Tong Wu, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Eduard Zamfir, Chao Ma, Danda Paudel, Luc Van Gool, Radu Timofte YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng, Xiang Zhang, Zirui Wang, Haiyang Xu, Zeyuan Chen, Bingnan Li, Zhuowen Tu YOLOE: Real-Time Seeing Anything
Ao Wang, Lihao Liu, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding Your Text Encoder Can Be an Object-Level Watermarking Controller
Naresh Kumar Devulapally, Mingzhen Huang, Vishal Asnani, Shruti Agarwal, Siwei Lyu, Vishnu Suresh Lokhande Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation
Wenliang Zhong, Rob Barton, Weizhi An, Feng Jiang, Hehuan Ma, Yuzhi Guo, Abhishek Dan, Shioulin Sam, Karim Bouyarmane, Junzhou Huang Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong, Gyugeun Lee, Jinwoo Ahn, Eun-Sol Kim Zero-Shot Inexact CAD Model Alignment from a Single Image
Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner, Supasorn Suwajanakorn Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia, John Kirchenbauer, Rifaa Qadri, Zikui Cai, Abhinav Bhatele, Furong Huang, Tom Goldstein ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
Bingchen Gong, Diego Gomez, Abdullah Hamdi, Abdelrahman Eldesokey, Ahmed Abdelreheem, Peter Wonka, Maks Ovsjanikov ZeroStereo: Zero-Shot Stereo Matching from Single Images
Xianqi Wang, Hao Yang, Gangwei Xu, Junda Cheng, Min Lin, Yong Deng, Jinliang Zang, Yurui Chen, Xin Yang Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu ZipVL: Accelerating Vision-Language Models Through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
Hyun Jun Yook, Ga San Jhun, Jae Hyun Cho, Min Jeon, Donghyun Kim, Tae Hyung Kim, Youn Kyu Lee