Xu, Yuhui

9 publications

ICML 2025 Reward-Guided Speculative Decoding for Efficient LLM Reasoning Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong
NeurIPS 2025 Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, Jixuan Chen, Wenjing Hu, Xinyuan Wang, Yuhui Xu, Zekun Wang, Yiheng Xu, Junli Wang, Doyen Sahoo, Tao Yu, Caiming Xiong
ICLR 2025 ThinK: Thinner Key Cache by Query-Driven Pruning Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
ICLR 2024 QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhengsu Chen, Xiaopeng Zhang, Qi Tian
ICML 2024 SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li
AAAI 2021 Fitting the Search Space of Weight-Sharing NAS with Graph Convolutional Networks Xin Chen, Lingxi Xie, Jun Wu, Longhui Wei, Yuhui Xu, Qi Tian
ICLR 2020 PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong
IJCAI 2020 TRP: Trained Rank Pruning for Efficient Deep Neural Networks Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong
AAAI 2018 Deep Neural Network Compression with Single and Multiple Level Quantization Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong