Xiao, Guangxuan

11 publications

ICLR 2025. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han
ICLR 2025. Retrieval Head Mechanistically Explains Long-Context Factuality. Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu
ICML 2025. XAttention: Block Sparse Attention with Antidiagonal Scoring. Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han
NeurIPS 2024. BitDelta: Your Fine-Tune May Only Be Worth One Bit. James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai
ICLR 2024. Efficient Streaming Language Models with Attention Sinks. Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
NeurIPS 2024. InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory. Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
ICMLW 2024. InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory. Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
ICML 2024. QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han
ICML 2023. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
LoG 2022. Sparse and Local Networks for Hypergraph Reasoning. Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
ICMLW 2021. Red Alarm for Pre-Trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks. Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun