ML Anthology
Xiao, Guangxuan
11 publications
ICLR
2025
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han
ICLR
2025
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu
ICML
2025
XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han
NeurIPS
2024
BitDelta: Your Fine-Tune May Only Be Worth One Bit
James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai
ICLR
2024
Efficient Streaming Language Models with Attention Sinks
Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
NeurIPS
2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
ICMLW
2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
ICML
2024
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han
ICML
2023
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
LoG
2022
Sparse and Local Networks for Hypergraph Reasoning
Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
ICMLW
2021
Red Alarm for Pre-Trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks
Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun