ML Anthology
Tang, Jiaming
4 publications
ICLR 2025
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han
ICCV 2025
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki, Junxian Guo, Jiaming Tang, Shang Yang, Yukang Chen, Konstantinos N. Plataniotis, Yao Lu, Song Han, Zhijian Liu
NeurIPS 2025
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
Chaofan Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, Mingyu Gao
ICML 2024
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han