Han, Zhenhua

3 publications

NeurIPS 2025 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, Qianxi Zhang, Qi Chen, Chengruidong Zhang, Bailu Ding, Kai Zhang, Chen Chen, Fan Yang, Yuqing Yang, Lili Qiu
NeurIPS 2024 MInference 1.0: Accelerating Pre-Filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu
ICMLW 2024 MInference: Accelerating Pre-Filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu