Zhou, Zhenghan

1 publication

ICLR 2026
DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
Zhui Zhu, Weichen Zhang, Zhenghan Zhou, Yunhao Liu, Fan Dang