Yu, Minlan

4 publications

ICLR 2025 Don’t Stop Me Now: Embedding Based Scheduling for LLMs Rana Shahout, Eran Malach, Chunwei Liu, Weifan Jiang, Minlan Yu, Michael Mitzenmacher
NeurIPS 2025 Fast Inference for Augmented Large Language Models Rana Shahout, Cong Liang, Shiji Xin, Qianru Lao, Yong Cui, Minlan Yu, Michael Mitzenmacher
ICLRW 2025 Faster, Cheaper, Just as Good: Cost- and Latency-Constrained Routing for LLMs Javid Lakha, Minlan Yu, Rana Shahout
ICLRW 2025 Prefix and Output Length-Aware Scheduling for Efficient Online LLM Inference Iñaki Arango, Ayush Noori, Yepeng Huang, Rana Shahout, Minlan Yu