Jin, Yunho

2 publications

NeurIPS 2023. S³: Increasing GPU Utilization During Generative Inference for Higher Throughput. Yunho Jin, Chun-Feng Wu, David Brooks, Gu-Yeon Wei
ICMLW 2023. SpeedLimit: Neural Architecture Search for Quantized Transformer Models. Yuji Chai, Luke Bailey, Yunho Jin, Glenn Ko, Matthew Karle, David Brooks, Gu-Yeon Wei, H. Kung