Kim, Yulhwa (6 publications)
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning. NeurIPS 2025.
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models. NeurIPS 2024.
SLEB: Streamlining LLMs Through Redundancy Verification and Elimination of Transformer Blocks. ICML 2024.