Yuan, Zhihang
24 publications
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing (NeurIPS 2025)
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization (ICML 2025)
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-Critical Expert Identification (NeurIPS 2025)
LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization (NeurIPSW 2024)