Chen, Yilong
5 publications
ICML
2025
Mixture of Hidden-Dimensions: Not All Hidden-States’ Dimensions Are Needed in Transformer
NeurIPS
2025
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
NeurIPS
2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion