Shang, Junyuan
6 publications
ICML
2025
Mixture of Hidden-Dimensions: Not All Hidden-States’ Dimensions Are Needed in Transformer
NeurIPS
2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion