Mușat, Tiberiu

2 publications

ICLR 2025: Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers. Tiberiu Mușat
NeurIPSW 2024: Clustering and Alignment: Understanding the Training Dynamics in Modular Addition. Tiberiu Mușat