Allen-Zhu, Zeyuan
39 publications
ICLR
2025
Physics of Language Models: Part 2.2, How to Learn from Mistakes on Grade-School Math Problems
NeurIPS
2025
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
ICLR
2023
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning