Subkhankulov, Marat

1 publication

ICLR 2026. Small Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability. Luca Baroni, Galvin Khara, Joachim Schaeffer, Marat Subkhankulov, Stefan Heimersheim.