Locatelli, Acyr

8 publications

ICLR 2025. Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models. Laura Ruis, Maximilian Mozes, Juhan Bae, Siddhartha Rao Kamalakara, Dwaraknath Gnaneshwar, Acyr Locatelli, Robert Kirk, Tim Rocktäschel, Edward Grefenstette, Max Bartolo.
NeurIPS 2025. Rope to Nope and Back Again: A New Hybrid Attention Strategy. Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar, Hangyu Lin, David Cairuz, Phil Blunsom, Acyr Locatelli.
ICLR 2025. To Code or Not to Code? Exploring Impact of Code in Pre-Training. Viraat Aryabumi, Yixuan Su, Raymond Ma, Adrien Morisot, Ivan Zhang, Acyr Locatelli, Marzieh Fadaee, Ahmet Üstün, Sara Hooker.
NeurIPS 2024. BAM! Just like That: Simple and Efficient Parameter Upcycling for Mixture of Experts. Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli.
ICMLW 2024. BAM! Just like That: Simple and Efficient Parameter Upcycling for Mixture of Experts. Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Nicolaus Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli.
NeurIPSW 2024. Nexus: Specialization Meets Adaptability for Efficiently Training Mixture of Experts. Nikolas Gritsch, Qizhen Zhang, Acyr Locatelli, Sara Hooker, Ahmet Üstün.
ICLR 2024. Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning. Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermis, Acyr Locatelli, Sara Hooker.
NeurIPS 2024. SnapKV: LLM Knows What You Are Looking for Before Generation. Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen.