Mohtashami, Amirkeivan

9 publications

ICLR 2025. CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference. Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
NeurIPS 2024. DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging. Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi
NeurIPS 2024. QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman
NeurIPSW 2023. CoTFormer: More Tokens with Attention Make up for Less Depth. Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
ICMLW 2023. Landmark Attention: Random-Access Infinite Context Length for Transformers. Amirkeivan Mohtashami, Martin Jaggi
NeurIPS 2023. Random-Access Infinite Context Length for Transformers. Amirkeivan Mohtashami, Martin Jaggi
ICML 2023. Special Properties of Gradient Descent with Large Learning Rates. Amirkeivan Mohtashami, Martin Jaggi, Sebastian U Stich
AISTATS 2022. Masked Training of Neural Networks with Partial Gradients. Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich
AISTATS 2021. Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates. Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi