Mishra, Mayank

9 publications

ICML 2025 Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao
NeurIPS 2025 PaTH Attention: Position Encoding via Accumulating Householder Transformations Songlin Yang, Yikang Shen, Kaiyue Wen, Shawn Tan, Mayank Mishra, Liliang Ren, Rameswar Panda, Yoon Kim
ICML 2024 BRAIn: Bayesian Reward-Conditioned Amortized Inference for Natural Language Generation from Feedback Gaurav Pandey, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernandez Astudillo
CVPR 2024 DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, R. Venkatesh Babu
NeurIPS 2024 Reducing Transformer Key-Value Cache Size with Cross-Layer Attention William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan-Kelley
TMLR 2023 StarCoder: May the Source Be with You! Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Joel Lamy-Poirier, Joao Monteiro, Nicolas Gontier, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Ben Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason T Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Urvashi Bhattacharyya, Wenhao Yu, Sasha Luccioni, Paulo Villegas, Fedor Zhdanov, Tony Lee, Nadav Timor, Jennifer Ding, Claire S Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro Von Werra, Harm de Vries
ICML 2022 A Closer Look at Smoothness in Domain Adversarial Training Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, Venkatesh Babu Radhakrishnan
NeurIPS 2022 Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Venkatesh Babu R
IJCAI 2022 Variational Learning for Unsupervised Knowledge Grounded Dialogs Mayank Mishra, Dhiraj Madan, Gaurav Pandey, Danish Contractor