Weber, Maurice

4 publications

ICLR 2025 Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang
NeurIPS 2024 RedPajama: An Open Dataset for Training Large Language Models Maurice Weber, Daniel Y. Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang
NeurIPS 2023 WordScape: A Pipeline to Extract Multilingual, Visually Rich Documents with Layout Annotations from Web Crawl Data Maurice Weber, Carlo Siebenschuh, Rory Butler, Anton Alexandrov, Valdemar Thanner, Georgios Tsolakis, Haris Jabbar, Ian Foster, Bo Li, Rick Stevens, Ce Zhang
NeurIPS 2022 Certifying Some Distributional Fairness with Subpopulation Decomposition Mintong Kang, Linyi Li, Maurice Weber, Yang Liu, Ce Zhang, Bo Li