DMLR 2025
13 papers
Challenge Design Roadmap
Hugo Jair Escalante, Isabelle Guyon, Addison Howard, Walter Reade, Sebastien Treguer
Chronicling Germany: An Annotated Historical Newspaper Dataset
Christian Schultze, Niklas Kerkfeld, Kara Kuebart, Princilia Weber, Moritz Wolter, Felix Selgert
Constructing Confidence Intervals for “the” Generalization Error – A Comprehensive Benchmark Study
Hannah Schulz-Kümpel, Sebastian Felix Fischer, Roman Hornung, Anne-Laure Boulesteix, Thomas Nagler, Bernd Bischl
Data Acquisition: A New Frontier in Data-Centric AI
Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries
Ronak Tali, Ali Rabeh, Cheng-Hau Yang, Mehdi Shadkhah, Samundra Karki, Abhisek Upadhyaya, Suriya Dhakshinamoorthy, Marjan Saadati, Soumik Sarkar, Adarsh Krishnamurthy, Chinmay Hegde, Aditya Balu, Baskar Ganapathysubramanian
MONSTER: Monash Scalable Time Series Evaluation Repository
Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb
SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning
Pu Ren, N. Benjamin Erichson, Junyi Guo, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney
Text Quality-Based Pruning for Efficient Training of Language Models
Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Daniel Li Chen, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer
The FIX Benchmark: Extracting Features Interpretable to eXperts
Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle Ungar, Eric Wong
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Lukas Helff, Wolfgang Stammer, Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting