Beyond Limits: Enhancing the Extrapolation Performance of Regression Models by Leaving the Boundary Out

Abstract

While standard data splitting methods assume that the training and testing data are drawn from the same distribution, real-world engineering applications frequently require making predictions in unseen input regions. For example, to optimize the design parameters of a system, surrogate regression models can be used to map the system's design parameters to its performance and then to predict the performance of new configurations, a task that involves significant extrapolation. Despite its critical importance in such contexts, extrapolation has received surprisingly little attention in the literature. This paper seeks to narrow the gap between the interpolation and extrapolation performance of machine learning regression models in engineering, so that prediction quality in extrapolation approaches that in interpolation. To this end, we introduce leave-boundary-out (LBO), a data splitting method that identifies models with superior extrapolation performance toward the boundaries of the training distribution by exploiting the enhanced hyperparameter sensitivity in out-of-distribution regions. We first validate LBO in synthetic experiments, then verify its effectiveness across four real-world use cases in mechanical, process, and materials engineering. LBO consistently enhances the extrapolation performance of regression methods compared with a typical data splitting approach, albeit at a cost in interpolation. Our key findings reveal that: (i) models developed with extrapolation considerations in the data splitting generalize better beyond the training distribution, (ii) these models are usually simpler than those selected by a standard approach, and (iii) LBO reduces the extrapolation performance gap between localized regression methods (radial basis function kernel-based in our experiments) and trend search-based methods, which are in principle better suited for extrapolation.
Our results highlight the importance of incorporating extrapolation requirements into the design and evaluation of machine learning regression models for real-world engineering applications.
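To make the idea concrete, the core of a leave-boundary-out split can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `lbo_split`, the `boundary_fraction` parameter, and the boundary criterion (distance from the sample centroid) are assumptions; the paper's exact boundary definition may differ.

```python
# Hypothetical LBO-style split: hold out the points nearest the boundary of
# the training distribution so that model selection rewards extrapolation.
# The centroid-distance criterion below is an assumed, simplified stand-in
# for whatever boundary definition the paper actually uses.
import numpy as np

def lbo_split(X, boundary_fraction=0.2):
    """Return (train_idx, test_idx): interior points train, boundary points test."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    n_test = max(1, int(round(boundary_fraction * len(X))))
    order = np.argsort(dist)         # interior first, boundary last
    train_idx = order[:-n_test]      # interior points -> training set
    test_idx = order[-n_test:]       # outermost points -> held-out test set
    return train_idx, test_idx

# Example: 2-D design-parameter samples; the outermost 20% become the test set.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
train_idx, test_idx = lbo_split(X, boundary_fraction=0.2)
```

Evaluating candidate models on `test_idx` then favors hyperparameters that extrapolate well toward the boundary, rather than ones that merely interpolate inside it.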

Cite

Text

Garcia and Naets. "Beyond Limits: Enhancing the Extrapolation Performance of Regression Models by Leaving the Boundary Out." Machine Learning, 2025. doi:10.1007/s10994-025-06933-8

Markdown

[Garcia and Naets. "Beyond Limits: Enhancing the Extrapolation Performance of Regression Models by Leaving the Boundary Out." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/garcia2025mlj-beyond/) doi:10.1007/s10994-025-06933-8

BibTeX

@article{garcia2025mlj-beyond,
  title     = {{Beyond Limits: Enhancing the Extrapolation Performance of Regression Models by Leaving the Boundary Out}},
  author    = {Garcia, Francisco Ambrosio and Naets, Frank},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {285},
  doi       = {10.1007/s10994-025-06933-8},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/garcia2025mlj-beyond/}
}