Open Problem: Data Selection for Regression Tasks

Abstract

This note proposes a set of open problems concerning data selection in regression tasks. The central question is: given a natural learning rule $\mathcal{A}$ and a selection budget $n$, how well can $\mathcal{A}$ perform when trained on $n$ examples selected from a larger dataset? We present concrete instances of this question in basic regression settings, including mean estimation and linear regression.

Cite

Text

Hanneke et al. "Open Problem: Data Selection for Regression Tasks." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.

Markdown

[Hanneke et al. "Open Problem: Data Selection for Regression Tasks." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.](https://mlanthology.org/colt/2025/hanneke2025colt-open/)

BibTeX

@inproceedings{hanneke2025colt-open,
  title     = {{Open Problem: Data Selection for Regression Tasks}},
  author    = {Hanneke, Steve and Moran, Shay and Shlimovich, Alexander and Yehudayoff, Amir},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  year      = {2025},
  pages     = {6225--6229},
  volume    = {291},
  url       = {https://mlanthology.org/colt/2025/hanneke2025colt-open/}
}