Open Problem: Data Selection for Regression Tasks
Abstract
This note proposes a set of open problems concerning data selection in regression tasks. The central question is: given a natural learning rule $\mathcal{A}$ and a selection budget $n$, how well can $\mathcal{A}$ perform when trained on $n$ examples selected from a larger dataset? We present concrete instances of this question in basic regression settings, including mean estimation and linear regression.
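To make the mean-estimation instance concrete, here is a minimal Python sketch built on assumptions of our own, not taken from the note. The learning rule $\mathcal{A}$ is the empirical mean. The selector sees the entire dataset. Performance is measured by how close the output is to the mean of the full dataset. The names `empirical_mean`, `greedy_select`, and `random_select` are hypothetical and used only for illustration. The greedy selector is one simple heuristic, not a method proposed by the authors.

```python
import random
import statistics


def empirical_mean(sample):
    """Learning rule A in this sketch (an assumption): average the training sample."""
    return statistics.fmean(sample)


def greedy_select(data, n):
    """Hypothetical selector: add points one at a time, each time choosing the
    remaining point that brings the running mean closest to the full-data mean."""
    target = statistics.fmean(data)
    remaining = list(data)
    chosen = []
    total = 0.0
    for _ in range(n):
        k = len(chosen) + 1
        # Index of the remaining point whose inclusion minimizes |running mean - target|.
        best_idx = min(
            range(len(remaining)),
            key=lambda i: abs((total + remaining[i]) / k - target),
        )
        x = remaining.pop(best_idx)  # pop keeps the selected points distinct
        chosen.append(x)
        total += x
    return chosen


def random_select(data, n, rng):
    """Baseline selector: a uniformly random subset of size n."""
    return rng.sample(data, n)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    n = 10  # selection budget
    target = empirical_mean(data)
    err_greedy = abs(empirical_mean(greedy_select(data, n)) - target)
    err_random = abs(empirical_mean(random_select(data, n, rng)) - target)
    print(f"greedy error: {err_greedy:.2e}, random error: {err_random:.2e}")
```

On Gaussian data like this, the greedy subset typically tracks the full-data mean much more closely than a random subset of the same size. How far selection can help in general, and for other rules such as least-squares linear regression, is the kind of question the note poses.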
Cite
Hanneke et al. "Open Problem: Data Selection for Regression Tasks." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.
@inproceedings{hanneke2025colt-open,
title = {{Open Problem: Data Selection for Regression Tasks}},
author = {Hanneke, Steve and Moran, Shay and Shlimovich, Alexander and Yehudayoff, Amir},
booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
year = {2025},
pages = {6225-6229},
volume = {291},
url = {https://mlanthology.org/colt/2025/hanneke2025colt-open/}
}