Prediction-Oriented Subsampling from Data Streams
Abstract
Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.
Cite
Text
Mussati et al. "Prediction-Oriented Subsampling from Data Streams." Proceedings of The 4th Conference on Lifelong Learning Agents, 2025.
Markdown
[Mussati et al. "Prediction-Oriented Subsampling from Data Streams." Proceedings of The 4th Conference on Lifelong Learning Agents, 2025.](https://mlanthology.org/collas/2025/mussati2025collas-predictionoriented/)
BibTeX
@inproceedings{mussati2025collas-predictionoriented,
title = {{Prediction-Oriented Subsampling from Data Streams}},
author = {Mussati, Benedetta Lavinia and Smith, Freddie Bickford and Rainforth, Tom and Roberts, S},
booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
year = {2025},
pages = {565--580},
volume = {330},
url = {https://mlanthology.org/collas/2025/mussati2025collas-predictionoriented/}
}