Prediction-Oriented Subsampling from Data Streams

Abstract

Data is often generated in streams, with new observations arriving over time. A key challenge for learning models from data streams is capturing relevant information while keeping computational costs manageable. We explore intelligent data subsampling for offline learning, and argue for an information-theoretic method centred on reducing uncertainty in downstream predictions of interest. Empirically, we demonstrate that this prediction-oriented approach performs better than a previously proposed information-theoretic technique on two widely studied problems. At the same time, we highlight that reliably achieving strong performance in practice requires careful model design.

Cite

Text

Mussati et al. "Prediction-Oriented Subsampling from Data Streams." Proceedings of The 4th Conference on Lifelong Learning Agents, 2025.

Markdown

[Mussati et al. "Prediction-Oriented Subsampling from Data Streams." Proceedings of The 4th Conference on Lifelong Learning Agents, 2025.](https://mlanthology.org/collas/2025/mussati2025collas-predictionoriented/)

BibTeX

@inproceedings{mussati2025collas-predictionoriented,
  title     = {{Prediction-Oriented Subsampling from Data Streams}},
  author    = {Mussati, Benedetta Lavinia and Smith, Freddie Bickford and Rainforth, Tom and Roberts, S},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  year      = {2025},
  pages     = {565--580},
  volume    = {330},
  url       = {https://mlanthology.org/collas/2025/mussati2025collas-predictionoriented/}
}