Inconsistency Biases in Dynamic Data Pruning

Abstract

Dynamic data pruning accelerates training by focusing on informative samples. However, comparing importance scores computed under different model states introduces inconsistency (score context drift), and variable selection rates bias gradient dynamics over time (temporal gradient bias). We introduce RePB (Resolving Pruning Biases), a framework that addresses both issues. RePB makes pruning decisions within local windows (short sequences of batches) during training, using loss scores computed under a nearly constant model state within each window so that the scores can be validly compared. These decisions determine the data subset used in the subsequent training phase. To counteract the temporal gradient bias that arises from non-uniform sample inclusion, cumulative temporal rescaling reweights sample losses during training according to each sample's historical selection frequency. We provide theoretical grounding for RePB's consistency in score comparison and gradient alignment. Experiments across 16 datasets, 17 models, and 13 tasks show that RePB achieves near-full-dataset accuracy with reduced data (in most cases above 30%), offering a robust and scalable approach to efficient deep learning. Code is available at https://github.com/mrazhou/RePB.
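The two mechanisms described above can be illustrated with a small, framework-free sketch. It assumes that (i) each window yields per-sample losses that were all computed under one frozen model state, (ii) the highest-loss fraction `keep_ratio` of the window is retained, and (iii) the rescaling weight is the inverse of a sample's empirical selection frequency, t / c, where t is the number of pruning rounds so far and c is how often the sample was kept. The names `select_in_window`, `CumulativeRescaler`, and `weighted_loss`, the high-loss selection rule, and the t / c weight are hypothetical illustrations, not the authors' implementation; the exact criteria and rescaling used by RePB are given in the paper and the linked repository.

```python
def select_in_window(window_losses: dict[int, float], keep_ratio: float) -> set[int]:
    """Keep the highest-loss fraction of samples scored in one window.

    Assumption: every loss in `window_losses` was computed under the same
    (frozen) model state, so the scores are directly comparable.
    Assumption: higher loss = more informative (hypothetical criterion).
    """
    n_keep = max(1, round(len(window_losses) * keep_ratio))
    ranked = sorted(window_losses, key=window_losses.get, reverse=True)
    return set(ranked[:n_keep])


class CumulativeRescaler:
    """Hypothetical inverse-selection-frequency reweighting of sample losses."""

    def __init__(self) -> None:
        self.rounds = 0                   # t: pruning rounds seen so far
        self.counts: dict[int, int] = {}  # c_i: times sample i was kept

    def update(self, selected_ids: set[int]) -> None:
        """Record one pruning round and which samples survived it."""
        self.rounds += 1
        for i in selected_ids:
            self.counts[i] = self.counts.get(i, 0) + 1

    def weight(self, sample_id: int) -> float:
        """Return t / c_i, the inverse of the empirical selection frequency.

        Rarely kept samples get larger weights than frequently kept ones;
        a sample never kept gets weight 0.0 (it contributes no loss anyway).
        """
        c = self.counts.get(sample_id, 0)
        return self.rounds / c if c else 0.0


def weighted_loss(losses: dict[int, float], selected_ids: set[int],
                  rescaler: CumulativeRescaler) -> float:
    """Mean rescaled loss over the samples kept for this training phase."""
    total = sum(rescaler.weight(i) * losses[i] for i in selected_ids)
    return total / len(selected_ids)


if __name__ == "__main__":
    rescaler = CumulativeRescaler()
    # Toy per-sample losses for two consecutive windows (made-up numbers).
    windows = [
        {0: 2.3, 1: 0.1, 2: 1.7, 3: 0.4},
        {0: 0.2, 1: 0.9, 2: 1.1, 3: 0.3},
    ]
    for losses in windows:
        kept = select_in_window(losses, keep_ratio=0.5)
        rescaler.update(kept)
        print(sorted(kept), weighted_loss(losses, kept, rescaler))
```

In the toy run, sample 2 is kept in both rounds and keeps weight 1.0, while sample 1, kept only in the second round, is up-weighted to 2.0. The design choice illustrated is that selection compares scores only within one window, and the frequency-based weight is applied afterwards to the losses of the kept subset.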

Cite

Text

Zhou et al. "Inconsistency Biases in Dynamic Data Pruning." International Conference on Learning Representations, 2026.

Markdown

[Zhou et al. "Inconsistency Biases in Dynamic Data Pruning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhou2026iclr-inconsistency/)

BibTeX

@inproceedings{zhou2026iclr-inconsistency,
  title     = {{Inconsistency Biases in Dynamic Data Pruning}},
  author    = {Zhou, Qing and Yang, Tao and Zhao, Bingxuan and Zhang, Hongyuan and Gao, Junyu and Wang, Qi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhou2026iclr-inconsistency/}
}