Active Data Collection and Management for Real-World Continual Learning via Pretrained Oracle
Abstract
Incremental Learning (IL) deals with learning from continuous streams of data while minimising catastrophic forgetting. This field of Machine Learning (ML) research has introduced several novel approaches and methodologies for varying configurations. However, academic Continual Learning setups generally work with well-curated datasets under predefined conditions, which do not hold for practical applications. In real-world scenarios, the problem of ML starts with data collection and curation. Depending on the application, different data management challenges arise, such as similar objects, unbalanced data containing sparse samples, visual artefacts, digitisation, and camera setup. This becomes an incrementally compounding issue in Continual Learning projects with data drift and varying conditions. We propose Active Data Collection and Management (ADCM), a straightforward and effective general framework for data collection, coreset/exemplar selection, and analysis. A pretrained Oracle model provides the ground-truth distribution for the other model, which learns incrementally. We couple ADCM with traditional ML/IL setups and demonstrate its suitability for real-world tasks, such as fine-grained classification and anomaly detection. A baseline implementation of ADCM for Class-IL matches state-of-the-art exemplar selection strategies, providing an improvement in average incremental accuracy of 1.5% with Dynamically Expandable Representation (DER) and 4.1% with PODNet against Herding, and 0.8% on old class data against Reinforced Memory Management (RMM); it also shows improved performance for general coreset selection. Our code is available at: https://github.com/Vivek9Chavan/ADCM
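For readers unfamiliar with the Herding baseline named in the abstract, the following is a minimal sketch of greedy herding exemplar selection over feature vectors. It is not the ADCM method and is not taken from the authors' repository. The function name `herding_select`, the L2-normalisation step, and the random features standing in for the output of a pretrained feature extractor are all assumptions made for illustration.

```python
import numpy as np


def herding_select(features, m):
    """Greedy herding (hypothetical helper): choose up to m sample indices
    whose running mean stays closest to the mean of all features."""
    feats = np.asarray(features, dtype=np.float64)
    # Assumption: L2-normalise each feature vector before selection.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.maximum(norms, 1e-12)

    class_mean = feats.mean(axis=0)
    running_sum = np.zeros_like(class_mean)
    available = np.ones(len(feats), dtype=bool)
    selected = []

    for k in range(1, min(m, len(feats)) + 1):
        # Mean of the exemplar set if each candidate were added as the k-th item.
        candidate_means = (running_sum + feats) / k
        dists = np.linalg.norm(candidate_means - class_mean, axis=1)
        dists[~available] = np.inf  # never pick the same sample twice
        idx = int(np.argmin(dists))
        selected.append(idx)
        available[idx] = False
        running_sum += feats[idx]
    return selected


# Usage: random vectors stand in for features of one class.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(50, 8))
exemplar_indices = herding_select(demo_features, 5)
```

In a Class-IL setting, a routine like this would typically be run once per class on features from the current model, with `m` set by the memory budget per class.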
Cite
Text
Chavan et al. "Active Data Collection and Management for Real-World Continual Learning via Pretrained Oracle." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00412
Markdown
[Chavan et al. "Active Data Collection and Management for Real-World Continual Learning via Pretrained Oracle." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/chavan2024cvprw-active/) doi:10.1109/CVPRW63382.2024.00412
BibTeX
@inproceedings{chavan2024cvprw-active,
title = {{Active Data Collection and Management for Real-World Continual Learning via Pretrained Oracle}},
author = {Chavan, Vivek and Koch, Paul and Schlüter, Marian and Briese, Clemens and Krüger, Jörg},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {4085-4096},
doi = {10.1109/CVPRW63382.2024.00412},
url = {https://mlanthology.org/cvprw/2024/chavan2024cvprw-active/}
}