Evaluating Representations by the Complexity of Learning Low-Loss Predictors
Abstract
We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest. To this end, we introduce two measures: surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). To compare our methods to prior work, we also present a framework based on plotting the validation loss versus evaluation dataset size (the "loss-data" curve). Existing measures, such as mutual information and minimum description length, correspond to slices and integrals along the data axis of the loss-data curve, while ours correspond to slices and integrals along the loss axis. This analysis shows that prior methods measure properties of an evaluation dataset of a specified size, whereas our methods measure properties of a predictor with a specified loss. We conclude with experiments on real data to compare the behavior of these methods over datasets of varying size.
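Both measures can be read directly off an empirical loss-data curve. The sketch below is a minimal illustration, not the authors' code: the function names, the toy curve, and the piecewise-constant approximation of the sum over dataset sizes are all assumptions. Given validation losses measured at a few evaluation-set sizes, εSC is the first size at which the curve reaches loss ε (a slice along the loss axis), and SDL is the area between the curve and the horizontal line at ε (an integral along the loss axis).

```python
# Hypothetical sketch (assumed names, toy data): estimating eps-SC and SDL
# from an empirical loss-data curve measured at a handful of dataset sizes.

import numpy as np

def epsilon_sample_complexity(ns, losses, eps):
    """eps-SC: smallest evaluation-set size n whose measured loss is <= eps."""
    for n, loss in zip(ns, losses):
        if loss <= eps:
            return n
    return float("inf")  # target loss never reached on this curve

def surplus_description_length(ns, losses, eps):
    """SDL: area between the loss-data curve and the line at eps,
    i.e. the sum over n of max(L(n) - eps, 0). Since losses are only
    measured at a few sizes, treat the curve as constant between them
    (a crude discrete approximation, an assumption of this sketch)."""
    ns = np.asarray(ns, dtype=float)
    losses = np.asarray(losses, dtype=float)
    surplus = np.clip(losses - eps, 0.0, None)      # [L(n) - eps]_+
    widths = np.diff(np.concatenate([[0.0], ns]))   # gap to previous size
    return float(np.sum(surplus * widths))

# Toy loss-data curve for one representation
ns = [10, 100, 1000, 10000]
losses = [2.3, 1.1, 0.4, 0.2]
print(epsilon_sample_complexity(ns, losses, eps=0.5))  # -> 1000
print(surplus_description_length(ns, losses, eps=0.5))
```

Note that both quantities depend on the target loss ε rather than on a fixed dataset size, which is the contrast with MI- and MDL-style measures drawn in the abstract.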
Cite
Text
Whitney et al. "Evaluating Representations by the Complexity of Learning Low-Loss Predictors." ICLR 2021 Workshops: Neural_Compression, 2021.

Markdown

[Whitney et al. "Evaluating Representations by the Complexity of Learning Low-Loss Predictors." ICLR 2021 Workshops: Neural_Compression, 2021.](https://mlanthology.org/iclrw/2021/whitney2021iclrw-evaluating/)

BibTeX
@inproceedings{whitney2021iclrw-evaluating,
title = {{Evaluating Representations by the Complexity of Learning Low-Loss Predictors}},
author = {Whitney, William F and Song, Min Jae and Brandfonbrener, David and Altosaar, Jaan and Cho, Kyunghyun},
booktitle = {ICLR 2021 Workshops: Neural_Compression},
year = {2021},
url = {https://mlanthology.org/iclrw/2021/whitney2021iclrw-evaluating/}
}