Extrapolating the Profile of a Finite Population
Abstract
We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size $m = o(k)$. Nevertheless, we show that in the sublinear regime of $m =\omega(k/\log k)$, it is possible to consistently estimate in total variation the \emph{profile} of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime of $m=c k$ for any constant $c$ the optimal rate is $\Theta(1/\log k)$. Our estimator is based on Wolfowitz’s minimum distance method, which entails solving a linear program (LP) of size $k$. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
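The abstract's minimum-distance estimator can be illustrated with a small sketch. This is not the paper's exact construction: it assumes a toy model where each type's sample count is Binomial(size, m/k), the number of nonempty types is known, and the profile is supported on sizes 1..S. The L1 distance between the expected and observed fingerprints (the fingerprint F[j] counts types seen exactly j times) is minimized over candidate profiles via a linear program with slack variables:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binom

# Toy population: 500 nonempty types with sizes in {1,...,5}; each individual
# is kept in the sample independently with probability q = m/k, so the
# observed count of a type of size s is Binomial(s, q).
rng = np.random.default_rng(0)
k, m = 2000, 400
q = m / k
true_sizes = rng.integers(1, 6, size=500)
counts = rng.binomial(true_sizes, q)
J = counts.max()
F = np.bincount(counts, minlength=J + 1).astype(float)  # observed fingerprint

# Minimum-distance LP (a sketch of Wolfowitz's idea): choose a profile p over
# candidate sizes s = 1..S minimizing the L1 distance between the expected
# fingerprint n_types * A @ p and the observed fingerprint F.
S = 10
A = np.array([[binom.pmf(j, s, q) for s in range(1, S + 1)]
              for j in range(J + 1)])      # A[j, s-1] = P(Binom(s, q) = j)
n_types = len(true_sizes)                  # assumed known in this sketch
n_p, n_t = S, J + 1
c = np.concatenate([np.zeros(n_p), np.ones(n_t)])   # minimize sum of slacks t
# |n_types * A @ p - F| <= t, encoded as two blocks of linear inequalities
A_ub = np.block([[n_types * A, -np.eye(n_t)],
                 [-n_types * A, -np.eye(n_t)]])
b_ub = np.concatenate([F, -F])
A_eq = np.concatenate([np.ones(n_p), np.zeros(n_t)])[None, :]  # p sums to 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * (n_p + n_t))
profile_hat = res.x[:n_p]                  # estimated profile over sizes 1..S
```

The LP has O(S + J) variables, matching the abstract's point that the method reduces to a linear program whose size scales with the population parameters; the paper's analysis concerns the true hypergeometric sampling model and an infinite-dimensional version of this LP.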
Cite
Text
Jana et al. "Extrapolating the Profile of a Finite Population." Conference on Learning Theory, 2020.
Markdown
[Jana et al. "Extrapolating the Profile of a Finite Population." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/jana2020colt-extrapolating/)
BibTeX
@inproceedings{jana2020colt-extrapolating,
title = {{Extrapolating the Profile of a Finite Population}},
author = {Jana, Soham and Polyanskiy, Yury and Wu, Yihong},
booktitle = {Conference on Learning Theory},
year = {2020},
pages = {2011--2033},
volume = {125},
url = {https://mlanthology.org/colt/2020/jana2020colt-extrapolating/}
}