Extrapolating the Profile of a Finite Population

Abstract

We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size $m = o(k)$. Nevertheless, we show that in the sublinear regime of $m = \omega(k/\log k)$, it is possible to consistently estimate in total variation the \emph{profile} of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime of $m = ck$ for any constant $c$, the optimal rate is $\Theta(1/\log k)$. Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
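To make the central object concrete: the profile of a population is the empirical distribution of type sizes, i.e., for each $j \ge 1$, the fraction of (nonempty) types that appear exactly $j$ times. The sketch below computes this from a list of type labels; the function name and the choice to normalize over nonempty types are our own illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def profile(population):
    """Empirical distribution of type sizes: maps each size j >= 1 to the
    fraction of observed (nonempty) types appearing exactly j times.
    Illustrative helper; normalization convention is an assumption."""
    type_counts = Counter(population)            # type label -> its size
    size_counts = Counter(type_counts.values())  # size j -> number of types of size j
    total_types = sum(size_counts.values())
    return {j: n / total_types for j, n in sorted(size_counts.items())}

# A population of k = 6 individuals over three types:
# "a" occurs 3 times, "b" occurs 2 times, "c" occurs once,
# so one third of the types have each of the sizes 1, 2, and 3.
print(profile(["a", "a", "a", "b", "b", "c"]))
```

The paper's point is that this profile, unlike the full type composition, can be consistently estimated from a subsample of size $m = \omega(k/\log k)$.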

Cite

Text

Jana et al. "Extrapolating the Profile of a Finite Population." Conference on Learning Theory, 2020.

Markdown

[Jana et al. "Extrapolating the Profile of a Finite Population." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/jana2020colt-extrapolating/)

BibTeX

@inproceedings{jana2020colt-extrapolating,
  title     = {{Extrapolating the Profile of a Finite Population}},
  author    = {Jana, Soham and Polyanskiy, Yury and Wu, Yihong},
  booktitle = {Conference on Learning Theory},
  year      = {2020},
  pages     = {2011--2033},
  volume    = {125},
  url       = {https://mlanthology.org/colt/2020/jana2020colt-extrapolating/}
}