Open Problem: Tight Characterization of Instance-Optimal Identity Testing
Abstract
In the “instance-optimal” identity testing setting introduced by Valiant and Valiant (2014), one is given the (succinct) description of a discrete probability distribution $q$, as well as a parameter $\varepsilon\in(0,1]$ and i.i.d. samples from an (unknown, arbitrary) discrete distribution $p$. The goal is to distinguish with high probability between the cases (i) $p=q$ and (ii) $\textrm{TV}(p,q) > \varepsilon$, using the minimum number of samples possible as a function of (some simple functional of) $q$ and $\varepsilon$. This is in contrast with the standard formulation of identity testing, where the sample complexity is taken as worst-case over all possible reference distributions $q$. Valiant and Valiant provided upper and lower bounds for this question, with the sample complexity expressed in terms of the “$\ell_{2/3}$ norm” of a (truncated version of the) reference distribution $q$. However, these upper and lower bounds do not always match up to constant factors, and can differ by an arbitrarily large multiplicative gap for some choices of $q$. The question then is: what is the tight characterization of the sample complexity of instance-optimal identity testing? What is the “right” functional $\Phi(q)$ for it?
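For concreteness, the $\ell_{2/3}$ quasinorm of a nonnegative vector $v$ is $\|v\|_{2/3} = \big(\sum_i v_i^{2/3}\big)^{3/2}$, and the Valiant–Valiant bounds take the form $\max\big(1/\varepsilon,\ \|q^{-\delta}_{-\max}\|_{2/3}/\varepsilon^2\big)$, where $q^{-\delta}_{-\max}$ denotes $q$ after removing its single largest entry and roughly $\delta$ probability mass from its smallest entries; the mismatch between the bounds stems in part from the upper and lower bounds using different truncation levels $\delta = \Theta(\varepsilon)$. The Python sketch below computes this functional under one plausible reading of the truncation (greedily discarding whole entries rather than fractional mass); the function names and that greedy convention are illustrative choices, not Valiant and Valiant’s exact definition.

```python
import numpy as np

def ell_two_thirds(v):
    """l_{2/3} quasinorm: (sum_i v_i^{2/3})^{3/2}."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(v ** (2.0 / 3.0)) ** 1.5)

def truncated_norm(q, delta):
    """||q_{-max}^{-delta}||_{2/3}: drop the single largest probability,
    then greedily discard the smallest probabilities as long as the
    total discarded mass stays at most delta. (Illustrative reading of
    the truncation, not Valiant-Valiant's exact convention.)"""
    q = np.sort(np.asarray(q, dtype=float))  # ascending order
    q = q[:-1]                               # remove the largest entry
    discarded, start = 0.0, 0
    for i, qi in enumerate(q):
        if discarded + qi > delta:
            break
        discarded += qi
        start = i + 1
    return ell_two_thirds(q[start:])

def vv_sample_complexity(q, eps, delta):
    """Valiant-Valiant-style bound, up to constant factors:
    max(1/eps, ||q_{-max}^{-delta}||_{2/3} / eps^2)."""
    return max(1.0 / eps, truncated_norm(q, delta) / eps ** 2)

if __name__ == "__main__":
    eps = 0.1
    uniform = np.full(1000, 1e-3)  # uniform distribution on 1000 elements
    # For uniform q this recovers the familiar ~sqrt(n)/eps^2 scaling
    # of worst-case identity testing.
    print(vv_sample_complexity(uniform, eps, eps))
```

As a sanity check, on the uniform distribution over $n$ elements the truncated $\ell_{2/3}$ norm is $\Theta(\sqrt{n})$, so the functional reduces to the standard $\Theta(\sqrt{n}/\varepsilon^2)$ worst-case bound; the instance-optimal bound can be far smaller for highly non-uniform $q$.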
Cite
@inproceedings{canonne2024colt-open,
title = {{Open Problem: Tight Characterization of Instance-Optimal Identity Testing}},
author = {Canonne, Clément},
booktitle = {Conference on Learning Theory},
year = {2024},
pages = {5312-5316},
volume = {247},
url = {https://mlanthology.org/colt/2024/canonne2024colt-open/}
}