Consistent Recovery Threshold of Hidden Nearest Neighbor Graphs
Abstract
Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $ \liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $ 1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $ \liminf \frac{kD(P_n||Q_n)}{\log n}>1, $ where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n||Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges. We also analyze several computationally efficient algorithms and provide sufficient conditions under which they achieve exact/almost exact recovery. In particular, we develop a polynomial-time algorithm that attains the threshold for exact recovery under the small-world model.
Cite
Text
Ding et al. "Consistent Recovery Threshold of Hidden Nearest Neighbor Graphs." Conference on Learning Theory, 2020.Markdown
[Ding et al. "Consistent Recovery Threshold of Hidden Nearest Neighbor Graphs." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/ding2020colt-consistent/)BibTeX
@inproceedings{ding2020colt-consistent,
title = {{Consistent Recovery Threshold of Hidden Nearest Neighbor Graphs}},
author = {Ding, Jian and Wu, Yihong and Xu, Jiaming and Yang, Dana},
booktitle = {Conference on Learning Theory},
year = {2020},
pages = {1540-1553},
volume = {125},
url = {https://mlanthology.org/colt/2020/ding2020colt-consistent/}
}