Balancing Gaussian Vectors in High Dimension

Abstract

Motivated by problems in controlled experiments, we study the discrepancy of random matrices with continuous entries where the number of columns $n$ is much larger than the number of rows $m$. Our first result shows that if $\omega(1) = m = o(n)$, a matrix with i.i.d. standard Gaussian entries has discrepancy $\Theta(\sqrt{n} \, 2^{-n/m})$ with high probability. This provides sharp guarantees for Gaussian discrepancy in a regime not previously considered in the discrepancy literature. Our results also apply to a more general family of random matrices with continuous i.i.d. entries, assuming that $m = O(n/\log{n})$. The proof is non-constructive and is an application of the second moment method. Our second result is algorithmic and applies to random matrices whose entries are i.i.d. and have a Lipschitz density. We present a randomized polynomial-time algorithm that achieves discrepancy $e^{-\Omega(\log^2(n)/m)}$ with high probability, provided that $m = O(\sqrt{\log{n}})$. In the one-dimensional case, this matches the best known algorithmic guarantee, due to Karmarkar and Karp. For higher dimensions $2 \leq m = O(\sqrt{\log{n}})$, this establishes the first efficient algorithm achieving discrepancy smaller than $O(\sqrt{m})$.
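For readers who do not go on to the paper itself, the notion of discrepancy at play (not spelled out in the abstract) is the best achievable signed-sum imbalance over all $\pm 1$ colorings of the columns; under the $\ell_\infty$ norm standard in vector balancing it reads

$$\operatorname{disc}(A) \;=\; \min_{\varepsilon \in \{-1,+1\}^n} \Bigl\| \sum_{i=1}^{n} \varepsilon_i a_i \Bigr\|_{\infty},$$

where $a_1, \dots, a_n \in \mathbb{R}^m$ are the columns of $A$ (the paper's exact norm convention may differ, but for $m = 1$ all norms coincide).

As a concrete reference point for the one-dimensional case mentioned in the abstract, the following is a minimal sketch of the Karmarkar–Karp largest-differencing heuristic; the function name and the Gaussian test harness are illustrative choices, not taken from the paper.

```python
import heapq
import random


def kk_discrepancy(values):
    """Karmarkar--Karp differencing heuristic for the m = 1 case.

    Repeatedly commits the two largest remaining magnitudes to opposite
    signs, replacing them by their difference; the final survivor is the
    signed-sum imbalance |sum_i eps_i x_i| that the heuristic certifies.
    """
    if not values:
        return 0.0
    # Python's heapq is a min-heap, so store negated magnitudes.
    heap = [-abs(v) for v in values]
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)  # largest remaining magnitude
        b = -heapq.heappop(heap)  # second largest
        heapq.heappush(heap, -(a - b))  # their difference re-enters the pool
    return -heap[0]


# Illustrative run on i.i.d. standard Gaussians: for n = 10**4 the output
# is typically far below the O(sqrt(n)) scale of a uniformly random signing.
xs = [random.gauss(0.0, 1.0) for _ in range(10**4)]
print(kk_discrepancy(xs))
```

The differencing method is known to achieve discrepancy $n^{-\Theta(\log n)} = e^{-\Theta(\log^2 n)}$ on typical random inputs, which is the one-dimensional benchmark that the abstract's $e^{-\Omega(\log^2(n)/m)}$ guarantee matches at $m = 1$.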

Cite

Text

Turner et al. "Balancing Gaussian Vectors in High Dimension." Conference on Learning Theory, 2020.

Markdown

[Turner et al. "Balancing Gaussian Vectors in High Dimension." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/turner2020colt-balancing/)

BibTeX

@inproceedings{turner2020colt-balancing,
  title     = {{Balancing Gaussian Vectors in High Dimension}},
  author    = {Turner, Paxton and Meka, Raghu and Rigollet, Philippe},
  booktitle = {Conference on Learning Theory},
  year      = {2020},
  pages     = {3455--3486},
  volume    = {125},
  url       = {https://mlanthology.org/colt/2020/turner2020colt-balancing/}
}