Robust Efficient Conditional Probability Estimation

Abstract

The problem is finding a general, robust, and efficient mechanism for estimating a conditional probability P(y|x), where robustness and efficiency are measured using techniques from learning reductions. In particular, suppose we have access to a binary regression oracle B with two interfaces—one for specifying training information and one for testing. Training information is specified as B(x′, y′), where x′ is an unspecified feature vector and y′ ∈ [0, 1] is a bounded-range scalar; no value is returned. This operation is stateful, possibly altering the return value of the testing interface in arbitrary ways. Testing is done according to B(x′), with a value in [0, 1] returned. The testing operation is stateless.

A learning reduction consists of two algorithms, R and R⁻¹. The algorithm R takes as input a single example (x, y), where x is a feature vector and y ∈ {1, ..., k} is a discrete variable. R then specifies a training example (x′, y′) for the oracle B. R can then create another training example for B based on all available information. This process repeats some finite number of times before halting without returning information.

A basic observation is that for any oracle algorithm, a distribution D(x, y) over multiclass examples and a reduction R induce a distribution over a sequence (x′, y′)* of oracle examples. We collapse this into a distribution D′(x′, y′) over oracle examples by drawing uniformly from the sequence.

The algorithm R⁻¹ takes as input a single example (x, y) and returns a value v ∈ [0, 1] after using (only) the testing interface of B zero or more times.

We measure the power of an oracle and a reduction according to squared-loss regret:

reg(D, R⁻¹) = E_{(x,y)∼D} [(R⁻¹(x, y) − D(y|x))²]

and similarly, letting µ_{x′} = E_{y′∼D′|x′} [y′],

reg(D′, B) = E_{(x′,y′)∼D′} [(B(x′) − µ_{x′})²]

The open problem is to specify R and R⁻¹ satisfying the following theorem:

Theorem 1. For all multiclass distributions D(x, y) and for all binary oracles B, the computational complexity of R and R⁻¹ is O(log k), and

reg(D, R⁻¹) ≤ C · reg(D′, B)

where C is a universal constant.

Alternatively, this open problem is satisfied by proving that no deterministic algorithms R, R⁻¹ satisfying the above theorem statement exist.
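To make the interfaces concrete, here is a minimal Python sketch of the protocol under illustrative assumptions: `BinaryRegressionOracle`, `R_train`, and `R_inv` are hypothetical names, the oracle is a toy per-feature running mean, and the reduction walks a balanced binary tree over the labels so that both R and R⁻¹ make O(log k) oracle calls. This natural divide-and-conquer candidate illustrates the interfaces only; it is not known to satisfy the constant-regret guarantee of Theorem 1, which is exactly what makes the problem open.

```python
import random

class BinaryRegressionOracle:
    """Toy stand-in for the oracle B (assumption: a per-key running mean).

    train(x', y') is the stateful training interface (returns nothing);
    predict(x') is the stateless testing interface (returns a value in [0, 1]).
    """
    def __init__(self):
        self.sums = {}
        self.counts = {}

    def train(self, xp, yp):
        self.sums[xp] = self.sums.get(xp, 0.0) + yp
        self.counts[xp] = self.counts.get(xp, 0) + 1

    def predict(self, xp):
        if xp not in self.counts:
            return 0.5  # uninformed default for unseen nodes
        return self.sums[xp] / self.counts[xp]

def R_train(oracle, x, y, k):
    # R: walk a balanced binary tree over {1, ..., k}; each internal node on
    # the root-to-leaf path of y yields one oracle example -> O(log k) calls.
    lo, hi = 1, k
    while lo < hi:
        mid = (lo + hi) // 2
        node = (x, lo, hi)           # node identity folded into x'
        if y <= mid:
            oracle.train(node, 0.0)  # y lies in the left half
            hi = mid
        else:
            oracle.train(node, 1.0)  # y lies in the right half
            lo = mid + 1

def R_inv(oracle, x, y, k):
    # R^{-1}: multiply branch estimates along the path to y, using only the
    # testing interface of B -- again O(log k) oracle calls.
    lo, hi = 1, k
    p = 1.0
    while lo < hi:
        mid = (lo + hi) // 2
        right = oracle.predict((x, lo, hi))
        if y <= mid:
            p *= 1.0 - right
            hi = mid
        else:
            p *= right
            lo = mid + 1
    return p
```

Because each node's left and right branch estimates sum to one, the returned values over y ∈ {1, ..., k} always form a probability distribution; the open question is whether some such R, R⁻¹ pair transforms oracle regret into conditional-probability regret with only a universal-constant blowup.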

Cite

Text

Langford. "Robust Efficient Conditional Probability Estimation." Annual Conference on Computational Learning Theory, 2010.

Markdown

[Langford. "Robust Efficient Conditional Probability Estimation." Annual Conference on Computational Learning Theory, 2010.](https://mlanthology.org/colt/2010/langford2010colt-robust/)

BibTeX

@inproceedings{langford2010colt-robust,
  title     = {{Robust Efficient Conditional Probability Estimation}},
  author    = {Langford, John},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2010},
  pages     = {316--317},
  url       = {https://mlanthology.org/colt/2010/langford2010colt-robust/}
}