Language Learning from Stochastic Input

Abstract

Language learning from positive data in the Gold model of inductive inference is investigated in a setting where the data can be modeled as a stochastic process. Specifically, the input strings are assumed to form a sequence of independent, identically distributed random variables, where the distribution depends on the language being presented. A scheme is developed which can be tuned to learn, with probability one, any family of recursive languages, given a recursive enumeration of total indices for the languages in the family and a procedure to compute a lower bound on the probability of occurrence of a given string in a given language. Variations of the scheme work under other assumptions, e.g., if the probabilities of the strings form a monotone sequence with respect to a given enumeration. The learning algorithm is rather simple and appears psychologically plausible. A more sophisticated version of the learner is also developed, based on a probabilistic version of the notion of tell-tale subset. This version yields, as a special case, Angluin's learner for the families of languages that are learnable from all texts (and not just from a set of texts of probability one).
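The scheme sketched in the abstract can be illustrated with a toy learner. The following is a minimal sketch, not the paper's actual algorithm: it assumes a hypothetical family L_i = {strings of 'a' of length at most i} with a uniform distribution on each L_i, and conjectures the least index whose language contains every observed string and whose high-probability strings (per the assumed lower bound) have all shown up after enough i.i.d. samples.

```python
import random

# Hypothetical toy family: L_i = all strings over {'a'} of length <= i.
def in_language(i, w):
    return set(w) <= {"a"} and len(w) <= i

def prob_lower_bound(i, w):
    # Assumed uniform distribution over the i+1 strings of L_i.
    return 1.0 / (i + 1) if in_language(i, w) else 0.0

def learn(text, enumeration_size=10):
    """Conservative learner sketch: after each input string, conjecture the
    least index i such that (a) every observed string lies in L_i and
    (b) no string that L_i predicts with lower-bounded probability p is
    still missing after roughly 1/p samples (a heuristic rejection test)."""
    seen = set()
    conjecture = 0
    for t, w in enumerate(text, start=1):
        seen.add(w)
        for i in range(enumeration_size):
            # (a) all observations must be consistent with L_i ...
            if not all(in_language(i, s) for s in seen):
                continue
            # (b) ... and strings promised with probability >= p should
            # have appeared once p * t is comfortably large.
            expected = {"a" * k for k in range(i + 1)
                        if prob_lower_bound(i, "a" * k) * t >= 5}
            if expected <= seen:
                conjecture = i
                break
        yield conjecture

# Example: an i.i.d. text drawn uniformly from L_3.
rng = random.Random(0)
text = [rng.choice(["", "a", "aa", "aaa"]) for _ in range(200)]
final = list(learn(text))[-1]
```

On this example the learner's conjectures stabilize on index 3: any smaller index is refuted by an observed string that is too long, and any larger index keeps predicting strings that never occur.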

Cite

Text

Kapur and Bilardi. "Language Learning from Stochastic Input." Annual Conference on Computational Learning Theory, 1992. doi:10.1145/130385.130419

Markdown

[Kapur and Bilardi. "Language Learning from Stochastic Input." Annual Conference on Computational Learning Theory, 1992.](https://mlanthology.org/colt/1992/kapur1992colt-language/) doi:10.1145/130385.130419

BibTeX

@inproceedings{kapur1992colt-language,
  title     = {{Language Learning from Stochastic Input}},
  author    = {Kapur, Shyam and Bilardi, Gianfranco},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {1992},
  pages     = {303--310},
  doi       = {10.1145/130385.130419},
  url       = {https://mlanthology.org/colt/1992/kapur1992colt-language/}
}