Language Learning from Stochastic Input
Abstract
Language learning from positive data in the Gold model of inductive inference is investigated in a setting where the data can be modeled as a stochastic process. Specifically, the input strings are assumed to form a sequence of independent, identically distributed random variables, where the distribution depends on the language being presented. A scheme is developed that can be tuned to learn, with probability one, any family of recursive languages, given a recursive enumeration of total indices for the languages in the family and a procedure to compute a lower bound on the probability of occurrence of a given string in a given language. Variations of the scheme work under other assumptions, e.g., if the probabilities of the strings form a monotone sequence with respect to a given enumeration. The learning algorithm is rather simple and appears psychologically plausible. A more sophisticated version of the learner is also developed, based on a probabilistic version of the notion of tell-tale subset. This version yields, as a special case, Angluin's learner for the families of languages that are learnable from all texts (and not just from a set of texts of probability one).
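The abstract's basic scheme, identification by enumeration over i.i.d. input using a computable lower bound on string probabilities, can be illustrated with a toy sketch. Everything below is a hypothetical construction for illustration: the two-language family, the `lower_bound` procedure, and the rejection threshold are assumptions, not the paper's actual definitions. The learner conjectures the least-index language that contains every observed string and whose high-probability strings have not been conspicuously absent from the sample.

```python
import random

# Hypothetical toy family: each language is a dict mapping its strings to
# their probabilities of occurrence under the stochastic presentation.
FAMILY = [
    {"a": 0.5, "b": 0.5},            # L0, a proper "sublanguage" of L1
    {"a": 0.4, "b": 0.4, "c": 0.2},  # L1
]

def lower_bound(index, s):
    """Computable lower bound on the probability of s in language `index`
    (in this toy setting, simply the exact probability; 0 if s is absent)."""
    return FAMILY[index].get(s, 0.0)

def conjecture(sample_counts, n):
    """Return the least index consistent with the sample after n draws.

    A hypothesis is rejected if (a) it excludes an observed string, or
    (b) some string with probability lower bound p under the hypothesis
    has never appeared, even though after n i.i.d. draws that event has
    probability (1 - p)**n below a fixed threshold."""
    threshold = 0.01
    for i, lang in enumerate(FAMILY):
        if any(s not in lang for s in sample_counts):
            continue  # an observed string lies outside the language
        missing = (s for s in lang if s not in sample_counts)
        if any((1 - lower_bound(i, s)) ** n < threshold for s in missing):
            continue  # a high-probability string is conspicuously absent
        return i
    return None  # every hypothesis in the family was rejected

random.seed(0)
target = 1  # present L1 as an i.i.d. stochastic text
strings, probs = zip(*FAMILY[target].items())
counts = {}
guess = None
for n in range(1, 500):
    s = random.choices(strings, weights=probs)[0]
    counts[s] = counts.get(s, 0) + 1
    guess = conjecture(counts, n)
print(guess)  # almost surely stabilizes on 1
```

Condition (b) is what lets the learner escape the classic overgeneralization trap of positive-only data: a stochastic text for L0 almost surely never shows "c", so L1 (which expects "c" at rate at least 0.2) is eventually rejected and the conjecture settles on L0.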
Cite
Text
Kapur and Bilardi. "Language Learning from Stochastic Input." Annual Conference on Computational Learning Theory, 1992. doi:10.1145/130385.130419
BibTeX
@inproceedings{kapur1992colt-language,
title = {{Language Learning from Stochastic Input}},
author = {Kapur, Shyam and Bilardi, Gianfranco},
booktitle = {Annual Conference on Computational Learning Theory},
year = {1992},
pages = {303--310},
doi = {10.1145/130385.130419},
url = {https://mlanthology.org/colt/1992/kapur1992colt-language/}
}