Probabilistic DFA Inference Using Kullback-Leibler Divergence and Minimality

Abstract

Probabilistic DFA inference is the problem of inducing a stochastic regular \ngrammar from a positive sample of an unknown language.\nThe ALERGIA algorithm is one of the most successful approaches to \nthis problem.\n In the present work we review this algorithm and explain\nwhy its generalization criterion, a state merging operation, \nis purely local. This characteristic leads to the conclusion that \nthere is no explicit way to bound the divergence\nbetween the distribution defined by the solution and the \ntraining set distribution \n(that is, to control globally the generalization from the training sample).\nIn this paper we present an alternative approach, the MDI algorithm, \nin which the solution is a probabilistic automaton that trades off \nminimal divergence from the training sample and minimal size. \nAn efficient computation of the Kullback-Leibler divergence between \ntwo probabilistic DFAs is described, from which the new learning criterion\nis derived.\nEmpirical results in the domain of language model construction for a travel\ninformation task show that the MDI algorithm significantly \noutperforms ALERGIA.

Cite

Text

Thollard et al. "Probabilistic DFA Inference Using Kullback-Leibler Divergence and Minimality." International Conference on Machine Learning, 2000.

Markdown

[Thollard et al. "Probabilistic DFA Inference Using Kullback-Leibler Divergence and Minimality." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/thollard2000icml-probabilistic/)

BibTeX

@inproceedings{thollard2000icml-probabilistic,
  title     = {{Probabilistic DFA Inference Using Kullback-Leibler Divergence and Minimality}},
  author    = {Thollard, Franck and Dupont, Pierre and de la Higuera, Colin},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {975-982},
  url       = {https://mlanthology.org/icml/2000/thollard2000icml-probabilistic/}
}