On State Merging in Grammatical Inference: A Statistical Approach for Dealing with Noisy Data
Abstract
Faced with modern databases, noise tolerance has become one of the most studied topics in machine learning. Many algorithms have been proposed for dealing with noisy data in the case of numerical instances, either by filtering it out in a preprocessing step or by handling it during induction. However, this research subject remains largely open when learning from unbounded symbolic sequences, which is the goal of grammatical inference. In this paper, we propose a statistical approach for dealing with noisy data during the inference of automata by the state-merging algorithm RPNI. Our approach is based on a proportion comparison test, which relaxes the merging rule of RPNI without endangering the generalization error. Beyond this framework, we provide some useful theoretical properties of the behavior of our new version of RPNI, called RPNI*. Finally, we describe a large comparative study on several datasets.
ICML Proceedings of the Twentieth International Conference on Machine Learning
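The abstract does not say which proportions RPNI* compares or which statistic it uses. As a minimal sketch of the general idea only, assume that two candidate states are judged compatible when the proportions of negative examples among the strings ending in each state do not differ significantly under a standard pooled two-proportion z-test. The function names `two_proportion_p_value` and `states_compatible`, the choice of compared counts, and the default level `alpha = 0.05` are hypothetical illustrations, not the authors' RPNI* rule.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_p_value(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test of H0: p1 == p2.

    k1 of n1 observations are "successes" in sample 1, k2 of n2 in sample 2.
    """
    if n1 == 0 or n2 == 0:
        return 1.0  # an empty sample gives no evidence against H0
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)  # common proportion estimated under H0
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0  # pooled proportion is 0 or 1, so p1 == p2 exactly
    z = (p1 - p2) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def states_compatible(neg1: int, total1: int, neg2: int, total2: int,
                      alpha: float = 0.05) -> bool:
    """Hypothetical relaxed merge rule (assumption, not RPNI*'s exact test).

    neg_i / total_i: negative examples among the total_i strings that end in
    state i. The merge is allowed unless the two proportions of negatives
    differ significantly at level alpha.
    """
    return two_proportion_p_value(neg1, total1, neg2, total2) >= alpha
```

For instance, `states_compatible(1, 50, 0, 40)` returns `True` (p ≈ 0.37), so a single negative example that may be noise does not block the merge, whereas `states_compatible(30, 50, 0, 40)` returns `False`. Plain RPNI, by contrast, rejects any merge that makes the automaton accept a negative example.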
Cite
Text
Sebban and Janodet. "On State Merging in Grammatical Inference: A Statistical Approach for Dealing with Noisy Data." International Conference on Machine Learning, 2003.
Markdown
[Sebban and Janodet. "On State Merging in Grammatical Inference: A Statistical Approach for Dealing with Noisy Data." International Conference on Machine Learning, 2003.](https://mlanthology.org/icml/2003/sebban2003icml-state/)
BibTeX
@inproceedings{sebban2003icml-state,
title = {{On State Merging in Grammatical Inference: A Statistical Approach for Dealing with Noisy Data}},
author = {Sebban, Marc and Janodet, Jean-Christophe},
booktitle = {International Conference on Machine Learning},
year = {2003},
pages = {688-695},
url = {https://mlanthology.org/icml/2003/sebban2003icml-state/}
}