Regression Testing for Wrapper Maintenance
Abstract
Recent work on Internet information integration assumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site's underlying content may change. We introduce RAPTURE, a fully implemented, domain-independent verification algorithm. RAPTURE uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.
Introduction
Systems that integrate heterogeneous information sources have recently received substantial research attention (e.g., Wiederhold 1996; Knoblock et al. 1998; Levy et al. 1998). A 'movie information' integrator, f...
Cite
Text
Kushmerick. "Regression Testing for Wrapper Maintenance." AAAI Conference on Artificial Intelligence, 1999.
Markdown
[Kushmerick. "Regression Testing for Wrapper Maintenance." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/kushmerick1999aaai-regression/)
BibTeX
@inproceedings{kushmerick1999aaai-regression,
title = {{Regression Testing for Wrapper Maintenance}},
author = {Kushmerick, Nicholas},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1999},
pages = {74-79},
url = {https://mlanthology.org/aaai/1999/kushmerick1999aaai-regression/}
}