Regression Testing for Wrapper Maintenance

Abstract

Recent work on Internet information integration assumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site's underlying content may change. We introduce rapture, a fully-implemented, domain-independent verification algorithm. rapture uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.

Introduction

Systems that integrate heterogeneous information sources have recently received substantial research attention (e.g. (Wiederhold 1996; Knoblock et al. 1998; Levy et al. 1998)). A 'movie information' integrator, f...

Cite

Text

Kushmerick. "Regression Testing for Wrapper Maintenance." AAAI Conference on Artificial Intelligence, 1999.

Markdown

[Kushmerick. "Regression Testing for Wrapper Maintenance." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/kushmerick1999aaai-regression/)

BibTeX

@inproceedings{kushmerick1999aaai-regression,
  title     = {{Regression Testing for Wrapper Maintenance}},
  author    = {Kushmerick, Nicholas},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {74-79},
  url       = {https://mlanthology.org/aaai/1999/kushmerick1999aaai-regression/}
}