Corpus Analysis for Revision-Based Generation of Complex Sentences
Abstract
The complex sentences of newswire reports contain floating content units that appear to be opportunistically placed where the form of the surrounding text allows. We present a corpus analysis that identified precise semantic and syntactic constraints on where and how such information is realized. The result is a set of revision tools that form the rule base for a report generation system, allowing incremental generation of complex sentences.
Introduction
Generating reports that summarize quantitative data raises several challenges for language generation systems. First, sentences in such reports are very complex (e.g., in newswire basketball game summaries the lead sentence ranges from 21 to 46 words in length). Second, while some content units consistently appear in fixed locations across reports (e.g., game results are always conveyed in the lead sentence), others float, appearing anywhere in a report and at different linguistic ranks within a given sentence. Floating content uni...
Cite
Text
Robin and McKeown. "Corpus Analysis for Revision-Based Generation of Complex Sentences." AAAI Conference on Artificial Intelligence, 1993.
Markdown
[Robin and McKeown. "Corpus Analysis for Revision-Based Generation of Complex Sentences." AAAI Conference on Artificial Intelligence, 1993.](https://mlanthology.org/aaai/1993/robin1993aaai-corpus/)
BibTeX
@inproceedings{robin1993aaai-corpus,
title = {{Corpus Analysis for Revision-Based Generation of Complex Sentences}},
author = {Robin, Jacques and McKeown, Kathleen R.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1993},
pages = {365-372},
url = {https://mlanthology.org/aaai/1993/robin1993aaai-corpus/}
}