SegGen: A Genetic Algorithm for Linear Text Segmentation
Abstract
This paper describes SegGen, a new algorithm for linear text segmentation on general corpora. It aims to segment texts into thematically homogeneous parts. Several existing methods address this task by creating boundaries sequentially. Here, we propose to consider all boundaries simultaneously by means of a genetic algorithm. SegGen uses two criteria: maximization of the internal cohesion of the formed segments and minimization of the similarity between adjacent segments. First experimental results are promising, and SegGen appears to be very competitive with existing methods.
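The two criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the binary boundary vector, the bag-of-words cosine similarity, the zero score for single-sentence segments, and every function name below are assumptions made for illustration, and the genetic search itself is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split_segments(sentences, boundaries):
    """boundaries[i] == 1 places a boundary after sentence i (assumed encoding)."""
    segments, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        if i < len(boundaries) and boundaries[i]:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def internal_cohesion(segments):
    """Mean pairwise sentence similarity inside each segment, averaged over segments."""
    scores = []
    for seg in segments:
        vecs = [Counter(s.lower().split()) for s in seg]
        pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
        if pairs:
            scores.append(sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs))
        else:
            # Assumption: a single-sentence segment has no pairs, so it scores 0.
            scores.append(0.0)
    return sum(scores) / len(scores)

def adjacent_similarity(segments):
    """Mean similarity between each segment and the segment that follows it."""
    bags = [Counter(w for s in seg for w in s.lower().split()) for seg in segments]
    if len(bags) < 2:
        return 0.0
    return sum(cosine(bags[k], bags[k + 1]) for k in range(len(bags) - 1)) / (len(bags) - 1)

sentences = [
    "cats chase mice",
    "cats eat mice",
    "stocks fell sharply",
    "stocks rose sharply",
]
for name, boundaries in [("topical", [0, 1, 0]), ("shifted", [1, 0, 1])]:
    segs = split_segments(sentences, boundaries)
    print(name, internal_cohesion(segs), adjacent_similarity(segs))
```

Under these assumptions, a candidate segmentation is a single vector scored as a whole, which is what allows a population-based search to weigh all boundaries together rather than fixing them one at a time.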
Cite
Text
Lamprier et al. "SegGen: A Genetic Algorithm for Linear Text Segmentation." International Joint Conference on Artificial Intelligence, 2007.
Markdown
[Lamprier et al. "SegGen: A Genetic Algorithm for Linear Text Segmentation." International Joint Conference on Artificial Intelligence, 2007.](https://mlanthology.org/ijcai/2007/lamprier2007ijcai-seggen/)
BibTeX
@inproceedings{lamprier2007ijcai-seggen,
title = {{SegGen: A Genetic Algorithm for Linear Text Segmentation}},
author = {Lamprier, Sylvain and Amghar, Tassadit and Levrat, Bernard and Saubion, Frédéric},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2007},
pages = {1647-1652},
url = {https://mlanthology.org/ijcai/2007/lamprier2007ijcai-seggen/}
}