Intoner: For Chinese Poetry Intoning Synthesis
Abstract
Chinese poetry intoning, which uses improvised melodies without fixed musical scores, is crucial for emotional expression and prosodic rendition. However, this cultural heritage is difficult to propagate because audio recordings are scant and domain experts are scarce. Existing text-to-speech models cannot generate melodious audio, while singing-voice-synthesis models rely on predetermined musical scores, so neither is suitable for intoning synthesis. Hence, we introduce Chinese Poetry Intoning Synthesis (PIS) as a novel task to reproduce intoning audio and preserve this age-old cultural art. For this task, we summarize three-level principles from poetry metrical patterns and build on them a diffusion-based PIS model, Intoner. We also collect a multi-style Chinese poetry intoning dataset of text-audio pairs accompanied by feature annotations. Experimental results show that our model effectively learns diverse intoning styles and contents and synthesizes more melodious and vibrant intoning audio. To the best of our knowledge, we are the first to work on the poetry intoning synthesis task.
Cite
Text
Zuo et al. "Intoner: For Chinese Poetry Intoning Synthesis." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1139
Markdown
[Zuo et al. "Intoner: For Chinese Poetry Intoning Synthesis." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/zuo2025ijcai-intoner/) doi:10.24963/IJCAI.2025/1139
BibTeX
@inproceedings{zuo2025ijcai-intoner,
title = {{Intoner: For Chinese Poetry Intoning Synthesis}},
author = {Zuo, Heda and Sun, Liyao and Lai, Zeyu and You, Weitao and Chen, Pei and Sun, Lingyun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {10252--10260},
doi = {10.24963/IJCAI.2025/1139},
url = {https://mlanthology.org/ijcai/2025/zuo2025ijcai-intoner/}
}