QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation
Abstract
Text-to-music (TTM) generation, which converts textual descriptions into audio, opens up innovative avenues for multimedia creation. Achieving high quality and diversity in this process demands extensive, high-quality data, which are often scarce in available datasets. Most open-source datasets suffer from low-quality waveforms and low text-audio consistency, hindering the advancement of music generation models. To address these challenges, we propose a novel quality-aware training paradigm for generating high-quality, high-musicality music from large-scale, quality-imbalanced datasets. Additionally, by leveraging unique properties in the latent space of musical signals, we adapt and implement a masked diffusion transformer (MDT) model for the TTM task, showcasing its capacity for quality control and enhanced musicality. Furthermore, we introduce a three-stage caption refinement approach to address the problem of low-quality captions. Experiments show state-of-the-art (SOTA) performance on benchmark datasets, including MusicCaps and the Song-Describer Dataset, on both objective and subjective metrics. Demo audio samples are available at https://qa-mdt.github.io/; code and pretrained checkpoints are open-sourced at https://github.com/ivcylc/OpenMusic.
Cite
Text
Li et al. "QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1126
Markdown
[Li et al. "QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/li2025ijcai-qa/) doi:10.24963/IJCAI.2025/1126
BibTeX
@inproceedings{li2025ijcai-qa,
title = {{QA-MDT: Quality-Aware Masked Diffusion Transformer for Enhanced Music Generation}},
author = {Li, Chang and Wang, Ruoyu and Liu, Lijuan and Du, Jun and Sun, Yixuan and Guo, Zilu and Zhang, Zhengrong and Jiang, Yuan and Gao, Jianqing and Ma, Feng},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {10135--10143},
doi = {10.24963/IJCAI.2025/1126},
url = {https://mlanthology.org/ijcai/2025/li2025ijcai-qa/}
}