Token-Based Audio Inpainting via Discrete Diffusion

Abstract

Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods perform poorly when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training techniques: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps of up to 750 ms show that our approach consistently outperforms strong baselines across a range of gap lengths, specifically for gaps of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Visit our project page for examples and code.

Cite

Text

Dror et al. "Token-Based Audio Inpainting via Discrete Diffusion." International Conference on Learning Representations, 2026.

Markdown

[Dror et al. "Token-Based Audio Inpainting via Discrete Diffusion." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/dror2026iclr-tokenbased/)

BibTeX

@inproceedings{dror2026iclr-tokenbased,
  title     = {{Token-Based Audio Inpainting via Discrete Diffusion}},
  author    = {Dror, Tali and Shoham, Iftach and Buchris, Moshe and Gal, Oren and Permuter, Haim H. and Katz, Gilad and Nachmani, Eliya},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/dror2026iclr-tokenbased/}
}