Anticipatory Music Transformer
Abstract
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models on prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that, over 20-second clips, an anticipatory model produces accompaniments comparable in musicality even to music composed by humans.
Cite
Text
Thickstun et al. "Anticipatory Music Transformer." Transactions on Machine Learning Research, 2024.

Markdown

[Thickstun et al. "Anticipatory Music Transformer." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/thickstun2024tmlr-anticipatory/)

BibTeX
@article{thickstun2024tmlr-anticipatory,
  title = {{Anticipatory Music Transformer}},
  author = {Thickstun, John and Hall, David Leo Wright and Donahue, Chris and Liang, Percy},
  journal = {Transactions on Machine Learning Research},
  year = {2024},
  url = {https://mlanthology.org/tmlr/2024/thickstun2024tmlr-anticipatory/}
}