FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows

Abstract

Audio synthesizers are pervasive in modern music production. These highly complex audio generation functions offer a unique diversity of sounds through their large sets of parameters. However, this same feature can make them extremely hard and opaque to use, especially for non-expert users with no formal knowledge of signal processing. We recently introduced a novel formalization of synthesizer control as the task of learning an invertible mapping, using normalizing flows, between an audio latent space, extracted from the audio signal, and a target parameter latent space, extracted from the synthesizer's presets. In addition to modeling a continuous representation that eases intuitive exploration of the synthesizer, this approach provides a ground-breaking method for audio-based parameter inference, vocal control, and macro-control learning. Here, we discuss how we integrate these high-level features to develop new interaction schemes between a human user and the generating device: parameter inference from audio, and high-level preset visualization and interpolation, all usable both offline and in real time. Moreover, we leverage LeapMotion devices to let users control hundreds of parameters simply by moving one hand through space to explore the low-dimensional latent space, both empowering and facilitating the user's interaction with the synthesizer.
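The key property of the normalizing flows mentioned above is that the learned mapping is exactly invertible, so one can move in either direction between the two latent spaces. The sketch below illustrates this building block with a minimal RealNVP-style affine coupling layer in NumPy; all names and dimensions are illustrative assumptions, not taken from the paper, and the real model would stack many such trained layers.

```python
import numpy as np

# Minimal sketch of an invertible affine coupling layer (RealNVP-style),
# the kind of building block used in normalizing flows to map between
# latent spaces. Dimensions and weights here are purely illustrative.

rng = np.random.default_rng(0)
DIM = 8          # hypothetical latent dimensionality
HALF = DIM // 2

# Tiny linear "networks" predicting scale and shift from the first half of z.
W_s = rng.normal(scale=0.1, size=(HALF, HALF))
W_t = rng.normal(scale=0.1, size=(HALF, HALF))

def forward(z):
    """Map a point z in one latent space to x in the other; exactly invertible."""
    z1, z2 = z[:HALF], z[HALF:]
    s, t = np.tanh(z1 @ W_s), z1 @ W_t   # bounded scale and shift
    x2 = z2 * np.exp(s) + t              # affine transform of the second half
    return np.concatenate([z1, x2])

def inverse(x):
    """Recover z from x by analytically undoing the affine transform."""
    x1, x2 = x[:HALF], x[HALF:]
    s, t = np.tanh(x1 @ W_s), x1 @ W_t
    z2 = (x2 - t) * np.exp(-s)
    return np.concatenate([x1, z2])

z = rng.normal(size=DIM)
x = forward(z)
assert np.allclose(inverse(x), z)        # the mapping is bijective
```

Because each coupling layer is bijective with a cheap inverse, the same trained model supports both directions used in the paper: inferring parameters from audio, and predicting audio behavior from a point chosen in the parameter latent space.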

Cite

Text

Esling et al. "FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/767

Markdown

[Esling et al. "FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/esling2020ijcai-flowsynth/) doi:10.24963/IJCAI.2020/767

BibTeX

@inproceedings{esling2020ijcai-flowsynth,
  title     = {{FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows}},
  author    = {Esling, Philippe and Masuda, Naotake and Chemla-Romeu-Santos, Axel},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {5273--5275},
  doi       = {10.24963/IJCAI.2020/767},
  url       = {https://mlanthology.org/ijcai/2020/esling2020ijcai-flowsynth/}
}