LLM2Fx-Tools: Tool Calling for Music Post-Production

Abstract

This paper introduces LLM2Fx-Tools, a multimodal tool-calling framework that generates executable sequences of audio effects (Fx-chain) for music post-production. LLM2Fx-Tools uses a large language model (LLM) to understand audio inputs, select audio effect types, determine their order, and estimate their parameters, guided by chain-of-thought (CoT) planning. We also present LP-Fx, a new instruction-following dataset with structured CoT annotations and tool calls for audio effects modules. Experiments show that LLM2Fx-Tools can infer an Fx-chain and its parameters from pairs of unprocessed and processed audio, enabled by autoregressive sequence modeling, tool calling, and CoT reasoning. We further validate the system in a style transfer setting, where audio effects information is transferred from a reference source and applied to new content. Finally, LLM-as-a-judge evaluation demonstrates that our approach generates appropriate CoT reasoning and responses for music production queries. To our knowledge, this is the first work to apply LLM-based tool calling to audio effects modules, enabling interpretable and controllable music production.
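
As a rough illustration of what an "executable sequence of audio effects" expressed as tool calls could look like, the sketch below applies an ordered list of JSON tool calls to a short list of samples. It is not the paper's implementation: the tool names (`gain`, `hard_clip`), their parameters, the JSON layout, and the `apply_fx_chain` helper are all assumptions made for this example, and the hard-coded chain merely stands in for a model's structured output.

```python
import json

# Hypothetical effect modules operating on a list of float samples.
# These names and parameters are illustrative, not taken from the paper.
def gain(samples, gain_db):
    """Scale the signal by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def hard_clip(samples, threshold):
    """Clamp every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]

# Registry mapping tool names to callables (assumed layout).
TOOLS = {"gain": gain, "hard_clip": hard_clip}

def apply_fx_chain(samples, tool_calls_json):
    """Apply each tool call in the listed order and return the result."""
    for call in json.loads(tool_calls_json):
        fx = TOOLS[call["name"]]
        samples = fx(samples, **call["arguments"])
    return samples

# Stand-in for a model's structured output: effect type, order, parameters.
chain = json.dumps([
    {"name": "gain", "arguments": {"gain_db": 6.0}},
    {"name": "hard_clip", "arguments": {"threshold": 1.0}},
])

dry = [0.0, 0.25, 0.75, -0.9]
wet = apply_fx_chain(dry, chain)

# Reversing the chain gives a different signal, which is why the order of
# effects has to be predicted along with their types and parameters.
reversed_chain = json.dumps(list(reversed(json.loads(chain))))
wet_reversed = apply_fx_chain(dry, reversed_chain)
```

Because each call is a named effect with explicit arguments, the chain can be read, edited, and re-run by hand, which is one way to picture the interpretability and controllability the abstract describes.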

Cite

Text

Doh et al. "LLM2Fx-Tools: Tool Calling for Music Post-Production." International Conference on Learning Representations, 2026.

Markdown

[Doh et al. "LLM2Fx-Tools: Tool Calling for Music Post-Production." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/doh2026iclr-llm2fxtools/)

BibTeX

@inproceedings{doh2026iclr-llm2fxtools,
  title     = {{LLM2Fx-Tools: Tool Calling for Music Post-Production}},
  author    = {Doh, SeungHeon and Koo, Junghyun and Martínez-Ramírez, Marco A. and Choi, Woosung and Liao, Wei-Hsiang and Wu, Qiyu and Nam, Juhan and Mitsufuji, Yuki},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/doh2026iclr-llm2fxtools/}
}