SpeechOp: Inference-Time Task Composition for Generative Speech Processing

Lovelace, Justin; Kumar, Rithesh; Su, Jiaqi; Chen, Ke; Weinberger, Kilian Q; Jin, Zeyu

ICLR 2026

Abstract

While generative Text-to-Speech (TTS) systems leverage vast "in-the-wild" data to achieve remarkable success, speech-to-speech (S2S) processing tasks like enhancement face data limitations, which lead data-hungry generative approaches to distort speech content and speaker identity. To bridge this gap, we present SpeechOp, a multi-task latent diffusion model that transforms pre-trained TTS models into a universal speech processor capable of performing a wide range of speech tasks and composing them in novel ways at inference time. By adapting a pre-trained TTS model, SpeechOp inherits a rich understanding of natural speech, accelerating training and improving S2S task quality, while simultaneously enhancing core TTS performance. Finally, we introduce Implicit Task Composition (ITC), a novel pipeline where ASR-derived transcripts (e.g., from Whisper) guide SpeechOp's enhancement via our principled inference-time task composition. ITC achieves state-of-the-art content preservation by robustly combining web-scale speech understanding with SpeechOp's generative capabilities.

Cite

Text

Lovelace et al. "SpeechOp: Inference-Time Task Composition for Generative Speech Processing." International Conference on Learning Representations, 2026.

Markdown

[Lovelace et al. "SpeechOp: Inference-Time Task Composition for Generative Speech Processing." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lovelace2026iclr-speechop/)

BibTeX

@inproceedings{lovelace2026iclr-speechop,
  title     = {{SpeechOp: Inference-Time Task Composition for Generative Speech Processing}},
  author    = {Lovelace, Justin and Kumar, Rithesh and Su, Jiaqi and Chen, Ke and Weinberger, Kilian Q and Jin, Zeyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lovelace2026iclr-speechop/}
}