Low-Rank Time-Frequency Synthesis

Abstract

Many single-channel signal decomposition techniques rely on a low-rank factorization of a time-frequency transform. In particular, nonnegative matrix factorization (NMF) of the spectrogram -- the (power) magnitude of the short-time Fourier transform (STFT) -- has been considered in many audio applications. In this setting, NMF with the Itakura-Saito divergence was shown to underlie a generative Gaussian composite model (GCM) of the STFT, a step forward from more empirical approaches based on ad-hoc transform and divergence specifications. Still, the GCM is not yet a generative model of the raw signal itself, but only of its STFT. The work presented in this paper fills this remaining gap by proposing a novel signal synthesis model with low-rank time-frequency structure. In particular, our new approach opens the door to multi-resolution representations, which were not possible in the traditional NMF setting. We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement.
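
For readers unfamiliar with the traditional setting the abstract refers to, the following is a minimal sketch of NMF under the Itakura-Saito divergence, applied to a nonnegative matrix standing in for a power spectrogram. It is not the synthesis model or either of the expectation-maximization algorithms proposed in the paper. It uses the standard multiplicative updates with a square-root step, and the function names (`is_divergence`, `is_nmf`), the parameters, and the small constant `eps` are illustrative assumptions.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence, summed over all entries: x/y - log(x/y) - 1."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Approximate a strictly positive matrix V (F x N) by W @ H.

    Hypothetical helper: W is F x rank, H is rank x N, both nonnegative.
    eps is an assumed safeguard against division by zero.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    costs = []
    for _ in range(n_iter):
        # Multiplicative update of the spectral templates W.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        # Multiplicative update of the temporal activations H.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        costs.append(is_divergence(V, W @ H + eps))
    return W, H, costs


if __name__ == "__main__":
    # Synthetic stand-in for a power spectrogram: a rank-3 variance
    # structure multiplied by exponential noise (an assumption made here
    # purely to obtain plausible test data).
    rng = np.random.default_rng(1)
    variance = rng.random((20, 3)) @ rng.random((3, 30))
    V = variance * rng.exponential(size=(20, 30)) + 1e-9
    W, H, costs = is_nmf(V, rank=3, n_iter=50)
    print("first cost:", costs[0], "last cost:", costs[-1])
```

In practice `V` would be the squared magnitude of an STFT computed with a signal-processing library. The factorization then models only that transform, not the raw signal, which is the gap the abstract describes.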

Cite

Text

Févotte and Kowalski. "Low-Rank Time-Frequency Synthesis." Neural Information Processing Systems, 2014.

Markdown

[Févotte and Kowalski. "Low-Rank Time-Frequency Synthesis." Neural Information Processing Systems, 2014.](https://mlanthology.org/neurips/2014/fevotte2014neurips-lowrank/)

BibTeX

@inproceedings{fevotte2014neurips-lowrank,
  title     = {{Low-Rank Time-Frequency Synthesis}},
  author    = {Févotte, Cédric and Kowalski, Matthieu},
  booktitle = {Neural Information Processing Systems},
  year      = {2014},
  pages     = {3563-3571},
  url       = {https://mlanthology.org/neurips/2014/fevotte2014neurips-lowrank/}
}