Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation

Abstract

In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources. A DVAE model is pre-trained on a single-source dataset to capture the source dynamics. Then, multiple instances of the pre-trained DVAE model are integrated into a multi-source mixture model with a discrete observation-to-source assignment latent variable. The posterior distributions of both the discrete observation-to-source assignment variable and the continuous DVAE variables representing the sources' content/position are estimated using the variational expectation-maximization algorithm, leading to multi-source trajectory estimation. We illustrate the versatility of the proposed MixDVAE model on two tasks: a computer vision task, namely multi-object tracking, and an audio processing task, namely single-channel audio source separation. Experimental results show that the proposed method performs well on both tasks and outperforms several baseline methods.
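
To illustrate the role of the discrete observation-to-source assignment variable, the following minimal Python sketch computes the posterior probability that a single observation was generated by each source. It is only an illustration under simplifying assumptions, not the MixDVAE implementation: observations are scalar, each source contributes a Gaussian prediction with a known mean and variance, and the function name `assignment_posterior` and all numeric values are hypothetical. In the model described above, the per-source predictions would come from the pre-trained DVAE instances rather than from fixed numbers.

```python
import math


def assignment_posterior(obs, means, variances, priors):
    """Posterior over which source generated a scalar observation (illustrative)."""
    # Unnormalised log posterior: log prior + Gaussian log-likelihood per source.
    log_w = [
        math.log(p) - 0.5 * (math.log(2 * math.pi * v) + (obs - m) ** 2 / v)
        for m, v, p in zip(means, variances, priors)
    ]
    # Normalise with the log-sum-exp trick for numerical stability.
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical sources whose predicted positions are 0.0 and 5.0.
resp = assignment_posterior(
    obs=0.4, means=[0.0, 5.0], variances=[1.0, 1.0], priors=[0.5, 0.5]
)
print(resp)  # the first source receives almost all of the probability mass
```

In a variational EM procedure of the kind the abstract refers to, an assignment step like this would typically alternate with updates of the continuous per-source variables until the estimates stabilise.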

Cite

Text

Lin et al. "Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation." Transactions on Machine Learning Research, 2023.

Markdown

[Lin et al. "Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/lin2023tmlr-mixture/)

BibTeX

@article{lin2023tmlr-mixture,
  title     = {{Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation}},
  author    = {Lin, Xiaoyu and Girin, Laurent and Alameda-Pineda, Xavier},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/lin2023tmlr-mixture/}
}