Learning Features of Music from Scratch

Abstract

This paper introduces a new large-scale music dataset, MusicNet, to serve as a source of supervision and evaluation of machine learning methods for music research. MusicNet consists of hundreds of freely-licensed classical music recordings by 10 composers, written for 11 instruments, together with instrument/note annotations resulting in over 1 million temporal labels on 34 hours of chamber music performances under various studio and microphone conditions. The paper defines a multi-label classification task to predict notes in musical recordings, along with an evaluation protocol, and benchmarks several machine learning architectures for this task: i) learning from spectrogram features; ii) end-to-end learning with a neural net; iii) end-to-end learning with a convolutional neural net. These experiments show that end-to-end models trained for note prediction learn frequency-selective filters as a low-level representation of audio.
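The task described above can be illustrated with a minimal sketch: a multi-label classifier that maps a log-spectrogram of an audio window to independent per-note probabilities (one sigmoid per MIDI pitch), trained with binary cross-entropy. This is an illustrative baseline in the spirit of the paper's spectrogram-feature benchmark, not the authors' actual models; the window size, feature choice, and the simple linear model are assumptions for the sketch.

```python
import numpy as np

def log_spectrogram(window, n_fft=2048):
    """Log-magnitude spectrogram features for one audio window (assumed feature choice)."""
    spec = np.abs(np.fft.rfft(window * np.hanning(len(window)), n=n_fft))
    return np.log1p(spec)

class LinearNoteClassifier:
    """Multi-label note predictor: one sigmoid output per MIDI pitch (0-127)."""

    def __init__(self, n_features, n_notes=128, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_features, n_notes))
        self.b = np.zeros(n_notes)
        self.lr = lr

    def predict_proba(self, X):
        # Independent sigmoid per note: several notes can sound at once.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_step(self, X, Y):
        # One gradient step on mean binary cross-entropy over the batch;
        # Y is a {0,1} matrix of shape (batch, n_notes).
        P = self.predict_proba(X)
        self.W -= self.lr * (X.T @ (P - Y)) / len(X)
        self.b -= self.lr * (P - Y).mean(axis=0)
```

An evaluation protocol like the paper's would then score the predicted note probabilities against the dataset's temporal labels (e.g. with precision/recall over sampled windows).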

Cite

Text

Thickstun et al. "Learning Features of Music from Scratch." International Conference on Learning Representations, 2017.

Markdown

[Thickstun et al. "Learning Features of Music from Scratch." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/thickstun2017iclr-learning/)

BibTeX

@inproceedings{thickstun2017iclr-learning,
  title     = {{Learning Features of Music from Scratch}},
  author    = {Thickstun, John and Harchaoui, Zaïd and Kakade, Sham M.},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/thickstun2017iclr-learning/}
}