

Talks and Poster Presentations (with Proceedings-Entry):

T. Lidy, A. Schindler:
"CQT-based convolutional neural networks for audio scene classification";
Poster: Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Budapest, Hungary; 2016-09-03; in: "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016)", (2016), ISBN: 978-952-15-3807-0; 5 pages.



English abstract:
In this paper, we propose a parallel Convolutional Neural Network architecture for the task of classifying acoustic scenes and urban soundscapes. A popular choice of input to a Convolutional Neural Network in audio classification problems is the Mel-transformed spectrogram. We show in this paper, however, that a Constant-Q-transformed input improves results. Furthermore, we evaluated critical parameters such as the number of necessary bands and the filter sizes in a Convolutional Neural Network. These are non-trivial in audio tasks because the two axes of the input data carry different semantics: time vs. frequency. Finally, we propose a parallel (graph-based) neural network architecture which captures relevant audio characteristics both in time and in frequency. Our approach shows a 10.7 % relative improvement over the baseline system of the DCASE 2016 Acoustic Scenes Classification task [1].
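
Illustrative sketch (not from the publication): the following Python snippet outlines the general idea of feeding a Constant-Q-transformed spectrogram into a parallel two-branch CNN, one branch with frequency-oriented filters and one with time-oriented filters. It assumes librosa for the CQT and tensorflow.keras for the network; the number of bands, filter shapes, layer sizes, and the 15-class output are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import librosa
from tensorflow.keras import layers, Model

def cqt_features(path, sr=22050, n_bins=80, hop_length=512):
    # Load audio and compute a log-scaled Constant-Q spectrogram
    # (n_bins frequency bands x time frames).
    y, sr = librosa.load(path, sr=sr)
    C = np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins, hop_length=hop_length))
    return librosa.amplitude_to_db(C, ref=np.max)

def build_parallel_cnn(n_bins=80, n_frames=80, n_classes=15):
    # Two parallel convolutional branches over the same CQT input:
    # one with filters extended along the frequency axis,
    # one with filters extended along the time axis.
    inp = layers.Input(shape=(n_bins, n_frames, 1))

    # Frequency-oriented branch (tall filters, pooling mostly over frequency).
    f = layers.Conv2D(16, (10, 3), activation='relu', padding='same')(inp)
    f = layers.MaxPooling2D((5, 2))(f)

    # Time-oriented branch (wide filters, pooling mostly over time).
    t = layers.Conv2D(16, (3, 10), activation='relu', padding='same')(inp)
    t = layers.MaxPooling2D((2, 5))(t)

    # Merge both branches and classify the acoustic scene.
    merged = layers.concatenate([layers.Flatten()(f), layers.Flatten()(t)])
    merged = layers.Dense(128, activation='relu')(merged)
    out = layers.Dense(n_classes, activation='softmax')(merged)

    model = Model(inp, out)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

In practice the CQT spectrogram would be cut into fixed-length segments of n_frames columns before being passed to the network; that segmentation step is omitted here for brevity.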

Keywords:
Neural networks, Deep learning, Classification, Audio, Audio Event Classification, Convolutional Neural Networks, CQT, Constant-Q-Transform, Mel Spectrogram


Electronic version of the publication:
http://publik.tuwien.ac.at/files/publik_256002.pdf


Created from the Publication Database of the Vienna University of Technology.