Publication Entry

Talks and Poster Presentations (with Proceedings-Entry):

B. Fazekas, A. Schindler, T. Lidy:
"A Multi-modal Deep Neural Network approach to Bird-song identification";
Talk: BirdCLEF, Dublin, Ireland; 2017-09-11 - 2017-09-14; in: "BirdCLEF 2017", (2017), 6 pages.

English abstract:

We present a multi-modal Deep Neural Network (DNN) approach to bird song identification. The approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) with four convolutional layers, while the additionally provided metadata is processed by fully connected layers. The flattened output of the convolutional layers and the output of the metadata's fully connected layer are joined and fed into a further fully connected layer.
In different training configurations, the resulting architecture achieved 2nd, 3rd, and 4th place in the BirdCLEF 2017 task.
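To make the two-branch structure more concrete, here is a minimal, untrained sketch in plain Python with no deep learning framework. It is not the authors' implementation. The input size, the 3x3 kernels without padding or pooling, the ReLU activations, the 3-value metadata vector, the 5 output classes, the softmax output, and all helper names (`conv2d_relu`, `dense`, `forward`) are illustrative assumptions. It only traces how the flattened audio features and the metadata features are concatenated before the final fully connected layer.

```python
import math
import random

# Minimal, untrained sketch of a two-branch (audio + metadata) network.
# All sizes, activations and names are illustrative assumptions.
rng = random.Random(0)  # fixed seed for reproducibility


def rand_tensor(*shape):
    """Nested list of small Gaussian weights with the given shape."""
    if len(shape) == 1:
        return [rng.gauss(0.0, 0.1) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]


def conv2d_relu(x, kernels, bias):
    """'Valid' 2-D convolution (cross-correlation) followed by ReLU.
    x: [in_ch][H][W], kernels: [out_ch][in_ch][k][k], bias: [out_ch]."""
    in_ch, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0][0])
    return [[[max(0.0, b + sum(kern[c][di][dj] * x[c][i + di][j + dj]
                               for c in range(in_ch)
                               for di in range(k) for dj in range(k)))
              for j in range(w - k + 1)]
             for i in range(h - k + 1)]
            for kern, b in zip(kernels, bias)]


def dense(x, weights, bias, relu=True):
    """Fully connected layer; weights has shape [out][in]."""
    assert all(len(w) == len(x) for w in weights), "input size mismatch"
    y = [b + sum(wi * xi for wi, xi in zip(w, x)) for w, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


# Assumed sizes, kept tiny so the example runs quickly.
SPEC_H = SPEC_W = 20      # spectrogram patch (frequency x time bins)
CHANNELS = 4              # feature maps per convolutional layer
META_DIM, META_HIDDEN = 3, 8
NUM_CLASSES = 5

# Audio branch: four 3x3 conv layers shrink 20x20 -> 18 -> 16 -> 14 -> 12.
conv_params, in_ch = [], 1
for _ in range(4):
    conv_params.append((rand_tensor(CHANNELS, in_ch, 3, 3), rand_tensor(CHANNELS)))
    in_ch = CHANNELS
flat_dim = CHANNELS * (SPEC_H - 8) * (SPEC_W - 8)

# Metadata branch and the joint fully connected output layer.
meta_w, meta_b = rand_tensor(META_HIDDEN, META_DIM), rand_tensor(META_HIDDEN)
out_w = rand_tensor(NUM_CLASSES, flat_dim + META_HIDDEN)
out_b = rand_tensor(NUM_CLASSES)


def forward(spectrogram, metadata):
    x = [spectrogram]                              # add channel axis: [1][H][W]
    for kernels, bias in conv_params:
        x = conv2d_relu(x, kernels, bias)
    audio_feat = [v for fmap in x for row in fmap for v in row]  # flatten
    meta_feat = dense(metadata, meta_w, meta_b)
    joined = audio_feat + meta_feat                # concatenate both branches
    return softmax(dense(joined, out_w, out_b, relu=False))


spec = [[rng.random() for _ in range(SPEC_W)] for _ in range(SPEC_H)]
probs = forward(spec, [0.5, -0.2, 0.1])            # hypothetical metadata values
print(len(probs), round(sum(probs), 6))            # prints: 5 1.0
```

A practical system would build and train such a model in a framework such as PyTorch or Keras. The plain-Python version only makes the data flow and the tensor sizes at the join point explicit.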

German abstract (English translation):

We present a multimodal Deep Neural Network (DNN) approach to identifying bird vocalizations. The approach uses both audio samples and metadata as input. The audio content is fed into a Convolutional Neural Network (CNN) with four convolutional layers. The additionally provided metadata is processed with fully connected layers. The flattened output of the convolutional layers and the output of the metadata's fully connected layer are joined and fed into a fully connected layer.
In different training configurations, the resulting architecture achieves 2nd, 3rd, and 4th place in the BirdCLEF 2017 competition.

Keywords:

Neural Networks, Audio Analysis, Audio Retrieval, Birdsong identification

Electronic version of the publication:

http://publik.tuwien.ac.at/files/publik_267107.pdf

Created from the Publication Database of the Vienna University of Technology.