Publications Repository - Gdańsk University of Technology

Page settings

polski
Publications Repository
Gdańsk University of Technology

Treść strony

Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition

convolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was made upon the fact that CNN performs well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word recognition rate. Spectral and mel-scale cepstral feature spaces outperform linear cepstra and chroma. The 111-word classification experiment depicts f1 score of 0.99 for spectrum, 0.91 for mel-scale cepstrum , 0.76 for chromagram, and 0.64 for cepstrum feature space on test data set.

Authors

Additional information

DOI
Digital Object Identifier link open in new tab 10.17743/jaes.2018.0066
Category
Publikacja w czasopiśmie
Type
artykuł w czasopiśmie wyróżnionym w JCR
Language
angielski
Publication year
2018

Source: MOSTWiedzy.pl - publication "Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition" link open in new tab

Portal MOST Wiedzy link open in new tab