Investigating Feature Spaces for Isolated Word Recognition

Povilas Treigys; Grazina Korvel; Gintautas Tamulevicius; Jolita Bernataviciene; Bożena Kostek

doi:10.1007/978-3-030-39250-5

The study examines how suitable two-dimensional representations of the speech signal are for speech recognition based on deep learning. The approach combines Convolutional Neural Networks (CNNs) with time-frequency signal representations converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and three feature spaces were investigated for the frequency domain: the Linear Prediction Coefficient (LPC) spectrum, the Hartley spectrum, and the cochleagram. Because deep learning requires a training corpus of adequate size, and the content of that corpus may significantly influence the outcome, the created dataset was augmented with mixtures of the speech signal and noise at various signal-to-noise ratios (SNRs). To evaluate the applicability of the implemented feature spaces to the isolated word recognition task, three experiments were conducted, covering 10-, 70-, and 111-word cases.
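
The noise-mixing augmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mean_power`, `mix_at_snr`), the synthetic sine and Gaussian signals, the 16 kHz length, and the SNR levels 0, 10, and 20 dB are all assumptions made for illustration, since the abstract does not state the noise types or SNR values used.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal given as a list of samples."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR (dB).

    Hypothetical helper: the gain g is chosen so that
    10 * log10(P_speech / (g**2 * P_noise)) == snr_db.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for a recorded word and a noise recording (assumed).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One augmented copy per assumed SNR level.
augmented = {snr: mix_at_snr(speech, noise, snr) for snr in (0, 10, 20)}
```

Each entry of `augmented` would then be converted to the chosen two-dimensional feature space in the same way as the clean recording, so every clean word contributes several training examples.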

Authors

dr Povilas Treigys,
Grazina Korvel,
dr Gintautas Tamulevicius,
dr Jolita Bernataviciene,
prof. dr hab. inż. Bożena Kostek

Additional information

DOI: 10.1007/978-3-030-39250-5
Category: Monographic publication
Type: chapter, article in a book (collective work / textbook) in a language of international reach
Language: English
Publication year: 2020

Source: MOSTWiedzy.pl - publication "Investigating Feature Spaces for Isolated Word Recognition"

Publications Repository - Gdańsk University of Technology