Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e., lip reading) can enhance recognition accuracy in a multi-modal approach. For that purpose, motion capture markers were placed on the speakers' faces to obtain lip-tracking data during speech. Different parameterization strategies were tested, and the accuracy of phoneme recognition in the different experiments was analyzed. The obtained results, as well as further challenges related to the bi-modal feature extraction process and the employment of decision systems, are discussed.
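The abstract does not specify how the audio and visual streams are combined; a common baseline for such bi-modal setups is frame-level feature fusion, where per-frame audio features (e.g., MFCCs) and lip-marker coordinates are concatenated before classification. The sketch below illustrates that idea only; the function name, feature dimensions, and concatenation strategy are assumptions for illustration, not the method used in the paper.

```python
import numpy as np

def fuse_features(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Frame-wise concatenation of audio and visual feature vectors.

    audio_feats:  shape (n_frames, n_audio_dims), e.g. MFCCs per frame
    visual_feats: shape (n_frames, n_visual_dims), e.g. flattened 2-D
                  coordinates of the lip motion-capture markers
    returns:      shape (n_frames, n_audio_dims + n_visual_dims)
    """
    # Both streams must be aligned to the same frame rate before fusion.
    if audio_feats.shape[0] != visual_feats.shape[0]:
        raise ValueError("audio and visual streams must be frame-aligned")
    return np.hstack([audio_feats, visual_feats])

# Illustration: 100 frames, 13 audio dims, 20 lip markers x 2 coordinates.
audio = np.random.randn(100, 13)
visual = np.random.randn(100, 40)
fused = fuse_features(audio, visual)
print(fused.shape)  # (100, 53)
```

The fused vectors would then feed a phoneme classifier; in practice the two streams must first be synchronized, since motion-capture and audio frame rates usually differ.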
Additional information
- DOI
- 10.1109/hsi.2018.8430943
- Category
- Conference activity
- Type
- Publication in a peer-reviewed edited volume (including conference proceedings)
- Language
- English
- Publication year
- 2018