The aim of this study is twofold. First, we perform a series of experiments to examine how different types of noise interfere with speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Second, we apply this knowledge to detecting speech that exhibits the Lombard effect. The goal is to prepare a dataset for training a machine learning-based system for automatic speech conversion that mimics the human way of making speech more intelligible in the presence of noise, i.e., a system that creates Lombard speech. Several spectral descriptors are analyzed in the context of Lombard speech and various types of noise. The results show that pub-like and babble noises are the most similar when compared in terms of Spectral Entropy, Spectral RollOff, and Spectral Brightness. The larger the values of these spectral descriptors, the more the speech-in-noise signal is degraded. To quantify the effect of noise on speech containing the Lombard effect, an average formant track error is calculated as an objective image quality metric. For image quality assessment, the Structural SIMilarity (SSIM) index is employed.
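As a rough illustration of the three spectral descriptors named in the abstract, the following minimal sketch computes them for a single signal frame. It is not the authors' implementation: the function names, the naive DFT, the 85% roll-off fraction, and the 1500 Hz brightness cutoff are all assumptions chosen here for illustration (the latter two are common conventions, not values taken from the study).

```python
import cmath
import math


def magnitude_spectrum(frame, sample_rate):
    """Naive DFT of one frame; returns (freqs, mags) for non-negative bins.

    Hypothetical helper for illustration; a real pipeline would use an FFT.
    """
    n = len(frame)
    half = n // 2 + 1
    freqs = [k * sample_rate / n for k in range(half)]
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(half)
    ]
    return freqs, mags


def spectral_entropy(mags):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = [m * m for m in mags]
    total = sum(power)
    if total == 0 or len(mags) < 2:
        return 0.0
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(mags))


def spectral_rolloff(freqs, mags, fraction=0.85):
    """Lowest frequency below which `fraction` of the spectral power lies.

    The 0.85 default is an assumed convention, not a value from the study.
    """
    power = [m * m for m in mags]
    threshold = fraction * sum(power)
    cumulative = 0.0
    for freq, p in zip(freqs, power):
        cumulative += p
        if cumulative >= threshold:
            return freq
    return freqs[-1]


def spectral_brightness(freqs, mags, cutoff_hz=1500.0):
    """Share of spectral power at or above `cutoff_hz` (assumed cutoff)."""
    power = [m * m for m in mags]
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(p for f, p in zip(freqs, power) if f >= cutoff_hz) / total


if __name__ == "__main__":
    sr, n = 8000, 64
    tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(n)]
    freqs, mags = magnitude_spectrum(tone, sr)
    print("entropy:", spectral_entropy(mags))
    print("rolloff (Hz):", spectral_rolloff(freqs, mags))
    print("brightness:", spectral_brightness(freqs, mags))
```

Under these definitions, a pure tone concentrates its power in one bin and yields low entropy, whereas a flat, noise-like spectrum pushes entropy toward 1 and, depending on bandwidth, raises roll-off and brightness, which is in line with the abstract's observation that larger descriptor values accompany stronger degradation.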
Authors
- Grazina Korvel,
- mgr inż. Krzysztof Kąkol,
- dr Povilas Treigys,
- prof. dr hab. inż. Bożena Kostek
Additional information
- DOI
- 10.1007/978-3-031-16564-1_38
- Category
- Conference activity
- Type
- publication in a peer-reviewed collective volume (including conference proceedings)
- Language
- English
- Publication year
- 2022