Automatic speaker verification systems are vulnerable to several kinds of spoofing attacks. Some are quite simple: for example, playing back an eavesdropped recording requires neither specialized equipment nor knowledge, yet it may still pose a serious threat to a biometric identification module built into an e-banking application. In this paper we follow a recent approach and convert recordings to images, assuming that the original voice can be distinguished from its played-back version through the analysis of local texture patterns. We propose improvements to the state-of-the-art solution, but also show its severe limitations. This in turn leads to a fundamental question: is it possible to find one set of features that is characteristic of all playback recordings? We look for the answer by performing a series of optimization experiments, but in general the problem remains open.
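The abstract does not say which texture descriptor is used, so the following is only a minimal sketch of the general idea, assuming a basic 3x3 local binary pattern (LBP) as the descriptor and assuming the recording has already been converted to a 2-D image such as a log-magnitude spectrogram. The names `lbp_codes` and `lbp_histogram` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Offsets of the 8 neighbours, clockwise starting from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return the basic 3x3 LBP code of every interior pixel.

    `image` is assumed to be a rectangular 2-D array (for example a
    spectrogram) given as a sequence of rows.
    """
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A neighbour at least as large as the centre sets its bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes (a texture feature vector)."""
    hist = [0] * 256
    count = 0
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
            count += 1
    # Images smaller than 3x3 have no interior pixels; return all zeros.
    return [h / count for h in hist] if count else [0.0] * 256


# Toy 3x3 "image": only the centre pixel has a full neighbourhood.
toy = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
features = lbp_histogram(toy)
```

In a setup like this, the histogram would serve as the feature vector passed to a classifier that separates genuine recordings from played-back ones; the choice of classifier and of any histogram variant is left open here.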
Authors
Additional information
- DOI
- 10.1007/978-3-319-59162-9_13
- Category
- Monograph publication
- Type
- chapter or article in a book (collective work or textbook) in a language of international reach
- Language
- English
- Publication year
- 2017