This work shows how the well-known analysis-by-synthesis paradigm has recently evolved into a new concept. In contrast to the original idea, which required the created sound to pass the foolproof synthesis test, the recent development stems from the need to create new data. Deep learning models are data-hungry: they require vast amounts of data that must, in addition, be correctly annotated. Annotation is a bottleneck in obtaining reliable, high-quality data because the process depends on the annotator's experience and, in many cases, on personality-related factors. The new approach is therefore to create synthesized data based on a thorough analytical examination of a musical or speech signal, which yields cues that tell a deep model how to populate the data set and so overcome this problem. Typically, a 2D feature space is employed, e.g., mel spectrograms, cepstrograms, or chromagrams, or a waveform-based representation whose algorithmic counterpart is WaveNet. In this paper, examples of 2D musical and speech signal representations are presented, along with the deep models applied. The creation of new data in the context of applications is also shown. In conclusion, further possible directions for developing this paradigm, which is now beyond the conceptual phase, are presented.
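As an illustration of the 2D representations named in the abstract, the following is a minimal sketch of a log-mel spectrogram computed with NumPy only. It is not the authors' pipeline: the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `log_mel_spectrogram`), the frame parameters, and the 440 Hz test tone are all assumptions chosen for the example, and the mel scale uses the common 2595·log10(1 + f/700) formula.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    # Slice the signal into overlapping Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    # Power spectrum per frame, then projection onto the mel filters.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Return decibels with shape (n_mels, n_frames); the offset avoids log(0).
    return 10.0 * np.log10(mel_power + 1e-10).T


# Hypothetical input: one second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)
S = log_mel_spectrogram(tone, sr)
print(S.shape)  # (40, 122)
```

The resulting array has one row per mel band and one column per time frame, which is the image-like form that convolutional models typically consume. Cepstrograms and chromagrams follow the same framing step and differ only in the transform applied to each frame.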
Authors
Additional information
- DOI
- 10.1121/10.0015955
- Category
- Journal publication
- Type
- journal articles
- Language
- English
- Publication year
- 2022