Speakers Talking Foreign Languages in a Multi-lingual TTS System

Hanzlíček, Zdeněk; Vít, Jakub; Řezáčková, Markéta

Title:	Speakers Talking Foreign Languages in a Multi-lingual TTS System
Other Titles:	Řečnící hovořící cizími jazyky ve vícejazyčném systému syntézy řeči
Authors:	Hanzlíček, Zdeněk; Vít, Jakub; Řezáčková, Markéta
Citation:	HANZLÍČEK, Z., VÍT, J., ŘEZÁČKOVÁ, M. Speakers Talking Foreign Languages in a Multi-lingual TTS System. In Text, Speech, and Dialogue: 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings. Cham: Springer International Publishing, 2021. pp. 489-498. ISBN: 978-3-030-83526-2, ISSN: 0302-9743
Issue Date:	2021
Publisher:	Springer International Publishing
Document type:	conference paper (ConferenceObject)
URI:	2-s2.0-85115270565 http://hdl.handle.net/11025/47246
ISBN:	978-3-030-83526-2
ISSN:	0302-9743
Keywords:	speech synthesis; multi-lingual speech synthesis systems
Keywords in different language:	Speech synthesis;Multi-lingual TTS
Abstract:	This paper describes experiments with multi-lingual speech synthesis systems trained jointly on English, German, Russian, and Czech data. The experimental system, based on LSTM neural networks, and a trainable neural vocoder use the International Phonetic Alphabet (IPA), which allows different languages to be combined directly. The paper examines whether the joint model can combine and generalize the information contained in the training data and whether individual voices can be used to synthesize other languages, including language-specific phonemes. The intelligibility of the generated speech was evaluated using SUS listening tests. The multi-lingual models were also compared with independent monolingual models in which missing foreign phonemes were replaced by the most similar phonemes present in the given language. The multi-lingual models were clearly preferred in the listening tests.
Abstract in different language:	This paper presents experiments with a multi-lingual multi-speaker TTS system jointly trained on English, German, Russian, and Czech speech data. The experimental LSTM-based TTS system with a trainable neural vocoder uses the International Phonetic Alphabet (IPA), which allows different languages to be combined directly. We analyzed whether the joint model is capable of generalizing and mixing the information contained in the training data and whether particular voices can be used to synthesize other languages, including their language-specific phonemes. The intelligibility of the generated speech was assessed by SUS (Semantically Unpredictable Sentences) listening tests containing Czech sentences spoken by non-Czech speakers. The performance of the joint multi-lingual model was also compared with independent single-voice models in which the missing non-native phonemes were mapped to the most similar native phonemes. Besides the Czech sentences, the preference test also contained English sentences spoken by Czech voices. The multi-lingual model was preferred for all evaluated voices. Although the generated speech did not sound like a native speaker, its phonetic and prosodic features were clearly better than those of the single-voice models.
Rights:	The full text is accessible within the university to logged-in users. © Springer
Appears in Collections:	Konferenční příspěvky / Conference Papers (KKY) OBD

Files in This Item:

File	Size	Format
Hanzlíček2021_Chapter_SpeakersTalkingForeignLanguage.pdf	283,93 kB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/47246
