Reducing footprint of unit selection TTS system by excluding utterances from source speech corpus

Matoušek, Jindřich; Tihelka, Daniel; Hanzlíček, Zdeněk

Title:	Reducing footprint of unit selection TTS system by excluding utterances from source speech corpus
Other Titles:	Snižování paměťových nároků systému TTS pracujícího na principu výběru jednotek vyhozením promluv ze zdrojového řečového korpusu
Authors:	Matoušek, Jindřich; Tihelka, Daniel; Hanzlíček, Zdeněk
Citation:	MATOUŠEK, Jindřich; TIHELKA, Daniel; HANZLÍČEK, Zdeněk. Reducing footprint of unit selection TTS system by excluding utterances from source speech corpus. In: Speech processing 19th czech – german workshop 29th September – 1st October 2009. Prague: Institute of Photonics and Electronics Academy of Sciences of the Czech Republic, 2009, p. 92-98. ISBN 978-80-86269-18-4.
Issue Date:	2009
Publisher:	Institute of Photonics and Electronics Academy of Sciences of the Czech Republic
Document type:	article
URI:	http://www.kky.zcu.cz/cs/publications/MatousekJ_2009_ReducingFootprintof http://hdl.handle.net/11025/17017
ISBN:	978-80-86269-18-4
Keywords:	syntéza řeči;výběr jednotky;korpus řeči
Keywords in different language:	speech synthesis;unit selection;speech corpus
Abstract in different language:	Current unit selection speech synthesis systems can produce high-quality speech, but at the expense of enormous computational and storage requirements. In this paper, an existing large speech corpus employed for unit-selection-based synthesis of Czech speech is analysed. Subsequently, a procedure for excluding a number of utterances from the source speech corpus is proposed. The procedure is based on statistics of how often each utterance is utilised during text-to-speech synthesis of a large amount of text. Excluding whole utterances was preferred over excluding particular instances of speech units in order to preserve the main feature of the unit selection framework: selecting the longest possible sequence of contiguous speech units. After the exclusion, the footprint of the system was reduced by approximately 42 %. The resulting synthetic speech was then judged by means of 5-scale CCR listening tests and evaluated on average as only "slightly worse" than speech generated by the baseline (i.e. not reduced) system.
Rights:	© Jindřich Matoušek - Daniel Tihelka - Zdeněk Hanzlíček
Appears in Collections:	Články / Articles (NTIS); Články / Articles (KIV)
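
The exclusion procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that whole utterances are excluded on the basis of utilisation statistics. The function name `select_utterances_to_exclude`, the per-utterance size table, the least-used-first ranking, and the stopping rule based on a target size reduction are all assumptions made for illustration.

```python
from collections import Counter


def select_utterances_to_exclude(usage_log, utterance_sizes, target_reduction):
    """Pick whole utterances to drop, least-used first (illustrative only).

    usage_log        -- iterable of utterance IDs, one entry each time a unit
                        from that utterance was selected during synthesis
                        (hypothetical log format)
    utterance_sizes  -- dict mapping utterance ID to its storage size
    target_reduction -- fraction of the total corpus size to remove (assumed
                        stopping criterion, not taken from the paper)
    """
    usage = Counter(usage_log)          # utterances never used count as 0
    total_size = sum(utterance_sizes.values())
    budget = target_reduction * total_size

    # Rank utterances by how rarely they were used; ties broken by ID so the
    # result is deterministic.
    ranked = sorted(utterance_sizes, key=lambda utt: (usage[utt], utt))

    excluded = []
    removed_size = 0
    for utt in ranked:
        if removed_size >= budget:
            break
        excluded.append(utt)
        removed_size += utterance_sizes[utt]

    return excluded, removed_size / total_size


if __name__ == "__main__":
    sizes = {"u1": 100, "u2": 100, "u3": 100, "u4": 100, "u5": 100}
    log = ["u1"] * 5 + ["u2"] + ["u3"] * 3 + ["u5"] * 2   # u4 is never used
    dropped, fraction = select_utterances_to_exclude(log, sizes, 0.3)
    print(dropped, fraction)
```

Dropping whole utterances, rather than individual unit instances, keeps the remaining utterances intact, so long contiguous unit sequences remain available for selection.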

Files in This Item:

File	Description	Size	Format
MatousekJ_2009_ReducingFootprintof.pdf	Full text	742.59 kB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/17017