The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech

Bibliographic details
Published in: arXiv.org (Jun 1, 2023), p. n/a
Main author: Do, Phat
Other authors: Coler, Matt; Dijkstra, Jelske; Klabbers, Esther
Published:
Cornell University Library, arXiv.org
Subjects: Labels; Learning; Dictionaries; Intelligibility; Speech recognition
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2821741190
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2821741190 
045 0 |b d20230601 
100 1 |a Do, Phat 
245 1 |a The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech 
260 |b Cornell University Library, arXiv.org  |c Jun 1, 2023 
513 |a Working Paper 
520 3 |a We compare phone labels and articulatory features as input for cross-lingual transfer learning in text-to-speech (TTS) for low-resource languages (LRLs). Experiments with FastSpeech 2 and the LRL West Frisian show that using articulatory features outperforms using phone labels in both intelligibility and naturalness. For LRLs without pronunciation dictionaries, we propose two novel approaches: a) using a massively multilingual model to perform grapheme-to-phone (G2P) conversion in both training and synthesis, and b) using a universal phone recognizer to create a makeshift dictionary. Results show that the G2P approach performs largely on par with using a ground-truth dictionary, and that the phone recognition approach, while generally performing worse, remains a viable option for LRLs less suitable for the G2P approach. Within each approach, using articulatory features as input outperforms using phone labels. 
653 |a Labels 
653 |a Learning 
653 |a Dictionaries 
653 |a Intelligibility 
653 |a Speech recognition 
700 1 |a Coler, Matt 
700 1 |a Dijkstra, Jelske 
700 1 |a Klabbers, Esther 
773 0 |t arXiv.org  |g (Jun 1, 2023), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2821741190/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2306.00535