Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Saved in:
| Published in: | arXiv.org (Jun 10, 2024), p. n/a |
|---|---|
| Main Author: | |
| Other Authors: | |
| Published: | Cornell University Library, arXiv.org |
| Subjects: | |
| Online Access: | Citation/Abstract; Full text outside of ProQuest |
| Abstract: | In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology. |
|---|---|
| ISSN: | 2331-8422 |
| Source: | Engineering Database |