Meta Learning Text-to-Speech Synthesis in over 7000 Languages

Guardado en:
Detalles Bibliográficos
Publicado en:arXiv.org (Jun 10, 2024), p. n/a
Autor principal: Lux, Florian
Otros Autores: Meyer, Sarina, Behringer, Lyonel, Zalkow, Frank, Do, Phat, Coler, Matt, Habets, Emanuël A P, Vu, Ngoc Thang
Publicado:
Cornell University Library, arXiv.org
Materias:
Acceso en línea:Citation/Abstract
Full text outside of ProQuest
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Descripción
Resumen:In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
ISSN:2331-8422
Fuente:Engineering Database