Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages

Bibliographic Details
Published in: arXiv.org (May 30, 2023), p. n/a
Main Author: Do, Phat
Other Authors: Coler, Matt; Dijkstra, Jelske; Klabbers, Esther
Published: Cornell University Library, arXiv.org
Subjects: Prediction models; Speech recognition
Links: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2821493327
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2821493327 
045 0 |b d20230530 
100 1 |a Do, Phat 
245 1 |a Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages 
260 |b Cornell University Library, arXiv.org  |c May 30, 2023 
513 |a Working Paper 
520 3 |a We train a MOS prediction model based on wav2vec 2.0 using the open-access data sets BVCC and SOMOS. Our test with neural TTS data in the low-resource language (LRL) West Frisian shows that pre-training on BVCC before fine-tuning on SOMOS leads to the best accuracy for both fine-tuned and zero-shot prediction. Further fine-tuning experiments show that using more than 30 percent of the total data does not lead to significant improvements. In addition, fine-tuning with data from a single listener shows promising system-level accuracy, supporting the viability of one-participant pilot tests. These findings can all assist the resource-conscious development of TTS for LRLs by progressing towards better zero-shot MOS prediction and informing the design of listening tests, especially in early-stage evaluation. 
653 |a Prediction models 
653 |a Speech recognition 
700 1 |a Coler, Matt 
700 1 |a Dijkstra, Jelske 
700 1 |a Klabbers, Esther 
773 0 |t arXiv.org  |g (May 30, 2023), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2821493327/abstract/embedded/75I98GEZK8WCJMPQ?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2305.19396