Domain-Specific Customization for Improving Speech to Text
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract Full Text - PDF |
| Abstract: | The advent of transformer-based models has revolutionized natural language processing and brought remarkable improvements in tasks such as automatic speech recognition (ASR). Inspired by these advances, this thesis explores the optimization of a transformer-based ASR model to improve transcription accuracy in educational settings, particularly for lecture content. The goal of this research is to provide real-time, high-accuracy captions that enhance accessibility for all students while offering a cost-effective solution for educators. To assess the potential of domain-specific fine-tuning, Whisper-small underwent two phases of fine-tuning. In the first phase, it was fine-tuned on carefully selected, publicly available datasets: SpeechColab’s GigaSpeech-XS and the AMI Meeting Corpus. In the second phase, the fine-tuned model was further optimized on a self-curated dataset of roughly 10 hours of live lecture recordings collected and assembled by me. Finally, a real-time captioning assistant application was developed that uses the fine-tuned model to transcribe speech in real time with live editing capabilities. The optimized Whisper-small model was evaluated against Whisper’s pretrained small, medium, and large (version 2) counterparts. The evaluation was performed on clean, unseen data prepared by me. The fine-tuned model achieved a lower Word Error Rate (WER) of 4.53%, compared to 5.51% and 5.78% for Whisper-Medium and Whisper-Large-V2, respectively. These results demonstrate that fine-tuning a transformer-based ASR model on domain-specific data can significantly enhance its performance in a targeted context, such as live lecture transcription. The findings of this experiment highlight the promise of transformer-based models for improving educational accessibility. By building an application tailored to live lecture settings, this research contributes to the development of adaptable, low-cost technologies that support inclusive learning environments. The success of this experiment lays the groundwork for future breakthroughs in speech recognition, aiming to make education more accessible for everyone. |
|---|---|
| ISBN: | 9798310380806 |
| Source: | ProQuest Dissertations & Theses Global |
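
The abstract compares models by Word Error Rate (WER). As a minimal sketch of how that metric is commonly computed, assuming the standard definition (substitutions + deletions + insertions, divided by the number of reference words), the function below uses a word-level edit distance. The function name, the lowercasing, and the whitespace tokenization are illustrative assumptions; this record does not state which text normalization or tooling the thesis used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; real ASR
    evaluations usually apply a more careful normalization first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 4.53% corresponds to roughly 4.5 word-level errors per 100 reference words. Note that WER can exceed 100% when the hypothesis contains many inserted words.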