Domain-Specific Customization for Improving Speech to Text

Bibliographic Details
In the publication: ProQuest Dissertations and Theses (2025)
Main author: Tobias, Zubin Mario
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Artificial intelligence; Information science
Links: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3190171649
003 UK-CbPIL
020 |a 9798310380806 
035 |a 3190171649 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Tobias, Zubin Mario 
245 1 |a Domain-Specific Customization for Improving Speech to Text 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The advent of transformer-based models has revolutionized natural language processing, bringing remarkable improvements in tasks such as automatic speech recognition (ASR). Inspired by these advances, this thesis explores the optimization of a transformer-based ASR model to improve transcription accuracy in educational settings, particularly for lecture content. The goal of this research is to provide real-time, high-accuracy captions that enhance accessibility for all students while offering a cost-effective solution for educators. To assess the potential of domain-specific fine-tuning, Whisper-small underwent two phases of fine-tuning (sketched in code after the record below). In the first phase, it was fine-tuned on carefully selected, publicly available datasets: SpeechColab's Gigaspeech-XS and the AMI Meeting Corpus. In the second phase, the fine-tuned model was further optimized on a self-curated dataset of roughly 10 hours of live lecture recordings that I collected and assembled. Finally, a real-time captioning assistant application was developed that leverages the fine-tuned model to transcribe speech in real time with live editing capabilities. The optimized Whisper-small model was evaluated against the pretrained Whisper-Small, Whisper-Medium, and Whisper-Large-V2 checkpoints on clean, unseen data that I prepared (see the evaluation sketch after the record). The fine-tuned model achieved a lower Word Error Rate (WER) of 4.53%, compared with 5.51% and 5.78% for Whisper-Medium and Whisper-Large-V2, respectively. These results demonstrate that fine-tuning a transformer-based ASR model on domain-specific data can significantly enhance its performance in a targeted context such as live lecture transcription. These findings highlight the promise of transformer-based models for improving educational accessibility. By building an application tailored to live lecture settings, this research contributes to the development of adaptable, low-cost technologies that support inclusive learning environments, and it lays the groundwork for future advances in speech recognition that aim to make education more accessible to everyone. 
653 |a Computer science 
653 |a Artificial intelligence 
653 |a Information science 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3190171649/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3190171649/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
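
As a rough illustration of the two-phase fine-tuning the 520 abstract describes, the Python sketch below fine-tunes Whisper-small on Gigaspeech-XS with the Hugging Face transformers and datasets libraries. This is a minimal sketch, not the author's code: the hyperparameters, the output path, and the second phase are assumptions, and the AMI Meeting Corpus and the self-curated lecture recordings would be prepared with the same mapping.

    # Minimal sketch of domain-specific Whisper fine-tuning with Hugging Face
    # transformers/datasets. Hyperparameters and paths are illustrative, not
    # taken from the thesis.
    from dataclasses import dataclass

    from datasets import Audio, load_dataset
    from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                              WhisperForConditionalGeneration, WhisperProcessor)

    processor = WhisperProcessor.from_pretrained("openai/whisper-small")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

    # Phase 1 data: Gigaspeech-XS (gated on the Hub; requires an access token).
    # The AMI corpus, and later the ~10 h lecture set, would be mapped the same way.
    ds = load_dataset("speechcolab/gigaspeech", "xs", split="train")
    ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

    def prepare(batch):
        # Log-mel input features for the encoder, token ids for the decoder.
        audio = batch["audio"]
        batch["input_features"] = processor(
            audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
        batch["labels"] = processor.tokenizer(batch["text"]).input_ids
        return batch

    train_set = ds.map(prepare, remove_columns=ds.column_names)

    @dataclass
    class Collator:
        processor: WhisperProcessor
        def __call__(self, features):
            # Pad audio features and label ids separately; mask label padding
            # with -100 so it is ignored by the cross-entropy loss.
            batch = self.processor.feature_extractor.pad(
                [{"input_features": f["input_features"]} for f in features],
                return_tensors="pt")
            labels = self.processor.tokenizer.pad(
                [{"input_ids": f["labels"]} for f in features], return_tensors="pt")
            ids = labels["input_ids"].masked_fill(labels.attention_mask.ne(1), -100)
            if (ids[:, 0] == self.processor.tokenizer.bos_token_id).all():
                ids = ids[:, 1:]  # the model re-adds the decoder start token
            batch["labels"] = ids
            return batch

    args = Seq2SeqTrainingArguments(
        output_dir="whisper-small-lectures",  # hypothetical checkpoint name
        per_device_train_batch_size=16,
        learning_rate=1e-5,  # assumed; a common choice for Whisper fine-tuning
        max_steps=4000,
    )
    trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_set,
                             data_collator=Collator(processor),
                             tokenizer=processor.feature_extractor)
    trainer.train()

Phase 2 would repeat the same training loop, starting from the phase-1 checkpoint, on the roughly 10 hours of self-curated lecture recordings.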
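The WER comparison reported in the abstract could be reproduced along the following lines with the evaluate library. The medium and large checkpoints are the public OpenAI releases; the fine-tuned checkpoint path and the test clips are placeholders, since the author's held-out lecture data is not public.

    # Minimal sketch of the WER comparison; file names and the fine-tuned
    # checkpoint path stand in for the author's private test set.
    import evaluate
    from transformers import pipeline

    wer_metric = evaluate.load("wer")

    checkpoints = {
        "fine-tuned Whisper-Small": "whisper-small-lectures",  # hypothetical
        "Whisper-Medium": "openai/whisper-medium",
        "Whisper-Large-V2": "openai/whisper-large-v2",
    }

    # Stand-ins for the clean, unseen lecture clips and reference transcripts.
    test_clips = ["lecture_clip_01.wav", "lecture_clip_02.wav"]
    test_refs = ["first reference transcript", "second reference transcript"]

    for name, ckpt in checkpoints.items():
        asr = pipeline("automatic-speech-recognition", model=ckpt)
        predictions = [asr(clip)["text"] for clip in test_clips]
        wer = wer_metric.compute(predictions=predictions, references=test_refs)
        print(f"{name}: WER = {wer:.2%}")

On the thesis's own test set, this kind of comparison yielded 4.53% WER for the fine-tuned small model versus 5.51% and 5.78% for the pretrained medium and large-v2 checkpoints.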