Domain-Specific Customization for Improving Speech to Text
| In the publication: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | Tobias, Zubin Mario |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | Computer science; Artificial intelligence; Information science |
| Links: | Citation/Abstract; Full Text - PDF |
MARC
| Tag | Ind. 1 | Ind. 2 | Content |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3190171649 |
| 003 | | | UK-CbPIL |
| 020 | | | $a 9798310380806 |
| 035 | | | $a 3190171649 |
| 045 | 2 | | $b d20250101 $b d20251231 |
| 084 | | | $a 66569 $2 nlm |
| 100 | 1 | | $a Tobias, Zubin Mario |
| 245 | 1 | | $a Domain-Specific Customization for Improving Speech to Text |
| 260 | | | $b ProQuest Dissertations & Theses $c 2025 |
| 513 | | | $a Dissertation/Thesis |
| 520 | 3 | | $a The advent of transformer-based models has revolutionized natural language processing, bringing remarkable improvements in tasks such as automatic speech recognition (ASR). Inspired by these advances, this thesis explores the optimization of a transformer-based ASR model to improve transcription accuracy in educational settings, particularly for lecture content. The goal of this research is to provide real-time, high-accuracy captions that enhance accessibility for all students while offering a cost-effective solution for educators. To assess the potential of domain-specific fine-tuning, Whisper-small underwent two phases of fine-tuning. In the first phase, it was fine-tuned on carefully selected, publicly available datasets: SpeechColab's Gigaspeech-XS and the AMI Meeting Corpus. In the second phase, the fine-tuned model was further optimized on a self-curated dataset of roughly 10 hours of live lecture recordings that I collected and assembled. Finally, a real-time captioning assistant application was developed to leverage the fine-tuned model and transcribe speech in real time with live editing capabilities. The optimized Whisper-small model was evaluated against Whisper's pretrained small, medium, and large (version 2) counterparts on clean, unseen data that I prepared. The fine-tuned model achieved a lower Word Error Rate (WER) of 4.53%, compared with 5.51% and 5.78% for Whisper-Medium and Whisper-Large-V2, respectively. These results demonstrate that fine-tuning a transformer-based ASR model on domain-specific data can significantly enhance its performance in a targeted context such as live lecture transcription. The findings highlight the promise of transformer-based models for improving educational accessibility. By building an application tailored to live lecture settings, this research contributes to the development of adaptable, low-cost technologies that support inclusive learning environments. The success of this experiment lays the groundwork for future breakthroughs in speech recognition, aiming to make education more accessible for everyone. |
| 653 | | | $a Computer science |
| 653 | | | $a Artificial intelligence |
| 653 | | | $a Information science |
| 773 | 0 | | $t ProQuest Dissertations and Theses $g (2025) |
| 786 | 0 | | $d ProQuest $t ProQuest Dissertations & Theses Global |
| 856 | 4 | 1 | $3 Citation/Abstract $u https://www.proquest.com/docview/3190171649/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | $3 Full Text - PDF $u https://www.proquest.com/docview/3190171649/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
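
The abstract above only summarizes the method, so the following is a minimal sketch of what the phase-one fine-tuning of Whisper-small on GigaSpeech-XS could look like using Hugging Face transformers and datasets. The libraries, hyperparameters, and the `whisper-small-lectures` output name are assumptions for illustration, not the thesis's published training recipe, and the second phase on the author's own 10-hour lecture set is not reproduced here.

```python
import torch
from datasets import load_dataset, Audio
from transformers import (
    WhisperProcessor,
    WhisperForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL_ID = "openai/whisper-small"
processor = WhisperProcessor.from_pretrained(MODEL_ID, language="english", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Phase-1 data: the GigaSpeech "xs" subset (gated on the Hub; terms must be accepted).
# GigaSpeech transcripts are upper-case with punctuation tags such as <COMMA>,
# so a real run would normalize the text before tokenizing.
ds = load_dataset("speechcolab/gigaspeech", "xs")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(example):
    audio = example["audio"]
    # Log-Mel features for the encoder, token ids for the decoder labels.
    example["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    example["labels"] = processor.tokenizer(example["text"]).input_ids
    return example

ds = ds.map(prepare, remove_columns=ds["train"].column_names)

def collate(features):
    # Pad audio features and label token ids separately, masking padding out of the loss.
    batch = processor.feature_extractor.pad(
        [{"input_features": f["input_features"]} for f in features], return_tensors="pt"
    )
    labels_batch = processor.tokenizer.pad(
        [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
    )
    labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
    # If the tokenizer already prepended a start token, drop it; the model re-adds it
    # when shifting labels into decoder inputs.
    if (labels[:, 0] == processor.tokenizer.bos_token_id).all().cpu().item():
        labels = labels[:, 1:]
    batch["labels"] = labels
    return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-lectures",  # hypothetical checkpoint name
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=torch.cuda.is_available(),
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    data_collator=collate,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```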
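The reported WER comparison against the stock Whisper checkpoints could be reproduced along these lines. The jiwer package and the transformers ASR pipeline are assumed tooling, and the held-out file list and fine-tuned checkpoint path below are placeholders, not the thesis's actual evaluation set.

```python
import jiwer
from transformers import pipeline

# Placeholder held-out set of (audio_path, reference_transcript) pairs; the thesis used
# a self-prepared, unseen lecture dataset, which is not reproduced here.
HELD_OUT = [
    ("lecture_clip_001.wav", "welcome back to lecture seven on convex optimization"),
]

def evaluate_wer(model_id: str) -> float:
    # Transcribe each held-out clip and score the hypotheses against the references.
    asr = pipeline("automatic-speech-recognition", model=model_id, chunk_length_s=30)
    references, hypotheses = [], []
    for path, reference in HELD_OUT:
        hypotheses.append(asr(path)["text"].lower().strip())
        references.append(reference.lower().strip())
    return jiwer.wer(references, hypotheses)

for model_id in [
    "whisper-small-lectures",   # hypothetical fine-tuned checkpoint from the sketch above
    "openai/whisper-small",
    "openai/whisper-medium",
    "openai/whisper-large-v2",
]:
    print(f"{model_id}: WER = {evaluate_wer(model_id):.2%}")
```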