Domain-Specific Customization for Improving Speech to Text

Guardat en:

Dades bibliogràfiques
Publicat a:	ProQuest Dissertations and Theses (2025)
Autor principal:	Tobias, Zubin Mario
Publicat:	ProQuest Dissertations & Theses
Matèries:	Computer science Artificial intelligence Information science
Accés en línia:	Citation/Abstract Full Text - PDF
Etiquetes:	Afegir etiqueta Sense etiquetes, Sigues el primer a etiquetar aquest registre!

Descripció
Resum:	The advent of transformer-based models has revolutionized natural language processing, bringing remarkable improvements in tasks like automatic speech recognition (ASR). Inspired by these advancements, this thesis explores the optimization of a transformer-based ASR model to improve transcription accuracy in educational settings, particularly for lecture content. The goal of this research is to provide real-time, high-accuracy captions that enhance accessibility for all students, while offering a cost-effective solution for educators.To assess the potential of domain-specific fine-tuning, Whisper-small underwent two phases of fine-tuning. In the first phase, it was finetuned on carefully selected, publicly available datasets: SpeechColab’s Gigaspeech-XS, AMI Meeting corpus. In the second phase, fine-tuned model was optimized on a self-curated dataset consisting of roughly 10 hours of live lecture recordings collected and assembled by me. Finally, a real-time captioning assistant application was developed to leverage the finetuned model and transcribe speech in real time with live editing capabilities.The optimized Whisper-small model was evaluated against Whisper’s retrained small, medium and large(version 2) counterparts. The evaluation was performed on a clean unseen data prepared by me. The fine-tuned model achieved lower Word Error Rates (WER) of 4.53%, compared to 5.51% and 5.78% for Whisper-Medium and Whisper-Large-V2 respectively. These results demonstrate that fine-tuning a transformer-based ASR model on domain-specific data can significantly enhance its performance in a targeted context, such as live lecture transcription.The findings of this experiment highlight the promise of transformer-based models for improving educational accessibility. From thereon, building an application tailored to live lecture settings, this research contributes to the development of adaptable, low-cost technologies that support inclusive learning environments. The success of this experiment lays the groundwork for future breakthroughs in speech recognition, aiming to make education more accessible for everyone.
ISBN:	9798310380806
Font:	ProQuest Dissertations & Theses Global