Audio Transcript Segmentation Via Supervised Topic Modeling

Bibliographic Details
Published in: PQDT - Global (2025)
Main Author: Serra, Francisco Maria Lopes Pinto Pimentel
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3275478367
003 UK-CbPIL
020 |a 9798265424792 
035 |a 3275478367 
045 2 |b d20250101  |b d20251231 
084 |a 189128  |2 nlm 
100 1 |a Serra, Francisco Maria Lopes Pinto Pimentel 
245 1 |a Audio Transcript Segmentation Via Supervised Topic Modeling 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The proliferation of video content across digital platforms demands automated methods for content segmentation, particularly in long-form broadcasts where traditional visual-based approaches inadequately capture subtle topical transitions. This thesis investigates audio transcript segmentation through supervised topic modeling, comparing clustering-based and transformer-based architectures when adapted for boundary detection tasks. This research develops a comprehensive pipeline that transforms raw broadcast transcripts into topically coherent segments, introducing a novel synthetic dataset generation methodology that addresses the scarcity of ground-truth annotations. The study implements and evaluates two distinct classification paradigms: BERTopic, which combines contextual embeddings with clustering algorithms, and fine-tuned RoBERTa, leveraging deep transformer representations. A paragraph-level sliding window approach facilitates the detection of topical boundaries. Experiments conducted on a corpus derived from broadcast news transcripts reveal counterintuitive findings regarding model transferability. While transformer-based models demonstrate superior performance in document-level topic classification, clustering-based approaches exhibit enhanced sensitivity to local discourse transitions, resulting in more accurate boundary detection. This performance inversion challenges conventional assumptions about the relationship between classification accuracy and segmentation effectiveness. The developed system successfully identifies topical shifts in broadcast content, with practical implications for news media, educational platforms, and streaming services. Integration of the segmentation pipeline into existing content management systems enables enhanced searchability, automated summarization, and improved user navigation. 
The findings establish empirical baselines for transcript-based segmentation and provide methodological insights for developing multimodal video analysis systems that balance global topical coherence with local transition sensitivity. 
653 |a Sparsity 
653 |a Data mining 
653 |a Real time 
653 |a Information overload 
653 |a Batch processing 
653 |a Clustering 
653 |a Artificial intelligence 
653 |a Voice recognition 
653 |a Neural networks 
653 |a Natural language processing 
653 |a Multilingualism 
653 |a Linguistics 
653 |a Streaming media 
653 |a Speech 
653 |a Semantics 
653 |a Bilingual education 
653 |a Film studies 
653 |a Engineering 
773 0 |t PQDT - Global  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3275478367/abstract/embedded/IZYTEZ3DIR4FRXA2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3275478367/fulltextPDF/embedded/IZYTEZ3DIR4FRXA2?source=fedsrch
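
The abstract (field 520) mentions a paragraph-level sliding window for detecting topical boundaries, but the record does not describe how it is implemented. The following is only a minimal sketch of the general idea, not the thesis's method: it assumes a boundary is declared wherever the predicted topic of the window before a paragraph differs from that of the window starting at it. The names `detect_boundaries`, `keyword_classifier`, and the `window` parameter are hypothetical, and the keyword counter is a toy stand-in for a trained classifier such as BERTopic or a fine-tuned RoBERTa model.

```python
from typing import Callable, List, Sequence


def detect_boundaries(
    paragraphs: Sequence[str],
    classify: Callable[[str], str],
    window: int = 2,
) -> List[int]:
    """Return each index i where the topic predicted for the `window`
    paragraphs before i differs from the topic predicted for the
    `window` paragraphs starting at i (assumed boundary rule)."""
    boundaries = []
    for i in range(window, len(paragraphs) - window + 1):
        left = classify(" ".join(paragraphs[i - window:i]))
        right = classify(" ".join(paragraphs[i:i + window]))
        if left != right:
            boundaries.append(i)
    return boundaries


def keyword_classifier(text: str) -> str:
    """Toy stand-in for a trained topic model (hypothetical)."""
    keywords = {
        "sports": ("match", "goal", "team"),
        "weather": ("rain", "storm", "forecast"),
    }
    lowered = text.lower()
    # Score each topic by how often its keywords occur in the text.
    scores = {
        topic: sum(lowered.count(word) for word in words)
        for topic, words in keywords.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    transcript = [
        "The team scored a goal.",
        "The match ended late.",
        "Rain is expected tomorrow.",
        "The storm forecast worsened.",
    ]
    print(detect_boundaries(transcript, keyword_classifier, window=2))
```

Any callable that maps text to a topic label can replace `keyword_classifier`; a larger `window` smooths predictions at the cost of coarser boundary placement.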