Sequential Machine Learning For Textual and Time-Series Data

Published in: PQDT - Global (2025)
Main author: Katsarou, Aikaterini
Published: ProQuest Dissertations & Theses
Online access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest
Description
Abstract: In today’s data-driven world, a significant challenge is the extraction of meaningful insights from large volumes of unstructured data. Central to this challenge is handling data with an inherent sequential nature, encompassing types such as textual data, time series, and event sequences.

The power of sequential machine learning techniques has been exhibited through their ability to capture dependencies and patterns, especially within temporal and textual realms. However, the shortage of labeled data, class imbalances, and the intricacies of high-dimensional data present open challenges.

This thesis delves deep into these challenges, primarily focusing on Natural Language Processing, using Sentiment Classification as a case study. While sentiments towards various topics and products flourish, this data’s unstructured and unlabeled nature means much of its value goes untapped. Manual labeling of this massive dataset is untenable due to its scale and complexity. Addressing this, our work conceptualizes and deploys Cross-Domain and Multi-Domain Sentiment Classification models. Transfer Learning, specifically unsupervised and supervised language model pre-training, and Active Learning show promising results in tackling the data labeling problem. The thesis proposes models like CRD-SentEnse, MUTUAL, and REFORMIST that achieve robust results with minimal labeled data. This work incorporates hate speech detection to provide more comprehensive sentiment analysis, treating it as a specialized subset of negative sentiment. Such integration ensures that platforms can promptly identify and act upon harmful content. Therefore, a framework for hate speech detection is proposed to investigate the efficiency of pre-trained models and deal with the challenges of extremely negative and polarized sentiments and imbalanced classes.

Transitioning from textual to time series data, we confront challenges such as high dimensionality and temporal patterns like seasonality and trends. Given its fewer inherent assumptions, sequential machine learning is often considered more adaptable than traditional machine learning and statistical models. Contrasting sequential machine learning with traditional methods, we present a comprehensive comparative study, highlighting the strengths and limitations of each. Applications like thermal comfort prediction and network traffic forecasting are our experimental foundation.

Lastly, we handle event sequence data with our proposed model, WhatsNextApp, addressing issues like data scarcity and the user cold-start problem, outperforming state-of-the-art models.

In conclusion, the methodologies presented in this thesis enhance our understanding and performance of sequential machine learning in textual, time series, and event log data, setting the stage for future research.
ISBN: 9798290615073
Source: ProQuest Dissertations & Theses Global