COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations
| Published in: | arXiv.org (Dec 22, 2024), p. n/a |
|---|---|
| Main author: | Su, Vanessa |
| Other authors: | Thakur, Nirmalya |
| Published: | Cornell University Library, arXiv.org |
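| Subjects: | Data analysis; Recommender systems; Toxicity; Sentiment analysis; Real time; Natural language processing; Data mining; Modelling; Pandemics; COVID-19 |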
| Online access: | Citation/Abstract Full text outside of ProQuest |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3148980583 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2331-8422 | ||
| 035 | |a 3148980583 | ||
| 045 | 0 | |b d20241222 | |
| 100 | 1 | |a Su, Vanessa | |
| 245 | 1 | |a COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations | |
| 260 | |b Cornell University Library, arXiv.org |c Dec 22, 2024 | ||
| 513 | |a Working Paper | ||
| 520 | 3 | |a This study presents a data-driven analysis of COVID-19 discourse on YouTube, examining the sentiment, toxicity, and thematic patterns of video content published between January 2023 and October 2024. The analysis involved applying advanced natural language processing (NLP) techniques: sentiment analysis with VADER, toxicity detection with Detoxify, and topic modeling using Latent Dirichlet Allocation (LDA). The sentiment analysis revealed that 49.32% of video descriptions were positive, 36.63% were neutral, and 14.05% were negative, indicating a generally informative and supportive tone in pandemic-related content. Toxicity analysis identified only 0.91% of content as toxic, suggesting minimal exposure to toxic content. Topic modeling revealed two main themes, with 66.74% of the videos covering general health information and pandemic-related impacts and 33.26% focused on news and real-time updates, highlighting the dual informational role of YouTube. A recommendation system was also developed using TF-IDF vectorization and cosine similarity, refined by sentiment, toxicity, and topic filters to ensure relevant and context-aligned video recommendations. This system achieved 69% aggregate coverage, with monthly coverage rates consistently above 85%, demonstrating robust performance and adaptability over time. Evaluation across recommendation sizes showed coverage reaching 69% for five video recommendations and 79% for ten video recommendations per video. In summary, this work presents a framework for understanding COVID-19 discourse on YouTube and a recommendation system that supports user engagement while promoting responsible and relevant content related to COVID-19. | |
| 653 | |a Data analysis | ||
| 653 | |a Recommender systems | ||
| 653 | |a Toxicity | ||
| 653 | |a Sentiment analysis | ||
| 653 | |a Real time | ||
| 653 | |a Natural language processing | ||
| 653 | |a Data mining | ||
| 653 | |a Modelling | ||
| 653 | |a Pandemics | ||
| 653 | |a COVID-19 | ||
| 700 | 1 | |a Thakur, Nirmalya | |
| 773 | 0 | |t arXiv.org |g (Dec 22, 2024), p. n/a | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3148980583/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | |3 Full text outside of ProQuest |u http://arxiv.org/abs/2412.17180 |
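
The abstract in field 520 describes a recommender built on TF-IDF vectorization and cosine similarity and evaluated by coverage. The following is a minimal sketch of that general technique using only the Python standard library; it is not the authors' implementation. The function names, the whitespace tokenizer, the smoothed IDF formula, and the definition of coverage as the share of videos that appear in at least one recommendation list are all assumptions made for illustration. The sentiment, toxicity, and topic filters mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: whitespace tokenization and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(docs, k):
    """For each document, list the indices of its k most similar documents.

    Documents with zero similarity are never recommended, so a list may
    hold fewer than k entries.
    """
    vectors = tfidf_vectors(docs)
    recommendations = []
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        scored = [(score, j) for score, j in scored if score > 0.0]
        # Highest similarity first; ties broken by lower index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        recommendations.append([j for _, j in scored[:k]])
    return recommendations


def coverage(recommendations, n_items):
    """Share of items appearing in at least one recommendation list (assumed definition)."""
    recommended = {j for rec in recommendations for j in rec}
    return len(recommended) / n_items
```

Calling `recommend(descriptions, k=5)` on a list of video description strings and passing the result to `coverage` gives a figure comparable in kind, though not necessarily in definition, to the coverage rates quoted in the abstract.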