Towards better Hebrew clickbait detection: Insights from BERT and data augmentation

Bibliographic Details
Published in: PLoS One vol. 20, no. 11 (Nov 2025), p. e0332342
Main Author: Natanya, Talya
Other Authors: Liebeskind, Chaya
Published:
Public Library of Science
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3269575163
003 UK-CbPIL
022 |a 1932-6203 
024 7 |a 10.1371/journal.pone.0332342  |2 doi 
035 |a 3269575163 
045 2 |b d20251101  |b d20251130 
084 |a 174835  |2 nlm 
100 1 |a Natanya, Talya 
245 1 |a Towards better Hebrew clickbait detection: Insights from BERT and data augmentation 
260 |b Public Library of Science  |c Nov 2025 
513 |a Journal Article 
520 3 |a Clickbait headlines, designed to entice readers with sensationalized or misleading content, pose significant challenges in the digital landscape. They exploit curiosity to generate traffic and revenue, often at the cost of spreading misinformation and undermining the credibility of online content. Identifying clickbait is essential for improving the quality of information consumed, fostering trust in digital media, and enabling users to make informed decisions. This study advances Hebrew clickbait detection through deep learning approaches and comprehensive data augmentation strategies, targeting the unique challenges of processing a low-resource language. Building on prior research that achieved an accuracy of 87% using traditional machine learning methods, this work explores the potential of BERT-based models and diverse augmentation techniques to further enhance performance. Our experiments incorporated a variety of augmentation methods, including weak supervision, substitution-based methods, generative techniques and language-based methods, applied to state-of-the-art Hebrew language models. The results highlight that targeted augmentation strategies, particularly those focusing on word-level replacements and contextual enhancements, consistently improved model performance. Our top-performing configuration achieved an accuracy of 92%, surpassing traditional machine learning benchmarks. The results of this study can be applied in real-world systems to automatically detect and reduce clickbait in Hebrew digital media, supporting news websites and social platforms in improving content quality and user trust. Furthermore, the study provides a replicable framework for tackling similar challenges in other underrepresented languages, highlighting the transformative potential of combining advanced deep learning methods with tailored data augmentation strategies. 
653 |a Language 
653 |a Accuracy 
653 |a Machine learning 
653 |a Text categorization 
653 |a Data augmentation 
653 |a Benchmarks 
653 |a Deep learning 
653 |a Datasets 
653 |a Configuration management 
653 |a Classification 
653 |a Multilingualism 
653 |a Natural language processing 
653 |a Large language models 
653 |a Learning algorithms 
653 |a Digital media 
653 |a Semantics 
653 |a False information 
653 |a Readers 
653 |a Economic 
700 1 |a Liebeskind, Chaya 
773 0 |t PLoS One  |g vol. 20, no. 11 (Nov 2025), p. e0332342 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3269575163/abstract/embedded/Q8Z64E4HU3OH5N8U?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3269575163/fulltext/embedded/Q8Z64E4HU3OH5N8U?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3269575163/fulltextPDF/embedded/Q8Z64E4HU3OH5N8U?source=fedsrch