Towards better Hebrew clickbait detection: Insights from BERT and data augmentation
| Published in: | PLoS One vol. 20, no. 11 (Nov 2025), p. e0332342 |
|---|---|
| Main author: | Natanya, Talya |
| Other authors: | Liebeskind, Chaya |
| Published: | Public Library of Science |
| Subjects: | |
| Online access: | Citation/Abstract, Full Text, Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3269575163 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 1932-6203 | ||
| 024 | 7 | |a 10.1371/journal.pone.0332342 |2 doi | |
| 035 | |a 3269575163 | ||
| 045 | 2 | |b d20251101 |b d20251130 | |
| 084 | |a 174835 |2 nlm | ||
| 100 | 1 | |a Natanya, Talya | |
| 245 | 1 | |a Towards better Hebrew clickbait detection: Insights from BERT and data augmentation | |
| 260 | |b Public Library of Science |c Nov 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Clickbait headlines, designed to entice readers with sensationalized or misleading content, pose significant challenges in the digital landscape. They exploit curiosity to generate traffic and revenue, often at the cost of spreading misinformation and undermining the credibility of online content. Identifying clickbait is essential for improving the quality of information consumed, fostering trust in digital media, and enabling users to make informed decisions. This study advances Hebrew clickbait detection through deep learning approaches and comprehensive data augmentation strategies, targeting the unique challenges of processing a low-resource language. Building on prior research that achieved an accuracy of 87% using traditional machine learning methods, this work explores the potential of BERT-based models and diverse augmentation techniques to further enhance performance. Our experiments incorporated a variety of augmentation methods, including weak supervision, substitution-based methods, generative techniques, and language-based methods, applied to state-of-the-art Hebrew language models. The results highlight that targeted augmentation strategies, particularly those focusing on word-level replacements and contextual enhancements, consistently improved model performance. Our top-performing configuration achieved an accuracy of 92%, surpassing traditional machine learning benchmarks. These results can be applied in real-world systems to automatically detect and reduce clickbait in Hebrew digital media, supporting news websites and social platforms in improving content quality and user trust. Furthermore, the study provides a replicable framework for tackling similar challenges in other underrepresented languages, highlighting the transformative potential of combining advanced deep learning methods with tailored data augmentation strategies. | |
| 653 | |a Language | ||
| 653 | |a Accuracy | ||
| 653 | |a Machine learning | ||
| 653 | |a Text categorization | ||
| 653 | |a Data augmentation | ||
| 653 | |a Benchmarks | ||
| 653 | |a Deep learning | ||
| 653 | |a Datasets | ||
| 653 | |a Configuration management | ||
| 653 | |a Classification | ||
| 653 | |a Multilingualism | ||
| 653 | |a Natural language processing | ||
| 653 | |a Large language models | ||
| 653 | |a Learning algorithms | ||
| 653 | |a Digital media | ||
| 653 | |a Semantics | ||
| 653 | |a False information | ||
| 653 | |a Readers | ||
| 653 | |a Economic | ||
| 700 | 1 | |a Liebeskind, Chaya | |
| 773 | 0 | |t PLoS One |g vol. 20, no. 11 (Nov 2025), p. e0332342 | |
| 786 | 0 | |d ProQuest |t Health & Medical Collection | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3269575163/abstract/embedded/Q8Z64E4HU3OH5N8U?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/3269575163/fulltext/embedded/Q8Z64E4HU3OH5N8U?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3269575163/fulltextPDF/embedded/Q8Z64E4HU3OH5N8U?source=fedsrch |