Optimising Contextual Embeddings for Meaning Conflation Deficiency Resolution in Low-Resourced Languages

Saved in:
Bibliographic Details
Published in: Computers vol. 14, no. 9 (2025), p. 402-431
Main Author: Masethe, Mosima A
Other Authors: Ojo, Sunday O; Masethe, Hlaudi D
Published:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3254483396
003 UK-CbPIL
022 |a 2073-431X 
024 7 |a 10.3390/computers14090402  |2 doi 
035 |a 3254483396 
045 2 |b d20250101  |b d20251231 
084 |a 231447  |2 nlm 
100 1 |a Masethe, Mosima A  |u Department of Information Technology, Faculty of Accounting and Informatics, Durban University of Technology, Durban 4001, South Africa 
245 1 |a Optimising Contextual Embeddings for Meaning Conflation Deficiency Resolution in Low-Resourced Languages 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Meaning conflation deficiency (MCD) presents a continual obstacle in natural language processing (NLP), especially for low-resourced and morphologically complex languages, where polysemy and contextual ambiguity diminish model precision in word sense disambiguation (WSD) tasks. This paper examines the optimisation of contextual embedding models, namely XLNet, ELMo, BART, and their improved variations, to tackle MCD in such linguistic settings. Using Sesotho sa Leboa as a case study, the researchers devised an enhanced XLNet architecture with targeted hyperparameter optimisation, dynamic padding, early stopping, and class-balanced training. Comparative assessments reveal that the optimised XLNet attains an accuracy of 91% and exhibits balanced precision and recall of 92% and 91%, respectively, surpassing both its baseline counterpart and competing models. The optimised ELMo attained the highest overall metrics (accuracy: 92%, F1-score: 96%), whilst the optimised BART demonstrated a significant accuracy improvement (96%) despite reduced recall. The results demonstrate that fine-tuning contextual embeddings with MCD-specific methodologies significantly improves semantic disambiguation for under-represented languages. This study offers a scalable and flexible optimisation approach suitable for other low-resource language contexts. 
653 |a Sparsity 
653 |a Language 
653 |a Recall 
653 |a Accuracy 
653 |a Semantics 
653 |a Word sense disambiguation 
653 |a Case studies 
653 |a Natural language processing 
653 |a Optimization 
653 |a Adaptation 
653 |a Morphological complexity 
653 |a Annotations 
653 |a Polysemy 
653 |a Morphology 
653 |a Sotho languages 
653 |a Meaning 
653 |a Termination 
653 |a Ambiguity 
653 |a Languages 
700 1 |a Ojo, Sunday O  |u Department of Information Technology, Faculty of Accounting and Informatics, Durban University of Technology, Durban 4001, South Africa 
700 1 |a Masethe, Hlaudi D  |u Department of Data Science, Faculty of Information Communication Technology, Tshwane University of Technology, Pretoria 0001, South Africa 
773 0 |t Computers  |g vol. 14, no. 9 (2025), p. 402-431 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3254483396/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3254483396/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3254483396/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch