In-Context Learning for Low-Resource Machine Translation: A Study on Tarifit with Large Language Models

Bibliographic details
Published in: Algorithms vol. 18, no. 8 (2025), p. 489-508
Main author: Akallouch Oussama
Other authors: Fardousse Khalid
Published:
MDPI AG
Subjects:
Online access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3243965753
003 UK-CbPIL
022 |a 1999-4893 
024 7 |a 10.3390/a18080489  |2 doi 
035 |a 3243965753 
045 2 |b d20250101  |b d20251231 
084 |a 231333  |2 nlm 
100 1 |a Akallouch Oussama 
245 1 |a In-Context Learning for Low-Resource Machine Translation: A Study on Tarifit with Large Language Models 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a This study presents the first systematic evaluation of in-context learning for machine translation of Tarifit, a low-resource Amazigh language spoken by 5 million people in Morocco and Europe. We assess three large language models (GPT-4, Claude-3.5, PaLM-2) across Tarifit–Arabic, Tarifit–French, and Tarifit–English translation using 1000 sentence pairs and 5-fold cross-validation. Results show that 8-shot similarity-based demonstration selection achieves optimal performance. GPT-4 achieved 20.2 BLEU for Tarifit–Arabic, 14.8 for Tarifit–French, and 10.9 for Tarifit–English. Linguistic proximity significantly impacts translation quality, with Tarifit–Arabic substantially outperforming other language pairs by 8.4 BLEU points due to shared vocabulary and morphological patterns. Error analysis reveals systematic issues with morphological complexity (42% of errors) and cultural terminology preservation (18% of errors). This work establishes baseline benchmarks for Tarifit translation and demonstrates the viability of in-context learning for morphologically complex low-resource languages, contributing to linguistic equity in AI systems. 
653 |a Error analysis 
653 |a Language 
653 |a French language 
653 |a Dictionaries 
653 |a English language 
653 |a Datasets 
653 |a Errors 
653 |a Morphology 
653 |a Machine translation 
653 |a Terminology 
653 |a Endangered languages 
653 |a Morphological analysis 
653 |a Vocabulary 
653 |a Business metrics 
653 |a Grammar 
653 |a Linguistics 
653 |a Learning 
653 |a Large language models 
653 |a Arabic language 
653 |a Voice recognition 
653 |a Proximity 
653 |a Languages 
653 |a Preservation 
653 |a Berber languages 
653 |a Natural language processing 
653 |a Multilingualism 
653 |a Complexity 
653 |a Context 
653 |a Morphological complexity 
653 |a Language modeling 
653 |a Translation 
653 |a Cultural heritage 
700 1 |a Fardousse Khalid 
773 0 |t Algorithms  |g vol. 18, no. 8 (2025), p. 489-508 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3243965753/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3243965753/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3243965753/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch