In-Context Learning for Low-Resource Machine Translation: A Study on Tarifit with Large Language Models

Bibliographic details
Published in:	Algorithms vol. 18, no. 8 (2025), p. 489-508
Main author:	Akallouch Oussama
Other authors:	Fardousse Khalid
Published:	MDPI AG
Subjects:	Error analysis; Language; French language; Dictionaries; English language; Datasets; Errors; Morphology; Machine translation; Terminology; Endangered languages; Morphological analysis; Vocabulary; Business metrics; Grammar; Linguistics; Learning; Large language models; Arabic language; Voice recognition; Proximity; Languages; Preservation; Berber languages; Natural language processing; Multilingualism; Complexity; Context; Morphological complexity; Language modeling; Translation; Cultural heritage
Online access:	Citation/Abstract; Full Text + Graphics; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3243965753
003	UK-CbPIL
022			\|a 1999-4893
024	7		\|a 10.3390/a18080489 \|2 doi
035			\|a 3243965753
045	2		\|b d20250101 \|b d20251231
084			\|a 231333 \|2 nlm
100	1		\|a Akallouch Oussama
245	1		\|a In-Context Learning for Low-Resource Machine Translation: A Study on Tarifit with Large Language Models
260			\|b MDPI AG \|c 2025
513			\|a Journal Article
520	3		\|a This study presents the first systematic evaluation of in-context learning for Tarifit machine translation, a low-resource Amazigh language spoken by 5 million people in Morocco and Europe. We assess three large language models (GPT-4, Claude-3.5, PaLM-2) across Tarifit–Arabic, Tarifit–French, and Tarifit–English translation using 1000 sentence pairs and 5-fold cross-validation. Results show that 8-shot similarity-based demonstration selection achieves optimal performance. GPT-4 achieved 20.2 BLEU for Tarifit–Arabic, 14.8 for Tarifit–French, and 10.9 for Tarifit–English. Linguistic proximity significantly impacts translation quality, with Tarifit–Arabic substantially outperforming other language pairs by 8.4 BLEU points due to shared vocabulary and morphological patterns. Error analysis reveals systematic issues with morphological complexity (42% of errors) and cultural terminology preservation (18% of errors). This work establishes baseline benchmarks for Tarifit translation and demonstrates the viability of in-context learning for morphologically complex low-resource languages, contributing to linguistic equity in AI systems.
653			\|a Error analysis
653			\|a Language
653			\|a French language
653			\|a Dictionaries
653			\|a English language
653			\|a Datasets
653			\|a Errors
653			\|a Morphology
653			\|a Machine translation
653			\|a Terminology
653			\|a Endangered languages
653			\|a Morphological analysis
653			\|a Vocabulary
653			\|a Business metrics
653			\|a Grammar
653			\|a Linguistics
653			\|a Learning
653			\|a Large language models
653			\|a Arabic language
653			\|a Voice recognition
653			\|a Proximity
653			\|a Languages
653			\|a Preservation
653			\|a Berber languages
653			\|a Natural language processing
653			\|a Multilingualism
653			\|a Complexity
653			\|a Context
653			\|a Morphological complexity
653			\|a Language modeling
653			\|a Translation
653			\|a Cultural heritage
700	1		\|a Fardousse Khalid
773	0		\|t Algorithms \|g vol. 18, no. 8 (2025), p. 489-508
786	0		\|d ProQuest \|t Engineering Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3243965753/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text + Graphics \|u https://www.proquest.com/docview/3243965753/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3243965753/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
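
Illustrative note on the abstract (field 520): the abstract reports that 8-shot similarity-based demonstration selection performed best, but this record does not state which similarity measure or prompt layout the authors used. The following is a minimal sketch of the general technique only, assuming character-trigram Jaccard similarity as the measure and a simple "source line / target line" prompt. The function names `char_ngrams`, `jaccard`, `select_demonstrations`, and `build_prompt` are hypothetical, and nothing here reproduces the study's code or data.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (source sentence, reference translation)


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_demonstrations(query: str, pool: List[Pair], k: int = 8) -> List[Pair]:
    """Pick the k pool pairs whose source side is most similar to the query.

    Assumption: similarity is character-trigram Jaccard. The record does not
    say which measure the study used.
    """
    q = char_ngrams(query)
    ranked = sorted(pool, key=lambda pair: jaccard(q, char_ngrams(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, demos: List[Pair],
                 src_lang: str = "Tarifit", tgt_lang: str = "English") -> str:
    """Assemble a few-shot prompt: instruction, demonstrations, then the query."""
    blocks = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in demos:
        blocks.append(f"{src_lang}: {src}\n{tgt_lang}: {tgt}")
    blocks.append(f"{src_lang}: {query}\n{tgt_lang}:")
    return "\n\n".join(blocks)
```

In use, `pool` would hold the training folds of a parallel corpus and `query` a held-out source sentence; the resulting prompt string would then be sent to whichever model is under evaluation. Setting `k=8` mirrors the 8-shot configuration named in the abstract.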