Multilingual Sentiment Analysis with Data Augmentation: A Cross-Language Evaluation in French, German, and Japanese

Bibliographic Details
Container/Database: Information vol. 16, no. 9 (2025), p. 806-830
Main Author: Suboh, Alkhushayni
Other Authors: Lee, Hyesu
Published by: MDPI AG
Online Access: Citation/Abstract; Full Text + Graphics; Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3254540394
003 UK-CbPIL
022 |a 2078-2489 
024 7 |a 10.3390/info16090806  |2 doi 
035 |a 3254540394 
045 2 |b d20250101  |b d20251231 
084 |a 231474  |2 nlm 
100 1 |a Suboh, Alkhushayni  |u Department of Information Systems, Faculty of Information Technology and Computer Science, Yarmouk University, Irbid 21163, Jordan 
245 1 |a Multilingual Sentiment Analysis with Data Augmentation: A Cross-Language Evaluation in French, German, and Japanese 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Machine learning in natural language processing (NLP) analyzes datasets to make future predictions, but developing accurate models requires large, high-quality, and balanced datasets. However, collecting such datasets, especially for low-resource languages, is time-consuming and costly. As a solution, data augmentation can be used to increase the dataset size by generating synthetic samples from existing data. This study examines the effect of translation-based data augmentation on sentiment analysis using small datasets in three diverse languages: French, German, and Japanese. We use two neural machine translation (NMT) services, Google Translate and DeepL, to generate augmented datasets through intermediate language translation. Sentiment analysis models based on Support Vector Machine (SVM) are trained on both original and augmented datasets and evaluated using accuracy, precision, recall, and F1 score. Our results demonstrate that translation augmentation significantly enhances model performance in both French and Japanese. For example, using Google Translate, model accuracy improved from 62.50% to 83.55% in Japanese (+21.05 percentage points) and from 87.66% to 90.26% in French (+2.60 percentage points). In contrast, the German dataset showed a minor improvement or decline, depending on the translator used. Google-based augmentation generally outperformed DeepL, which yielded smaller or negative gains. To evaluate cross-lingual generalization, models trained on one language were tested on datasets in the other two. Notably, a model trained on augmented German data improved its accuracy on French test data from 81.17% to 85.71% and on Japanese test data from 71.71% to 79.61%. Similarly, a model trained on augmented Japanese data improved accuracy on German test data by up to 3.4%. These findings highlight that translation-based augmentation can enhance sentiment classification and cross-language adaptability, particularly in low-resource and multilingual NLP settings. 
653 |a Language 
653 |a French language 
653 |a Accuracy 
653 |a Datasets 
653 |a Machine learning 
653 |a Machine translation 
653 |a Market research 
653 |a Language translation 
653 |a Intermediate languages 
653 |a Data augmentation 
653 |a Japanese language 
653 |a Sentiment analysis 
653 |a Support vector machines 
653 |a Hypotheses 
653 |a Natural language processing 
653 |a Classification 
653 |a German language 
653 |a Multilingualism 
653 |a Linguistics 
653 |a Algorithms 
653 |a Translators 
653 |a Product development 
653 |a Prediction models 
653 |a Augmentation 
653 |a Data 
653 |a Languages 
653 |a Language acquisition 
653 |a Tests 
653 |a Translation 
700 1 |a Lee, Hyesu  |u Department of Computer Information Science, Minnesota State University, Mankato, MN 56001, USA; hyesu.lee@mnsu.edu 
773 0 |t Information  |g vol. 16, no. 9 (2025), p. 806-830 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3254540394/abstract/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3254540394/fulltextwithgraphics/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3254540394/fulltextPDF/embedded/J7RWLIQ9I3C9JK51?source=fedsrch
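The abstract describes a pipeline of translation-based augmentation followed by SVM classification and evaluation with accuracy, precision, recall, and F1. The Python sketch below illustrates the augmentation and evaluation steps under stated assumptions, not as the authors' implementation: it assumes augmentation is a round trip (source to pivot language and back), uses a hypothetical `translate` callable and a toy lookup-table translator in place of the Google Translate or DeepL services (no real API is called), and assumes macro-averaged metrics. The record does not specify the pivot language, the averaging scheme, or the SVM implementation, so the classifier itself (for example, scikit-learn's `LinearSVC`) is left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# A real pipeline would wrap an NMT service such as Google Translate or DeepL.
Translator = Callable[[str, str, str], str]


def augment_by_round_trip(
    samples: Sequence[Tuple[str, str]],
    translate: Translator,
    source_lang: str,
    pivot_lang: str,
) -> List[Tuple[str, str]]:
    """Return the original (text, label) samples plus round-trip paraphrases.

    Each text is translated source -> pivot -> source and keeps its original
    sentiment label. Round trips that reproduce the text exactly are skipped
    so the augmented set contains no verbatim duplicates.
    """
    augmented = list(samples)
    for text, label in samples:
        pivot = translate(text, source_lang, pivot_lang)
        back = translate(pivot, pivot_lang, source_lang)
        if back != text:
            augmented.append((back, label))
    return augmented


def sentiment_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in for an NMT service: fixed lookup, identity otherwise."""
    table = {
        ("fr", "en", "très bon produit"): "very good product",
        ("en", "fr", "very good product"): "produit très bon",
    }
    return table.get((src, tgt, text), text)


if __name__ == "__main__":
    data = [("très bon produit", "pos"), ("mauvais", "neg")]
    print(augment_by_round_trip(data, toy_translate, "fr", "en"))
    print(sentiment_metrics(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
```

Because each paraphrase inherits its source label, the training set grows without new annotation; the augmented list would then be vectorized and passed to the SVM, and the same metric function applied to held-out data in the original or another language for the cross-lingual tests.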