Low-Resourced Alphabet-Level Pivot-Based Neural Machine Translation for Translating Korean Dialects

Bibliographic Details
Published in: Applied Sciences vol. 15, no. 17 (2025), p. 9459-9476
Main Author: Park, Junho
Other Authors: Park, Seong-Bae
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3249676089
003 UK-CbPIL
022 |a 2076-3417 
024 7 |a 10.3390/app15179459  |2 doi 
035 |a 3249676089 
045 2 |b d20250101  |b d20251231 
084 |a 231338  |2 nlm 
100 1 |a Park, Junho 
245 1 |a Low-Resourced Alphabet-Level Pivot-Based Neural Machine Translation for Translating Korean Dialects 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Developing a machine translator from a Korean dialect to a foreign language presents significant challenges due to the lack of a parallel corpus for direct dialect translation. To solve this issue, this paper proposes a pivot-based machine translation model consisting of two sub-translators. The first sub-translator is a sequence-to-sequence model with a minGRU encoder and a GRU decoder; it normalizes a dialect sentence into a standard sentence and employs alphabet-level tokenization. The second sub-translator is a legacy translator, such as an off-the-shelf neural machine translator or an LLM, which translates the normalized standard sentence into a foreign-language sentence. The effectiveness of alphabet-level tokenization and the minGRU encoder for the normalization model is demonstrated through empirical analysis. Alphabet-level tokenization proves more effective for Korean dialect normalization than widely used sub-word tokenizations, and the minGRU encoder performs comparably to a GRU encoder while being faster and better at handling longer token sequences. The pivot-based translation method is also validated through a broad range of experiments, which demonstrate its effectiveness in translating Korean dialects into English, Chinese, and Japanese. (An illustrative sketch of the alphabet-level tokenization and the minGRU recurrence follows this record.) 
653 |a Language 
653 |a Dialects 
653 |a Experiments 
653 |a Parallel corpora 
653 |a Machine translation 
653 |a Sequences 
653 |a Standard dialects 
653 |a Interpreters 
653 |a Chinese languages 
653 |a Japanese language 
653 |a Sentences 
653 |a Foreign languages 
653 |a Phonetics 
653 |a Translation 
653 |a Speech 
653 |a Large language models 
653 |a Korean language 
653 |a Morphology 
653 |a Alphabets 
653 |a Normalization 
653 |a Translation methods and strategies 
700 1 |a Park, Seong-Bae 
773 0 |t Applied Sciences  |g vol. 15, no. 17 (2025), p. 9459-9476 
786 0 |d ProQuest  |t Publicly Available Content Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3249676089/abstract/embedded/75I98GEZK8WCJMPQ?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3249676089/fulltextwithgraphics/embedded/75I98GEZK8WCJMPQ?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3249676089/fulltextPDF/embedded/75I98GEZK8WCJMPQ?source=fedsrch
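
The abstract above (field 520) names two technical ingredients of the normalization sub-translator: alphabet-level (jamo) tokenization of Korean and a minGRU encoder. The following is a minimal, illustrative Python/PyTorch sketch, not the authors' implementation: it assumes the published minGRU formulation (update gate and candidate state computed from the current input only) and uses Unicode NFD normalization to decompose Hangul syllables into jamo; all names, dimensions, and the example phrase are hypothetical.

    import unicodedata
    import torch
    import torch.nn as nn

    def jamo_tokenize(sentence: str) -> list:
        # NFD normalization decomposes each precomposed Hangul syllable into
        # its constituent jamo, giving alphabet-level tokens
        # (e.g. the syllable '안' decomposes into three jamo).
        return list(unicodedata.normalize("NFD", sentence))

    class MinGRU(nn.Module):
        # minGRU: both the update gate z_t and the candidate state h~_t depend
        # only on the current input x_t, never on h_{t-1}; this independence is
        # what allows parallel training over the sequence dimension (a plain
        # sequential loop is used here for clarity).
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.linear_z = nn.Linear(input_size, hidden_size)  # update gate z_t
            self.linear_h = nn.Linear(input_size, hidden_size)  # candidate h~_t

        def forward(self, x):
            # x: (batch, seq_len, input_size) -> (batch, seq_len, hidden_size)
            z = torch.sigmoid(self.linear_z(x))
            h_tilde = self.linear_h(x)
            h = torch.zeros(x.size(0), h_tilde.size(-1), device=x.device)
            states = []
            for t in range(x.size(1)):
                # h_t = (1 - z_t) * h_{t-1} + z_t * h~_t  (convex combination)
                h = (1 - z[:, t]) * h + z[:, t] * h_tilde[:, t]
                states.append(h)
            return torch.stack(states, dim=1)

    # Illustrative usage with a standard-Korean phrase; the dialect-to-standard
    # training pairs and the downstream GRU decoder are not shown.
    jamo = jamo_tokenize("안녕하세요")                 # 12 jamo tokens
    vocab = {ch: i for i, ch in enumerate(sorted(set(jamo)))}
    ids = torch.tensor([[vocab[ch] for ch in jamo]])   # (1, seq_len)

    embed = nn.Embedding(len(vocab), 64)               # sizes are illustrative
    encoder = MinGRU(64, 128)
    states = encoder(embed(ids))
    print(states.shape)                                # torch.Size([1, 12, 128])

Working at the jamo level keeps the vocabulary small and exposes the sub-syllable sound changes that distinguish dialect from standard spellings, which is consistent with the abstract's claim that alphabet-level tokenization outperforms sub-word tokenizations for this normalization task.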