Low-Resourced Alphabet-Level Pivot-Based Neural Machine Translation for Translating Korean Dialects
| Published in: | Applied Sciences vol. 15, no. 17 (2025), p. 9459-9476 |
|---|---|
| Main Author: | Park, Junho |
| Other Authors: | Park, Seong-Bae |
| Published: | MDPI AG |
| Subjects: | |
| Online Access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | | |
|---|---|---|---|
| 001 | 3249676089 | | |
| 003 | UK-CbPIL | | |
| 022 | | | \|a 2076-3417 |
| 024 | 7 | | \|a 10.3390/app15179459 \|2 doi |
| 035 | | | \|a 3249676089 |
| 045 | 2 | | \|b d20250101 \|b d20251231 |
| 084 | | | \|a 231338 \|2 nlm |
| 100 | 1 | | \|a Park, Junho |
| 245 | 1 | | \|a Low-Resourced Alphabet-Level Pivot-Based Neural Machine Translation for Translating Korean Dialects |
| 260 | | | \|b MDPI AG \|c 2025 |
| 513 | | | \|a Journal Article |
| 520 | 3 | | \|a Developing a machine translator from a Korean dialect to a foreign language presents significant challenges due to a lack of a parallel corpus for direct dialect translation. To solve this issue, this paper proposes a pivot-based machine translation model that consists of two sub-translators. The first sub-translator is a sequence-to-sequence model with minGRU as an encoder and GRU as a decoder. It normalizes a dialect sentence into a standard sentence, and it employs alphabet-level tokenization. The other type of sub-translator is a legacy translator, such as off-the-shelf neural machine translators or LLMs, which translates the normalized standard sentence to a foreign sentence. The effectiveness of the alphabet-level tokenization and the minGRU encoder for the normalization model is demonstrated through empirical analysis. Alphabet-level tokenization is proven to be more effective for Korean dialect normalization than other widely used sub-word tokenizations. The minGRU encoder exhibits comparable performance to GRU as an encoder, and it is faster and more effective in managing longer token sequences. The pivot-based translation method is also validated through a broad range of experiments, and its effectiveness in translating Korean dialects to English, Chinese, and Japanese is demonstrated empirically. |
| 653 | | | \|a Language |
| 653 | | | \|a Dialects |
| 653 | | | \|a Experiments |
| 653 | | | \|a Parallel corpora |
| 653 | | | \|a Machine translation |
| 653 | | | \|a Sequences |
| 653 | | | \|a Standard dialects |
| 653 | | | \|a Interpreters |
| 653 | | | \|a Chinese languages |
| 653 | | | \|a Japanese language |
| 653 | | | \|a Sentences |
| 653 | | | \|a Foreign languages |
| 653 | | | \|a Phonetics |
| 653 | | | \|a Translation |
| 653 | | | \|a Speech |
| 653 | | | \|a Large language models |
| 653 | | | \|a Korean language |
| 653 | | | \|a Morphology |
| 653 | | | \|a Alphabets |
| 653 | | | \|a Normalization |
| 653 | | | \|a Translation methods and strategies |
| 700 | 1 | | \|a Park, Seong-Bae |
| 773 | 0 | | \|t Applied Sciences \|g vol. 15, no. 17 (2025), p. 9459-9476 |
| 786 | 0 | | \|d ProQuest \|t Publicly Available Content Database |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3249676089/abstract/embedded/75I98GEZK8WCJMPQ?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text + Graphics \|u https://www.proquest.com/docview/3249676089/fulltextwithgraphics/embedded/75I98GEZK8WCJMPQ?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text - PDF \|u https://www.proquest.com/docview/3249676089/fulltextPDF/embedded/75I98GEZK8WCJMPQ?source=fedsrch |
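
The abstract in field 520 says the normalization model uses alphabet-level tokenization. The record does not describe the tokenizer itself, so the following is only a minimal sketch of one plausible reading: each precomposed Hangul syllable is split into its constituent letters (jamo) using the standard Unicode arithmetic for the Hangul Syllables block. The function name `to_jamo` and the choice to emit compatibility jamo are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative sketch only; not the authors' tokenizer.
# Assumption: "alphabet-level" means splitting Hangul syllables into jamo.

HANGUL_BASE = 0xAC00  # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3  # last precomposed syllable, "힣"

# Jamo in Unicode composition order: 19 initials, 21 vowels, 27 finals.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
# Index 0 of the final-consonant table means "no final consonant".
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_jamo(text):
    """Split text into letter-level tokens, one jamo per token.

    Characters outside the Hangul Syllables block pass through unchanged.
    """
    tokens = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            # Each initial spans 21 * 28 = 588 syllables, each vowel spans 28.
            tokens.append(CHO[index // 588])
            tokens.append(JUNG[(index % 588) // 28])
            final = JONG[index % 28]
            if final:
                tokens.append(final)
        else:
            tokens.append(ch)
    return tokens


if __name__ == "__main__":
    print(to_jamo("한글"))
```

Under this reading, `to_jamo("한글")` yields the six tokens ㅎ, ㅏ, ㄴ, ㄱ, ㅡ, ㄹ. The vocabulary stays very small (a few dozen letters plus punctuation), at the cost of longer token sequences, which is consistent with the abstract's remark that the minGRU encoder handles longer token sequences more effectively.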