Syntactic complexity recognition and analysis in Chinese-English machine translation: A comparative study based on the BLSTM-CRF model

Bibliographic Details
Published in: PLoS One vol. 20, no. 6 (Jun 2025), p. e0325721
Main Author: Tian, Yongli
Published:
Public Library of Science
Subjects:
Online Link: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3218425698
003 UK-CbPIL
022 |a 1932-6203 
024 7 |a 10.1371/journal.pone.0325721  |2 doi 
035 |a 3218425698 
045 2 |b d20250601  |b d20250630 
084 |a 174835  |2 nlm 
100 1 |a Tian, Yongli 
245 1 |a Syntactic complexity recognition and analysis in Chinese-English machine translation: A comparative study based on the BLSTM-CRF model 
260 |b Public Library of Science  |c Jun 2025 
513 |a Journal Article 
520 3 |a To enhance the recognition and preservation of syntactic complexity in Chinese–English translation, this study proposes an optimized Bidirectional Long Short-Term Memory–Conditional Random Field (BiLSTM-CRF) model. Based on the Workshop on Machine Translation (WMT) Chinese-English parallel corpus, an experimental framework is designed for two types of specialized data: complex sentences and cross-linguistic sentence pairs. The model integrates explicit syntactic features, including part-of-speech tags, dependency relations, and syntactic tree depth, and incorporates an attention mechanism to improve its ability to capture syntactic complexity. In addition, this study constructs an evaluation framework of eight indicators to assess syntactic complexity recognition and translation quality: (1) average syntactic node depth (higher values indicate greater complexity; typically 1.0–5.0); (2) number of embedded clause levels (higher values indicate greater complexity; typically 0–5); (3) long-distance dependency ratio (higher values indicate broader dependency spans; range 0–1, moderate values preferred); (4) average branching factor (higher values indicate denser modifiers; range 1.0–4.0); (5) syntactic change ratio (lower values indicate greater structural stability; range 0–1); (6) translation alignment consistency rate (higher values indicate better alignment; range 0–1); (7) syntactic tree reconstruction cost (lower values indicate lower structural adjustment overhead; range 0–1); (8) translation syntactic balance (higher values indicate more natural syntactic rendering; range 0–1). This indicator system enables comprehensive evaluation of the model's capabilities in syntactic modeling, structural preservation, and cross-linguistic alignment. Experimental results show that the optimized model outperforms baseline models across multiple core indicators. On the complex sentence dataset, the optimized model achieves a long-distance dependency ratio of 0.658 (moderately high), an embedded clause level of 3.167 (indicating complex structure), an average branching factor of 2.897, and a syntactic change ratio of only 0.432, all of which significantly outperform comparative models such as Syntax-Transformer and Syntax-Bidirectional Encoder Representations from Transformers (Syntax-BERT). On the cross-linguistic sentence dataset, the optimized model attains a syntactic tree reconstruction cost of only 0.214 (low adjustment overhead) and a translation alignment consistency rate of 0.894 (high alignment accuracy), demonstrating clear advantages in structural preservation and adjustment. In contrast, comparison models show unstable performance on complex and cross-linguistic data; for example, Syntax-BERT achieves only 2.321 for the embedded clause level, indicating difficulty in handling complex syntactic structures. In summary, by introducing explicit syntactic features and a multidimensional indicator system, the proposed model demonstrates strong capacity in syntactic complexity recognition and better preserves syntactic structures during translation. This study offers new insights into syntactic complexity modeling in natural language processing and provides valuable theoretical and practical contributions to syntactic processing in machine translation systems. 
653 |a Language 
653 |a Accuracy 
653 |a Models 
653 |a Syntax 
653 |a Conditional random fields 
653 |a Modelling 
653 |a Optimization 
653 |a Labeling 
653 |a Indicators 
653 |a Trees 
653 |a Machine translation 
653 |a Branching 
653 |a Long short-term memory 
653 |a Reconstruction 
653 |a Preservation 
653 |a Linguistics 
653 |a Comparative studies 
653 |a Datasets 
653 |a Alignment 
653 |a Translation 
653 |a Recognition 
653 |a Information processing 
653 |a Complexity 
653 |a Natural language processing 
653 |a Large language models 
653 |a English language 
653 |a Structural stability 
653 |a Semantics 
653 |a Sentences 
653 |a Economic 
773 0 |t PLoS One  |g vol. 20, no. 6 (Jun 2025), p. e0325721 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3218425698/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3218425698/fulltext/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3218425698/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
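
The abstract's first two indicators — average syntactic node depth and number of embedded clause levels — can be illustrated with a small sketch. This is not the paper's implementation: the bracketed-tree input format, the `parse_tree` helper, and the use of `S` as the clause label are all assumptions for illustration only; the article does not specify its parse representation.

```python
def parse_tree(s):
    """Parse a bracketed constituency string like '(S (NP ...) (VP ...))'
    into nested (label, children) tuples; leaves are plain token strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def walk():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(walk())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return walk()

def node_depths(tree, depth=1):
    """Yield the depth of every internal (labeled) node; the root has depth 1."""
    label, children = tree
    yield depth
    for child in children:
        if isinstance(child, tuple):
            yield from node_depths(child, depth + 1)

def avg_node_depth(tree):
    """Indicator (1): mean depth over all internal nodes."""
    depths = list(node_depths(tree))
    return sum(depths) / len(depths)

def clause_embedding_levels(tree, clause_labels=("S",)):
    """Indicator (2): maximum nesting of clause nodes below the matrix clause."""
    def deepest(node, level):
        label, children = node
        here = level + (1 if label in clause_labels else 0)
        best = here
        for child in children:
            if isinstance(child, tuple):
                best = max(best, deepest(child, here))
        return best
    return max(deepest(tree, 0) - 1, 0)  # subtract the matrix clause itself
```

For example, the tree `(S (NP I) (VP (V know) (SBAR (S (NP she) (VP left)))))` has eight internal nodes at depths 1 through 5, giving an average node depth of 3.125 (within the 1.0–5.0 range the abstract cites) and one embedded clause level.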