A Radical-Based Token Representation Method for Enhancing Chinese Pre-Trained Language Models
| Published in: | Electronics vol. 14, no. 5 (2025), p. 1031 |
|---|---|
| Main author: | Qin, Honglun |
| Other authors: | Li, Meiwen, Wang, Lin, Ge, Youming, Zhu, Junlong, Zheng, Ruijuan |
| Published: | MDPI AG |
| Subjects: | |
| Online access: | Citation/Abstract Full Text + Graphics Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3176380797 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2079-9292 | ||
| 024 | 7 | |a 10.3390/electronics14051031 |2 doi | |
| 035 | |a 3176380797 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 231458 |2 nlm | ||
| 100 | 1 | |a Qin, Honglun |u School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China; <email>honglunqin@stu.haust.edu.cn</email> (H.Q.); <email>linwang@haust.edu.cn</email> (L.W.); <email>geym@haust.edu.cn</email> (Y.G.); <email>jlzhu@haust.edu.cn</email> (J.Z.) | |
| 245 | 1 | |a A Radical-Based Token Representation Method for Enhancing Chinese Pre-Trained Language Models | |
| 260 | |b MDPI AG |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a In the domain of natural language processing (NLP), a primary challenge pertains to the process of Chinese tokenization, which remains challenging due to the lack of explicit word boundaries in written Chinese. The existing tokenization methods often treat each Chinese character as an indivisible unit, neglecting the finer semantic features embedded in the characters, such as radicals. To tackle this issue, we propose a novel token representation method that integrates radical-based features into the process. The proposed method extends the vocabulary to include both radicals and original character tokens, enabling a more granular understanding of Chinese text. We also conduct experiments on seven datasets covering multiple Chinese natural language processing tasks. The results show that our method significantly improves model performance on downstream tasks. Specifically, the accuracy of BERT on the BQ Corpus dataset was enhanced to 86.95%, showing an improvement of 1.65% over the baseline. Additionally, the BERT-wwm performance demonstrated a 1.28% enhancement, suggesting that the incorporation of fine-grained radical features offers a more efficacious solution for Chinese tokenization and paves the way for future research in Chinese text processing. | |
| 653 | |a Word processing | ||
| 653 | |a Language | ||
| 653 | |a Datasets | ||
| 653 | |a Semantic features | ||
| 653 | |a Deep learning | ||
| 653 | |a Natural language processing | ||
| 653 | |a Word boundaries | ||
| 653 | |a Methods | ||
| 653 | |a Phonetics | ||
| 653 | |a Language modeling | ||
| 653 | |a Chinese languages | ||
| 653 | |a Morphology | ||
| 653 | |a Representations | ||
| 653 | |a Efficiency | ||
| 653 | |a Semantics | ||
| 653 | |a Experiments | ||
| 653 | |a Personality | ||
| 653 | |a Task performance | ||
| 653 | |a Vocabulary | ||
| 700 | 1 | |a Li, Meiwen |u School of Software, Henan University of Science and Technology, Luoyang 471023, China; <email>zhengruijuan@haust.edu.cn</email> | |
| 700 | 1 | |a Wang, Lin |u School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China; <email>honglunqin@stu.haust.edu.cn</email> (H.Q.); <email>linwang@haust.edu.cn</email> (L.W.); <email>geym@haust.edu.cn</email> (Y.G.); <email>jlzhu@haust.edu.cn</email> (J.Z.) | |
| 700 | 1 | |a Ge, Youming |u School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China; <email>honglunqin@stu.haust.edu.cn</email> (H.Q.); <email>linwang@haust.edu.cn</email> (L.W.); <email>geym@haust.edu.cn</email> (Y.G.); <email>jlzhu@haust.edu.cn</email> (J.Z.) | |
| 700 | 1 | |a Zhu, Junlong |u School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China; <email>honglunqin@stu.haust.edu.cn</email> (H.Q.); <email>linwang@haust.edu.cn</email> (L.W.); <email>geym@haust.edu.cn</email> (Y.G.); <email>jlzhu@haust.edu.cn</email> (J.Z.) | |
| 700 | 1 | |a Zheng, Ruijuan |u School of Software, Henan University of Science and Technology, Luoyang 471023, China; <email>zhengruijuan@haust.edu.cn</email> | |
| 773 | 0 | |t Electronics |g vol. 14, no. 5 (2025), p. 1031 | |
| 786 | 0 | |d ProQuest |t Advanced Technologies & Aerospace Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3176380797/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text + Graphics |u https://www.proquest.com/docview/3176380797/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3176380797/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |