Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions
| Published in: | Symmetry vol. 17, no. 9 (2025), p. 1478-1496 |
|---|---|
| Main Author: | Zhang Lusheng |
| Other Authors: | Wu, Shie; Wang Zhongxun |
| Published: | MDPI AG |
| Subjects: | |
| Online Access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3254649183 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2073-8994 | ||
| 024 | 7 | |a 10.3390/sym17091478 |2 doi | |
| 035 | |a 3254649183 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 231635 |2 nlm | ||
| 100 | 1 | |a Zhang Lusheng |u School of Physics and Electronic Information, Yantai University, Yantai 264005, China; ytdxeduzls@s.ytu.edu.cn (L.Z.); wushie@ytu.edu.cn (S.W.) | |
| 245 | 1 | |a Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions | |
| 260 | |b MDPI AG |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Cantonese automatic speech recognition (ASR) faces persistent challenges due to the language's nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight and data-efficient framework that leverages weak phonetic supervision (WPS) in conjunction with two phoneme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while ensuring training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. Built on the Whistle encoder, which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multilingual phonetic space, the approach requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, without any additional manual labeling. Experiments on the Cantonese subset of Common Voice demonstrate consistent gains: Dynamic Dropout alone reduces phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and from 16.4% to 15.1% with 100 h, while combining the two augmentations further lowers PER to 15.9% and 14.4%, respectively. These results confirm that structure-aware phoneme-level perturbations provide an effective and low-cost solution for building robust Cantonese ASR systems under low-resource conditions. | |
| 653 | |a Error analysis | ||
| 653 | |a Accuracy | ||
| 653 | |a Phonology | ||
| 653 | |a Conditional random fields | ||
| 653 | |a Cantonese | ||
| 653 | |a Phonetics | ||
| 653 | |a Phonemes | ||
| 653 | |a Masking | ||
| 653 | |a Phonemics | ||
| 653 | |a Voice recognition | ||
| 653 | |a Supervision | ||
| 653 | |a Tone | ||
| 653 | |a Speech recognition | ||
| 653 | |a Multilingualism | ||
| 653 | |a Robustness (mathematics) | ||
| 653 | |a Annotations | ||
| 653 | |a Acoustics | ||
| 653 | |a Automatic speech recognition | ||
| 653 | |a Speech | ||
| 653 | |a Grapheme phoneme correspondence | ||
| 653 | |a Chinese languages | ||
| 653 | |a Reduction (Phonological or Phonetic) | ||
| 653 | |a Semantics | ||
| 653 | |a Cultural heritage | ||
| 653 | |a Experiments | ||
| 653 | |a Dropping out | ||
| 653 | |a Classification | ||
| 653 | |a Scarcity | ||
| 653 | |a Contours | ||
| 653 | |a Augmentation | ||
| 653 | |a Morality | ||
| 653 | |a Curricula | ||
| 700 | 1 | |a Wu, Shie |u School of Physics and Electronic Information, Yantai University, Yantai 264005, China; ytdxeduzls@s.ytu.edu.cn (L.Z.); wushie@ytu.edu.cn (S.W.) | |
| 700 | 1 | |a Wang Zhongxun |u School of Physics and Electronic Information, Yantai University, Yantai 264005, China; ytdxeduzls@s.ytu.edu.cn (L.Z.); wushie@ytu.edu.cn (S.W.) | |
| 773 | 0 | |t Symmetry |g vol. 17, no. 9 (2025), p. 1478-1496 | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3254649183/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text + Graphics |u https://www.proquest.com/docview/3254649183/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3254649183/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |