Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions
| Published in: | Symmetry vol. 17, no. 9 (2025), p. 1478-1496 |
|---|---|
| Main author: | |
| Other authors: | |
| Published: | MDPI AG |
| Subjects: | |
| Online access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
| Abstract: | Cantonese automatic speech recognition (ASR) faces persistent challenges due to the language's nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight and data-efficient framework that leverages weak phonetic supervision (WPS) in conjunction with two phoneme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while ensuring training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. Built on the Whistle encoder, which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multilingual phonetic space, the approach requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, without any additional manual labeling. Experiments on the Cantonese subset of Common Voice demonstrate consistent gains: Dynamic Dropout alone reduces phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and from 16.4% to 15.1% with 100 h, while combining the two augmentations further lowers PER to 15.9% (50 h) and 14.4% (100 h). These results confirm that structure-aware phoneme-level perturbations provide an effective and low-cost solution for building robust Cantonese ASR systems under low-resource conditions. |
|---|---|
| ISSN: | 2073-8994 |
| DOI: | 10.3390/sym17091478 |
| Source: | Engineering Database |
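
The two augmentations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear curriculum, the `max_rate` value, and the choice to drop aligned frames from the spectrogram are all assumptions made for illustration, and the attention-based prioritization of mask regions is omitted because the abstract does not specify it. Segments are assumed to be non-empty `(start_frame, end_frame)` pairs (end exclusive), as could be derived from forced-aligner output, and to tile the utterance.

```python
import numpy as np


def dropout_rate(step, total_steps, max_rate=0.2):
    """Hypothetical linear curriculum: drop probability rises from 0 to max_rate."""
    return max_rate * min(1.0, step / max(1, total_steps))


def phoneme_dropout(spec, segments, rate, rng):
    """Drop whole phoneme segments from a (frames, bins) spectrogram.

    Each segment is removed with probability `rate`; at least one segment
    is always kept. Returns the shortened spectrogram and the kept indices.
    """
    keep = [i for i in range(len(segments)) if rng.random() >= rate]
    if not keep:  # never drop the entire utterance
        keep = [int(rng.integers(len(segments)))]
    pieces = [spec[segments[i][0]:segments[i][1]] for i in keep]
    return np.concatenate(pieces, axis=0), keep


def phoneme_aware_time_mask(spec, segments, max_width, rng):
    """Zero one time span that lies entirely inside a single phoneme segment."""
    out = spec.copy()
    start, end = segments[int(rng.integers(len(segments)))]
    # The mask can never be wider than the segment it is drawn from.
    width = int(rng.integers(1, min(max_width, end - start) + 1))
    offset = int(rng.integers(start, end - width + 1))
    out[offset:offset + width] = 0.0
    return out, (offset, offset + width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = np.ones((30, 8))                    # toy spectrogram: 30 frames, 8 bins
    segments = [(0, 10), (10, 18), (18, 30)]   # toy phoneme boundaries
    rate = dropout_rate(step=500, total_steps=1000)
    shorter, kept = phoneme_dropout(spec, segments, rate, rng)
    masked, span = phoneme_aware_time_mask(spec, segments, max_width=5, rng=rng)
    print(shorter.shape, kept, span)
```

Because the mask width is capped by the length of the chosen segment, a mask never straddles a phoneme boundary, which is the property the abstract attributes to Phoneme-Aware SpecAugment. In a real pipeline the phoneme labels of dropped segments would presumably be removed from the target sequence as well.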