Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN
| Published in: | Information vol. 16, no. 7 (2025), p. 518-536 |
|---|---|
| Main author: | Waleed, Gheed T |
| Other authors: | Shaker, Shaimaa H |
| Published: | MDPI AG |
| Online link: | Citation/Abstract · Full Text + Graphics · Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3233222654 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2078-2489 | ||
| 024 | 7 | |a 10.3390/info16070518 |2 doi | |
| 035 | |a 3233222654 | ||
| 045 | 2 | |b d20250701 |b d20250731 | |
| 084 | |a 231474 |2 nlm | ||
| 100 | 1 | |a Waleed, Gheed T | |
| 245 | 1 | |a Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN | |
| 260 | |b MDPI AG |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Speech emotion recognition (SER) plays a vital role in enhancing human–computer interaction (HCI) and can be applied in affective computing, virtual support, and healthcare. This research presents a high-performance SER framework based on a lightweight 1D Convolutional Neural Network (1D-CNN) and a multi-feature fusion technique. Rather than employing spectrograms as image-based input, frame-level features (Mel-Frequency Cepstral Coefficients, Mel-Spectrograms, and Chroma vectors) are computed across the sequences to preserve temporal information and reduce computational cost. The model attained classification accuracies of 94.0% on MELD (multi-party conversations) and 91.9% on RAVDESS (acted speech). Ablation experiments demonstrate that the integration of complementary features significantly outperforms any single-feature baseline. Data augmentation techniques, including Gaussian noise and time shifting, enhance model generalisation. The proposed method demonstrates significant potential for real-time, audio-only emotion recognition on embedded or resource-constrained devices. | |
| 653 | |a Data augmentation | ||
| 653 | |a Accuracy | ||
| 653 | |a Embedded systems | ||
| 653 | |a Deep learning | ||
| 653 | |a Datasets | ||
| 653 | |a Wavelet transforms | ||
| 653 | |a Affective computing | ||
| 653 | |a Artificial intelligence | ||
| 653 | |a Human-computer interface | ||
| 653 | |a Emotion recognition | ||
| 653 | |a Spectrograms | ||
| 653 | |a Artificial neural networks | ||
| 653 | |a Neural networks | ||
| 653 | |a Ablation | ||
| 653 | |a Support vector machines | ||
| 653 | |a Random noise | ||
| 653 | |a Emotions | ||
| 653 | |a Methods | ||
| 653 | |a Acoustics | ||
| 653 | |a Real time | ||
| 653 | |a Speech | ||
| 653 | |a Speech recognition | ||
| 700 | 1 | |a Shaker, Shaimaa H | |
| 773 | 0 | |t Information |g vol. 16, no. 7 (2025), p. 518-536 | |
| 786 | 0 | |d ProQuest |t Advanced Technologies & Aerospace Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3233222654/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text + Graphics |u https://www.proquest.com/docview/3233222654/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3233222654/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch |
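The 520 abstract above mentions two waveform-level augmentation techniques, Gaussian noise and time shifting. A minimal sketch of what those two transforms might look like on a raw audio signal; the `noise_std` and `max_shift_frac` parameter values are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def add_gaussian_noise(signal: np.ndarray, noise_std: float = 0.005) -> np.ndarray:
    """Return the signal with zero-mean Gaussian noise added (std is assumed)."""
    return signal + np.random.normal(0.0, noise_std, size=signal.shape)

def time_shift(signal: np.ndarray, max_shift_frac: float = 0.1) -> np.ndarray:
    """Circularly shift the signal by a random offset up to max_shift_frac of its length."""
    max_shift = int(len(signal) * max_shift_frac)
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(signal, shift)

# Example: augment a dummy 1-second, 16 kHz sine waveform.
wave = np.sin(2 * np.pi * 440 * np.linspace(0.0, 1.0, 16000))
augmented = time_shift(add_gaussian_noise(wave))
print(augmented.shape)  # same length as the input waveform
```

Both transforms preserve the signal length, so the downstream frame-level feature extraction (MFCC, Mel-Spectrogram, Chroma) described in the abstract can be applied unchanged to the augmented waveforms.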