Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN

Bibliographic Details
Published in: Information vol. 16, no. 7 (2025), p. 518-536
Main Author: Waleed, Gheed T
Other Authors: Shaker, Shaimaa H
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3233222654
003 UK-CbPIL
022 |a 2078-2489 
024 7 |a 10.3390/info16070518  |2 doi 
035 |a 3233222654 
045 2 |b d20250701  |b d20250731 
084 |a 231474  |2 nlm 
100 1 |a Waleed, Gheed T 
245 1 |a Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Speech emotion recognition (SER) plays a vital role in enhancing human–computer interaction (HCI) and can be applied in affective computing, virtual support, and healthcare. This research presents a high-performance SER framework based on a lightweight 1D Convolutional Neural Network (1D-CNN) and a multi-feature fusion technique. Rather than employing spectrograms as image-based input, frame-level features (Mel-Frequency Cepstral Coefficients, Mel-Spectrograms, and Chroma vectors) are computed across the sequences to preserve temporal information and reduce computational cost. The model attained classification accuracies of 94.0% on MELD (multi-party conversations) and 91.9% on RAVDESS (acted speech). Ablation experiments demonstrate that fusing complementary features significantly outperforms any single feature used as a baseline. Data augmentation techniques, including Gaussian noise and time shifting, improve model generalisation. The proposed method shows strong potential for real-time, audio-only emotion recognition on embedded or resource-constrained devices. (A minimal code sketch of this pipeline follows the record below.) 
653 |a Data augmentation 
653 |a Accuracy 
653 |a Embedded systems 
653 |a Deep learning 
653 |a Datasets 
653 |a Wavelet transforms 
653 |a Affective computing 
653 |a Artificial intelligence 
653 |a Human-computer interface 
653 |a Emotion recognition 
653 |a Spectrograms 
653 |a Artificial neural networks 
653 |a Neural networks 
653 |a Ablation 
653 |a Support vector machines 
653 |a Random noise 
653 |a Emotions 
653 |a Methods 
653 |a Acoustics 
653 |a Real time 
653 |a Speech 
653 |a Speech recognition 
700 1 |a Shaker, Shaimaa H 
773 0 |t Information  |g vol. 16, no. 7 (2025), p. 518-536 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3233222654/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3233222654/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3233222654/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
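
The 520 abstract above outlines a frame-level multi-feature fusion pipeline (MFCC, Mel-spectrogram, and Chroma features stacked per frame) with Gaussian-noise and time-shift augmentation. The following Python sketch illustrates one plausible reading of that pipeline using librosa; the feature dimensions (40 MFCCs, 128 mel bands, 12 chroma bins), sample rate, and augmentation magnitudes are illustrative assumptions, not values taken from the paper.

    import numpy as np
    import librosa

    def extract_fused_features(path, sr=16000):
        """Stack frame-level MFCC, log-Mel, and chroma features into one
        (n_features, n_frames) matrix, preserving temporal order for a 1D-CNN."""
        y, sr = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)            # (40, T)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # (128, T)
        mel_db = librosa.power_to_db(mel)                             # log-compress power
        chroma = librosa.feature.chroma_stft(y=y, sr=sr)              # (12, T)
        # All three use the same default hop length, so frame counts align.
        return np.vstack([mfcc, mel_db, chroma])                      # (180, T)

    def augment(y, rng=None):
        """Gaussian-noise and time-shift augmentation, as named in the abstract.
        Noise level (0.005) and shift fraction (10%) are assumed values."""
        rng = rng or np.random.default_rng(0)
        noisy = y + rng.normal(0.0, 0.005, size=y.shape)  # additive Gaussian noise
        shifted = np.roll(y, int(0.1 * len(y)))           # circular time shift
        return noisy, shifted

A 1D-CNN would then convolve such a fused matrix along the time axis, treating the 180 stacked coefficients per frame as input channels.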