AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation
| Published in: | arXiv.org (Dec 13, 2024), p. n/a |
|---|---|
| Main author: | Gao, Xiyuan |
| Other authors: | Bansal, Shubhi; Gowda, Kushaan; Zhu, Li; Nayak, Shekhar; Kumar, Nagendra; Coler, Matt |
| Publisher: | Cornell University Library, arXiv.org |
| Subjects: | Datasets; Data augmentation; Audio data; Modal data; Mustard; Artificial neural networks; Neural networks; Speech recognition |
| Available online: | Citation/Abstract; Full text outside of ProQuest |
MARC
| Field | Ind1 | Ind2 | Content |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3145273898 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 2331-8422 |
| 035 | | | \|a 3145273898 |
| 045 | 0 | | \|b d20241213 |
| 100 | 1 | | \|a Gao, Xiyuan |
| 245 | 1 | | \|a AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation |
| 260 | | | \|b Cornell University Library, arXiv.org \|c Dec 13, 2024 |
| 513 | | | \|a Working Paper |
| 520 | 3 | | \|a Detecting sarcasm effectively requires a nuanced understanding of context, including vocal tones and facial expressions. The progression towards multimodal computational methods in sarcasm detection, however, faces challenges due to the scarcity of data. To address this, we present AMuSeD (Attentive deep neural network for MUltimodal Sarcasm dEtection incorporating bi-modal Data augmentation). This approach utilizes the Multimodal Sarcasm Detection Dataset (MUStARD) and introduces a two-phase bimodal data augmentation strategy. The first phase involves generating varied text samples through Back Translation from several secondary languages. The second phase involves the refinement of a FastSpeech 2-based speech synthesis system, tailored specifically for sarcasm to retain sarcastic intonations. Alongside a cloud-based Text-to-Speech (TTS) service, this Fine-tuned FastSpeech 2 system produces corresponding audio for the text augmentations. We also investigate various attention mechanisms for effectively merging text and audio data, finding self-attention to be the most efficient for bimodal integration. Our experiments reveal that this combined augmentation and attention approach achieves a significant F1-score of 81.0% in text-audio modalities, surpassing even models that use three modalities from the MUStARD dataset. |
| 653 | | | \|a Datasets |
| 653 | | | \|a Data augmentation |
| 653 | | | \|a Audio data |
| 653 | | | \|a Modal data |
| 653 | | | \|a Mustard |
| 653 | | | \|a Artificial neural networks |
| 653 | | | \|a Neural networks |
| 653 | | | \|a Speech recognition |
| 700 | 1 | | \|a Bansal, Shubhi |
| 700 | 1 | | \|a Gowda, Kushaan |
| 700 | 1 | | \|a Zhu, Li |
| 700 | 1 | | \|a Nayak, Shekhar |
| 700 | 1 | | \|a Kumar, Nagendra |
| 700 | 1 | | \|a Coler, Matt |
| 773 | 0 | | \|t arXiv.org \|g (Dec 13, 2024), p. n/a |
| 786 | 0 | | \|d ProQuest \|t Engineering Database |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3145273898/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | \|3 Full text outside of ProQuest \|u http://arxiv.org/abs/2412.10103 |
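The abstract (field 520) reports that self-attention was the most efficient mechanism for fusing the text and audio representations. As a rough illustration only, the sketch below shows one way such a bimodal self-attention fusion could be wired up in PyTorch; the class name, feature dimensions, pooling choice, and classifier head are all assumptions for the example and are not taken from the paper.

```python
import torch
import torch.nn as nn


class BimodalSelfAttentionFusion(nn.Module):
    """Illustrative sketch (not the paper's implementation): project text and
    audio features into a shared space, run self-attention over the joint
    sequence, and classify the utterance as sarcastic or not."""

    def __init__(self, text_dim=768, audio_dim=128, hidden_dim=256, num_heads=4):
        super().__init__()
        # Bring both modalities to a common dimension before attention.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1),  # single logit for "sarcastic"
        )

    def forward(self, text_feats, audio_feats):
        # text_feats: (batch, text_len, text_dim), e.g. token embeddings
        # audio_feats: (batch, audio_len, audio_dim), e.g. frame-level features
        tokens = torch.cat(
            [self.text_proj(text_feats), self.audio_proj(audio_feats)], dim=1
        )
        # Self-attention lets every position attend across both modalities.
        attended, _ = self.self_attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)  # simple mean pooling over the joint sequence
        return self.classifier(pooled)


if __name__ == "__main__":
    model = BimodalSelfAttentionFusion()
    text = torch.randn(2, 20, 768)   # dummy text embeddings
    audio = torch.randn(2, 50, 128)  # dummy audio frame features
    print(model(text, audio).shape)  # torch.Size([2, 1])
```

Mean pooling over the joint sequence is just one simple readout; attention pooling or a CLS-style summary token would be equally plausible under the same assumptions.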