Attention-Driven Time-Domain Convolutional Network for Source Separation of Vocal and Accompaniment

Bibliographic Details
Published in: Electronics vol. 14, no. 20 (2025), p. 3982-4009
Main author: Zhao, Zhili
Other authors: Luo, Min; Qiao, Xiaoman; Shao, Changheng; Sun, Rencheng
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3265895189
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14203982  |2 doi 
035 |a 3265895189 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Zhao, Zhili  |u College of Computer Science and Technology, Qingdao University, Qingdao 266071, China; 2018204625@qdu.edu.cn (Z.Z.); sch_mail@163.com (C.S.); src@qdu.edu.cn (R.S.) 
245 1 |a Attention-Driven Time-Domain Convolutional Network for Source Separation of Vocal and Accompaniment 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Time-domain signal models have been widely applied to single-channel music source separation tasks due to their ability to overcome the limitations of fixed spectral representations and phase information loss. However, the high acoustic similarity and synchronous temporal evolution between vocals and accompaniment make accurate separation challenging for existing time-domain models. These challenges are mainly reflected in two aspects: (1) the lack of a dynamic mechanism to evaluate the contribution of each source during feature fusion, and (2) difficulty in capturing fine-grained temporal details, often resulting in local artifacts in the output. To address these issues, we propose an attention-driven time-domain convolutional network for vocal and accompaniment source separation. Specifically, we design an embedding attention module to perform adaptive source weighting, enabling the network to emphasize components more relevant to the target mask during training. In addition, an efficient convolutional block attention module is developed to enhance local feature extraction. This module integrates an efficient channel attention mechanism based on one-dimensional convolution while preserving spatial attention, thereby improving the ability to learn discriminative features from the target audio. Comprehensive evaluations on public music datasets demonstrate the effectiveness of the proposed model and its significant improvements over existing approaches. 
653 |a Feature extraction 
653 |a Musical instruments 
653 |a Machine learning 
653 |a Music 
653 |a Deep learning 
653 |a Musicians & conductors 
653 |a Separation 
653 |a Musical performances 
653 |a Neural networks 
653 |a Time domain analysis 
653 |a Attention 
653 |a Modules 
653 |a Target masking 
653 |a Acoustics 
653 |a Singers 
653 |a Information retrieval 
700 1 |a Luo, Min  |u Arts College, Qingdao University, Qingdao 266071, China 
700 1 |a Qiao, Xiaoman  |u Information Technology Department, Qingdao Vocational and Technical College of Hotel Management, Qingdao 266100, China; 2020020653@qdu.edu.cn 
700 1 |a Shao, Changheng  |u College of Computer Science and Technology, Qingdao University, Qingdao 266071, China; 2018204625@qdu.edu.cn (Z.Z.); sch_mail@163.com (C.S.); src@qdu.edu.cn (R.S.) 
700 1 |a Sun, Rencheng  |u College of Computer Science and Technology, Qingdao University, Qingdao 266071, China; 2018204625@qdu.edu.cn (Z.Z.); sch_mail@163.com (C.S.); src@qdu.edu.cn (R.S.) 
773 0 |t Electronics  |g vol. 14, no. 20 (2025), p. 3982-4009 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3265895189/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3265895189/fulltextwithgraphics/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3265895189/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch
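
Note: the abstract (520 field above) describes an "efficient convolutional block attention module" that replaces the fully connected channel bottleneck with an efficient channel attention (ECA) based on one-dimensional convolution while preserving spatial attention. The paper's own code is not part of this record; the following is a minimal PyTorch sketch of that general idea for time-domain features of shape (batch, channels, time). The class name, kernel sizes, and pooling choices are assumptions for illustration, not the authors' published implementation.

# Hedged sketch of ECA-style channel attention plus CBAM-style spatial
# attention, as described in the abstract; details are assumptions.
import torch
import torch.nn as nn

class EfficientCBAM1d(nn.Module):
    def __init__(self, k_channel: int = 5, k_spatial: int = 7):
        super().__init__()
        # Channel branch: a 1-D convolution slides across the channel
        # descriptors instead of a fully connected squeeze-excite bottleneck.
        self.channel_conv = nn.Conv1d(1, 1, k_channel,
                                      padding=k_channel // 2, bias=False)
        # Spatial branch: a 1-D convolution over pooled channel statistics
        # produces one attention weight per time step.
        self.spatial_conv = nn.Conv1d(2, 1, k_spatial,
                                      padding=k_spatial // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        # Efficient channel attention: global pool over time, then conv
        # across the channel axis to model local cross-channel interaction.
        desc = x.mean(dim=2, keepdim=True)            # (B, C, 1)
        w = self.channel_conv(desc.transpose(1, 2))   # (B, 1, C)
        x = x * torch.sigmoid(w.transpose(1, 2))      # reweight channels
        # Spatial (temporal) attention: mean- and max-pool over channels,
        # then weight each time step.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, T)
        return x * torch.sigmoid(self.spatial_conv(stats))       # (B, C, T)

# Example usage on a dummy feature map:
# attn = EfficientCBAM1d()
# y = attn(torch.randn(4, 128, 16000))  # same shape out as in

The 1-D convolution in the channel branch keeps the parameter count nearly constant as channel width grows, which is the usual motivation for ECA over a fully connected bottleneck; the spatial branch is retained so fine-grained temporal detail can still be emphasized, matching the abstract's stated goal of reducing local artifacts.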