Separate Anything You Describe

Bibliographic Details
Published in: arXiv.org (Dec 1, 2024), p. n/a
Main author: Liu, Xubo
Other authors: Kong, Qiuqiang; Zhao, Yan; Liu, Haohe; Yuan, Yi; Liu, Yuzhuo; Xia, Rui; Wang, Yuxuan; Plumbley, Mark D; Wang, Wenwu
Published:
Cornell University Library, arXiv.org
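Subjects: Speech processing; Source code; Queries; Scene analysis; Musical instruments; Separation; Query languages; Natural language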
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2848589922
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2848589922 
045 0 |b d20241201 
100 1 |a Liu, Xubo 
245 1 |a Separate Anything You Describe 
260 |b Cornell University Library, arXiv.org  |c Dec 1, 2024 
513 |a Working Paper 
520 3 |a Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instruments, limited classes of audio events), are unable to separate audio concepts in the open domain. In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries. We train AudioSep on large-scale multimodal datasets and extensively evaluate its capabilities on numerous tasks including audio event separation, musical instrument separation, and speech enhancement. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability using audio captions or text labels as queries, substantially outperforming previous audio-queried and language-queried sound separation models. For reproducibility of this work, we will release the source code, evaluation benchmark and pre-trained model at: https://github.com/Audio-AGI/AudioSep. 
653 |a Speech processing 
653 |a Source code 
653 |a Queries 
653 |a Scene analysis 
653 |a Musical instruments 
653 |a Separation 
653 |a Query languages 
653 |a Natural language 
700 1 |a Kong, Qiuqiang 
700 1 |a Zhao, Yan 
700 1 |a Liu, Haohe 
700 1 |a Yuan, Yi 
700 1 |a Liu, Yuzhuo 
700 1 |a Xia, Rui 
700 1 |a Wang, Yuxuan 
700 1 |a Plumbley, Mark D 
700 1 |a Wang, Wenwu 
773 0 |t arXiv.org  |g (Dec 1, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2848589922/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2308.05037