Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Bibliographic Details
Published in: arXiv.org (Jun 2, 2024), p. n/a
Main Author: Li, Yunheng
Other Authors: Li, ZhongYu; Zeng, Quansheng; Hou, Qibin; Cheng, Ming-Ming
Published:
Cornell University Library, arXiv.org
Subjects: Decoders; Vision; Semantic segmentation; Pascal (programming language); Semantics
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3064391836
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3064391836 
045 0 |b d20240602 
100 1 |a Li, Yunheng 
245 1 |a Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation 
260 |b Cornell University Library, arXiv.org  |c Jun 2, 2024 
513 |a Working Paper 
520 3 |a Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot semantic segmentation. Existing CLIP-based approaches primarily utilize visual features from the last layer to align with text embeddings, while they neglect the crucial information in intermediate layers that contain rich object details. However, we find that directly aggregating the multi-level visual features weakens the zero-shot ability for novel classes. The large differences between the visual features from different layers make these features hard to align well with the text embeddings. We resolve this problem by introducing a series of independent decoders to align the multi-level visual features with the text embeddings in a cascaded way, forming a novel but simple framework named Cascade-CLIP. Our Cascade-CLIP is flexible and can be easily applied to existing zero-shot semantic segmentation methods. Experimental results show that our simple Cascade-CLIP achieves superior zero-shot performance on segmentation benchmarks, like COCO-Stuff, Pascal-VOC, and Pascal-Context. Our code is available at: https://github.com/HVision-NKU/Cascade-CLIP 
653 |a Decoders 
653 |a Vision 
653 |a Semantic segmentation 
653 |a Pascal (programming language) 
653 |a Semantics 
700 1 |a Li, ZhongYu 
700 1 |a Zeng, Quansheng 
700 1 |a Hou, Qibin 
700 1 |a Cheng, Ming-Ming 
773 0 |t arXiv.org  |g (Jun 2, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3064391836/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2406.00670