ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements

Bibliographic Details
Published in: arXiv.org (Nov 18, 2024), p. n/a
Main Author: Aydın, M Arda
Other Authors: Çırpar, Efe Mert; Abdinli, Elvin; Unal, Gozde; Sahin, Yusuf H
Published: Cornell University Library, arXiv.org
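Subjects: Data augmentation; Computer vision; Semantic segmentation; Large language models; Image enhancement; Image segmentation; Pascal (programming language)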
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3130965881
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3130965881 
045 0 |b d20241118 
100 1 |a Aydın, M Arda 
245 1 |a ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements 
260 |b Cornell University Library, arXiv.org  |c Nov 18, 2024 
513 |a Working Paper 
520 3 |a Recent advances in foundational Vision Language Models (VLMs) have reshaped the evaluation paradigm in computer vision tasks. These foundational models, especially CLIP, have accelerated research in open-vocabulary computer vision tasks, including Open-Vocabulary Semantic Segmentation (OVSS). Although the initial results are promising, the dense prediction capabilities of VLMs still require further improvement. In this study, we enhance the semantic segmentation performance of CLIP by introducing new modules and modifications: 1) architectural changes in the last layer of ViT and the incorporation of attention maps from the middle layers with the last layer, 2) Image Engineering: applying data augmentations to enrich input image representations, and 3) using Large Language Models (LLMs) to generate definitions and synonyms for each class name to leverage CLIP's open-vocabulary capabilities. Our training-free method, ITACLIP, outperforms current state-of-the-art approaches on segmentation benchmarks such as COCO-Stuff, COCO-Object, Pascal Context, and Pascal VOC. Our code is available at https://github.com/m-arda-aydn/ITACLIP. 
653 |a Data augmentation 
653 |a Computer vision 
653 |a Semantic segmentation 
653 |a Large language models 
653 |a Image enhancement 
653 |a Image segmentation 
653 |a Pascal (programming language) 
700 1 |a Efe Mert Çırpar 
700 1 |a Abdinli, Elvin 
700 1 |a Unal, Gozde 
700 1 |a Sahin, Yusuf H 
773 0 |t arXiv.org  |g (Nov 18, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3130965881/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2411.12044