Diffusion Models for Open-Vocabulary Segmentation

Bibliographic details
Published: arXiv.org (Sep 30, 2024)
Lead author: Karazija, Laurynas
Other authors: Laina, Iro; Vedaldi, Andrea; Rupprecht, Christian
Publisher: Cornell University Library, arXiv.org
Details
Abstract: Open-vocabulary segmentation is the task of segmenting anything that can be named in an image. Recently, large-scale vision-language modelling has led to significant advances in open-vocabulary segmentation, but at the cost of gargantuan and increasing training and annotation efforts. Hence, we ask if it is possible to use existing foundation models to synthesise on-demand efficient segmentation algorithms for specific class sets, making them applicable in an open-vocabulary setting without the need to collect further data, annotations or perform training. To that end, we present OVDiff, a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation. OVDiff synthesises support image sets for arbitrary textual categories, creating for each a set of prototypes representative of both the category and its surrounding context (background). It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training. Our approach shows strong performance on a range of benchmarks, obtaining a lead of more than 5% over prior work on PASCAL VOC.
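
The abstract outlines the full pipeline: synthesise a support set per textual category with a text-to-image diffusion model, extract prototypes for both the category and its surrounding context from the support images using pre-trained features, and then segment a query image by matching its features to those prototypes. The sketch below illustrates that idea only; it is not the authors' implementation. The choice of Stable Diffusion as generator, DINOv2 as feature extractor, the crude centre/border split standing in for a proper foreground/background separation, and all hyper-parameters are assumptions for illustration.

```python
# Minimal sketch of the OVDiff idea, under the assumptions stated above:
# generate support images per category, average pre-trained patch features
# into foreground/background prototypes, segment by nearest prototype.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline
from transformers import AutoImageProcessor, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pre-trained components only: a text-to-image generator and a frozen backbone.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
backbone = AutoModel.from_pretrained("facebook/dinov2-base").to(device).eval()


@torch.no_grad()
def patch_features(images):
    """L2-normalised dense patch features for a list of PIL images."""
    inputs = processor(images=images, return_tensors="pt").to(device)
    tokens = backbone(**inputs).last_hidden_state[:, 1:]  # drop CLS token
    return F.normalize(tokens, dim=-1)  # (B, num_patches, D)


@torch.no_grad()
def build_prototypes(category, n_support=8):
    """Synthesise a support set and pool it into fg/bg prototypes.

    The paper builds a *set* of prototypes per category; the centre/border
    split below is a deliberately crude stand-in for separating the object
    from its context, an assumption of this sketch.
    """
    images = pipe(f"a photo of a {category}",
                  num_images_per_prompt=n_support).images
    feats = patch_features(images)            # (n_support, P, D)
    b, p, d = feats.shape
    side = int(p ** 0.5)
    grid = feats.view(b, side, side, d)
    q = side // 4
    fg = grid[:, q:-q, q:-q].reshape(-1, d).mean(0)        # centre ~ object
    bg = torch.cat([grid[:, :q].reshape(-1, d),
                    grid[:, -q:].reshape(-1, d)]).mean(0)  # border ~ context
    return F.normalize(fg, dim=-1), F.normalize(bg, dim=-1)


@torch.no_grad()
def segment(image, categories):
    """Label each patch of `image` with the best-matching category,
    or 'background' when a context prototype wins."""
    protos, labels = [], []
    for c in categories:
        fg, bg = build_prototypes(c)
        protos += [fg, bg]
        labels += [c, "background"]
    protos = torch.stack(protos)            # (2K, D)
    feats = patch_features([image])[0]      # (P, D)
    best = (feats @ protos.T).argmax(-1)    # cosine similarity (normalised)
    return [labels[i] for i in best.tolist()]
```

Because the prototypes depend only on the class names, they can be built once per class set and reused, so test-time segmentation involves no diffusion sampling or training, consistent with the abstract's framing of an on-demand synthesised segmenter.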
ISSN: 2331-8422
Source: Engineering Database