ViUniT: Visual Unit Tests for More Robust Visual Programming

Bibliographic details
Published in: arXiv.org (Dec 12, 2024), p. n/a
Main author: Panagopoulou, Artemis
Other authors: Zhou, Honglu; Savarese, Silvio; Xiong, Caiming; Callison-Burch, Chris; Yatskar, Mark; Niebles, Juan Carlos
Publisher: Cornell University Library, arXiv.org
Subjects: Visual tasks; Questions; Image processing; Query languages; Reasoning; Software testing
Electronic access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3144199565
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3144199565 
045 0 |b d20241212 
100 1 |a Panagopoulou, Artemis 
245 1 |a ViUniT: Visual Unit Tests for More Robust Visual Programming 
260 |b Cornell University Library, arXiv.org  |c Dec 12, 2024 
513 |a Working Paper 
520 3 |a Programming based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs 33% of the time. These models are often right for the wrong reasons and risk unexpected failures on new data. Unit tests play a foundational role in ensuring code correctness and could be used to repair such failures. We propose Visual Unit Testing (ViUniT), a framework to improve the reliability of visual programs by automatically generating unit tests. In our framework, a unit test is represented as a novel image and answer pair meant to verify the logical correctness of a program produced for a given query. Our method leverages a language model to create unit tests in the form of image descriptions and expected answers and image synthesis to produce corresponding images. We conduct a comprehensive analysis of what constitutes an effective visual unit test suite, exploring unit test generation, sampling strategies, image generation methods, and varying the number of programs and unit tests. Additionally, we introduce four applications of visual unit tests: best program selection, answer refusal, re-prompting, and unsupervised reward formulations for reinforcement learning. Experiments with two models across three datasets in visual question answering and image-text matching demonstrate that ViUniT improves model performance by 11.4%. Notably, it enables 7B open-source models to outperform gpt-4o-mini by an average of 7.7% and reduces the occurrence of programs that are correct for the wrong reasons by 40%. 
653 |a Visual tasks 
653 |a Questions 
653 |a Image processing 
653 |a Query languages 
653 |a Reasoning 
653 |a Software testing 
700 1 |a Zhou, Honglu 
700 1 |a Savarese, Silvio 
700 1 |a Xiong, Caiming 
700 1 |a Callison-Burch, Chris 
700 1 |a Yatskar, Mark 
700 1 |a Niebles, Juan Carlos 
773 0 |t arXiv.org  |g (Dec 12, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3144199565/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.08859