Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps

Published in: arXiv.org (Dec 4, 2024), p. n/a
Main Author: Liu, Zhe
Other Authors: Cheng, Li; Chen, Chunyang; Wang, Junjie; Chen, Mengzhuo; Wu, Boyu; Wang, Yawen; Hu, Jun; Wang, Qing
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3075791285
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3075791285 
045 0 |b d20241204 
100 1 |a Liu, Zhe 
245 1 |a Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps 
260 |b Cornell University Library, arXiv.org  |c Dec 4, 2024 
513 |a Working Paper 
520 3 |a Mobile app GUI (Graphical User Interface) pages now contain rich visual information, with the visual semantics of each page helping users understand the application logic. However, this complex visual and functional logic presents new challenges to software testing. Existing automated GUI testing methods, constrained by the lack of reliable testing oracles, are limited to detecting crash bugs with obvious abnormal signals. Consequently, many non-crash functional bugs, ranging from unexpected behaviors to logical errors, often evade detection by current techniques. While these non-crash functional bugs can exhibit visual cues that serve as potential testing oracles, they often span a sequence of screenshots, and detecting them requires an understanding of the operational logic among GUI page transitions, which is challenging for traditional techniques. Considering the remarkable performance of Multimodal Large Language Models (MLLMs) in visual and language understanding, this paper proposes Trident, a novel vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs. It comprises three agents: Explorer, Monitor, and Detector, which guide the exploration, oversee the testing progress, and spot issues. We also address several challenges, i.e., aligning visual and textual information for MLLM input, achieving functionality-oriented exploration, and inferring test oracles for non-crash bugs, to enhance the performance of functional bug detection. We evaluate Trident on 590 non-crash bugs and compare it with 12 baselines; it achieves a 14%-112% boost in average recall and a 108%-147% boost in precision over the best baseline. An ablation study further confirms the contribution of each module. Moreover, Trident identified 43 new bugs on Google Play, of which 31 have been fixed. 
653 |a User interface 
653 |a Semantics 
653 |a Graphical user interface 
653 |a Vision 
653 |a Large language models 
653 |a Automation 
653 |a Applications programs 
653 |a Logic 
653 |a Ablation 
653 |a Software testing 
653 |a Mobile computing 
653 |a Effectiveness 
700 1 |a Cheng, Li 
700 1 |a Chen, Chunyang 
700 1 |a Wang, Junjie 
700 1 |a Chen, Mengzhuo 
700 1 |a Wu, Boyu 
700 1 |a Wang, Yawen 
700 1 |a Hu, Jun 
700 1 |a Wang, Qing 
773 0 |t arXiv.org  |g (Dec 4, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3075791285/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2407.03037