Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps

Published in: arXiv.org (Dec 4, 2024), p. n/a
Main Author: Liu, Zhe
Other Authors: Cheng, Li; Chen, Chunyang; Wang, Junjie; Chen, Mengzhuo; Wu, Boyu; Wang, Yawen; Hu, Jun; Wang, Qing
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3075791285
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3075791285 
045 0 |b d20241204 
100 1 |a Liu, Zhe 
245 1 |a Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps 
260 |b Cornell University Library, arXiv.org  |c Dec 4, 2024 
513 |a Working Paper 
520 3 |a Mobile app GUI (Graphical User Interface) pages now contain rich visual information, with the visual semantics of each page helping users understand the application logic. However, this complex visual and functional logic presents new challenges to software testing. Existing automated GUI testing methods, constrained by the lack of reliable testing oracles, are limited to detecting crash bugs with obvious abnormal signals. Consequently, many non-crash functional bugs, ranging from unexpected behaviors to logical errors, often evade detection by current techniques. While these non-crash functional bugs can exhibit visual cues that serve as potential testing oracles, they often span a sequence of screenshots, and detecting them requires an understanding of the operational logic among GUI page transitions, which is challenging for traditional techniques. Considering the remarkable performance of Multimodal Large Language Models (MLLMs) in visual and language understanding, this paper proposes Trident, a novel vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs. It comprises three agents: Explorer, Monitor, and Detector, which guide the exploration, oversee the testing progress, and spot issues. We also address several challenges, i.e., aligning visual and textual information for MLLM input, achieving functionality-oriented exploration, and inferring test oracles for non-crash bugs, to enhance the performance of functional bug detection. We evaluate Trident on 590 non-crash bugs and compare it with 12 baselines; it achieves a 14%-112% boost in average recall and a 108%-147% boost in precision over the best baseline. An ablation study further confirms the contribution of each module. Moreover, Trident identified 43 new bugs on Google Play, of which 31 have been fixed. 
653 |a User interface 
653 |a Semantics 
653 |a Graphical user interface 
653 |a Vision 
653 |a Large language models 
653 |a Automation 
653 |a Applications programs 
653 |a Logic 
653 |a Ablation 
653 |a Software testing 
653 |a Mobile computing 
653 |a Effectiveness 
700 1 |a Cheng, Li 
700 1 |a Chen, Chunyang 
700 1 |a Wang, Junjie 
700 1 |a Chen, Mengzhuo 
700 1 |a Wu, Boyu 
700 1 |a Wang, Yawen 
700 1 |a Hu, Jun 
700 1 |a Wang, Qing 
773 0 |t arXiv.org  |g (Dec 4, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3075791285/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2407.03037