Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

Bibliographic Details
Container / Database: arXiv.org (Dec 8, 2024), p. n/a
Main Author: Teng, Ma
Other Authors: Jia, Xiaojun; Duan, Ranjie; Li, Xinfeng; Huang, Yihao; Chu, Zhixuan; Liu, Yang; Ren, Wenqi
Published in: Cornell University Library, arXiv.org
Subjects: Heuristic; Source code; Large language models; Security; Search methods; Risk
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3142728423
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3142728423 
045 0 |b d20241208 
100 1 |a Teng, Ma 
245 1 |a Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models 
260 |b Cornell University Library, arXiv.org  |c Dec 8, 2024 
513 |a Working Paper 
520 3 |a With the rapid advancement of multimodal large language models (MLLMs), concerns regarding their security have increasingly captured the attention of both academia and industry. Although MLLMs are vulnerable to jailbreak attacks, designing effective multimodal jailbreak attacks poses unique challenges, especially given the distinct protective measures implemented across various modalities in commercial models. Previous works concentrate risk in a single modality, resulting in limited jailbreak performance. In this paper, we propose a heuristic-induced multimodal risk distribution jailbreak attack method, called HIMRD, which consists of two elements: a multimodal risk distribution strategy and a heuristic-induced search strategy. The multimodal risk distribution strategy segments harmful instructions across multiple modalities to effectively circumvent MLLMs' security protections. The heuristic-induced search strategy identifies two types of prompts: the understanding-enhancing prompt, which helps the MLLM reconstruct the malicious prompt, and the inducing prompt, which increases the likelihood of affirmative outputs over refusals, enabling a successful jailbreak attack. Extensive experiments demonstrate that this approach effectively uncovers vulnerabilities in MLLMs, achieving an average attack success rate of 90% across seven popular open-source MLLMs and an average attack success rate of around 68% across three popular closed-source MLLMs. Our code will be released soon. Warning: this paper contains offensive and harmful examples; reader discretion is advised. 
653 |a Heuristic 
653 |a Source code 
653 |a Large language models 
653 |a Security 
653 |a Search methods 
653 |a Risk 
700 1 |a Jia, Xiaojun 
700 1 |a Duan, Ranjie 
700 1 |a Li, Xinfeng 
700 1 |a Huang, Yihao 
700 1 |a Chu, Zhixuan 
700 1 |a Liu, Yang 
700 1 |a Ren, Wenqi 
773 0 |t arXiv.org  |g (Dec 8, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3142728423/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.05934
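
Note: the 520 abstract above describes HIMRD's two components only at a high level, and the authors' code was unreleased when this record was created. As a reading aid, the following is a minimal Python sketch of how such an attack loop might be organized; every name in it (split_across_modalities, heuristic_search, query_mllm, and both candidate prompt lists) is a hypothetical stand-in, not the paper's implementation.

from typing import Callable, Optional

def split_across_modalities(instruction: str) -> tuple[str, str]:
    """Multimodal risk distribution: segment the harmful instruction so that
    neither modality alone carries enough risk to trigger a refusal.
    A naive midpoint split; the paper's segmentation is more deliberate."""
    words = instruction.split()
    mid = len(words) // 2
    text_part = " ".join(words[:mid])
    image_part = " ".join(words[mid:])  # would be rendered into an image
    return text_part, image_part

def looks_affirmative(response: str) -> bool:
    """Crude refusal check; a real evaluation would use a stronger judge."""
    refusals = ("i cannot", "i can't", "sorry", "i'm unable")
    return not any(marker in response.lower() for marker in refusals)

def heuristic_search(
    query_mllm: Callable[[str, str], str],  # (text prompt, image text) -> reply
    instruction: str,
    understanding_prompts: list[str],  # help the MLLM reconstruct the intent
    inducing_prompts: list[str],       # push toward affirmative output
) -> Optional[str]:
    """Search for an (understanding-enhancing, inducing) prompt pair that,
    combined with the distributed instruction, yields an affirmative reply."""
    text_part, image_part = split_across_modalities(instruction)
    for helper in understanding_prompts:
        for inducer in inducing_prompts:
            prompt = f"{helper}\n{text_part}\n{inducer}"
            reply = query_mllm(prompt, image_part)
            if looks_affirmative(reply):
                return reply  # jailbreak succeeded under this crude check
    return None

In the paper's setting the image half would presumably be rendered into an actual image before being sent to the model; passing it as a string here just keeps the sketch self-contained and dependency-free.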