On the Real-World (In)Security of Large Language Model Systems: Threats and Detection

Published in: ProQuest Dissertations and Theses (2025)
Main author: Lin, Zilong
Publisher: ProQuest Dissertations & Theses

MARC

LEADER 00000nab a2200000uu 4500
001 3241748517
003 UK-CbPIL
020 |a 9798291544549 
035 |a 3241748517 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Lin, Zilong 
245 1 |a On the Real-World (In)Security of Large Language Model Systems: Threats and Detection 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Large Language Models (LLMs), trained on huge amounts of data sourced from the web, are driving a wide range of applications across diverse services. However, the exploitation of LLMs for malicious services is on the rise, amplifying the cyber threat landscape and raising questions about the trustworthiness of LLM technologies. Despite this growing risk, there has been little effort to understand this emerging cybercrime enabled by LLMs for malicious services (i.e., Mallas) and their development pipelines in terms of magnitude, impact, and techniques. My research delves into the security implications of real-world malicious LLM applications and their development pipelines, which span from upstream components, including toxic datasets and uncensored LLMs, to downstream malicious applications. Beginning from the downstream, this dissertation presents the first systematic study of 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the growing Malla ecosystem and its substantial impact on today's public LLM services. In particular, we identified eight backend LLMs powering these Mallas, as well as 182 adversarial prompts designed to circumvent the safeguards of commercial LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs (i.e., ULLMs) and the exploitation of public LLMs via jailbreak prompts. To better understand the ULLMs increasingly leveraged to power Mallas, this dissertation further presents the first systematic investigation of ULLMs. Identifying ULLMs among the vast number of open-source LLMs hosted on platforms like Hugging Face poses significant challenges. To overcome this, we model relationships among LLMs and their associated data, such as fine-tuning, merging, compression, and toxic dataset usage, using a knowledge graph. Leveraging semi-supervised graph deep learning, we discovered over 11,000 ULLMs from a small set of labeled examples and uncensored datasets. A closer analysis of these ULLMs reveals their alarming scale and usage. Some have been downloaded over a million times, with one exceeding 19 million downloads. These models, created through fine-tuning, merging, or compression of other models, often in violation of terms of use, are capable of generating harmful content, including hate speech, violence, erotic material, and malicious code. Evidence shows their integration into hundreds of malicious applications offering services such as erotic role-play, child pornography, malicious code generation, and more. In addition, underground forums reveal criminals sharing ULLM-based techniques and scripts to build cheap alternatives to commercial malicious LLMs. Finally, this dissertation explores the poisoning of Wiki systems, a critical source of training data for LLMs. It introduces MAWSEO, a novel stealthy blackhat SEO technique that leverages adversarial revisions to subtly inject toxic or promotional content into Wiki articles. These attacks are designed to promote specific content, evade vandalism detectors, maintain semantic coherence, and avoid raising user suspicion. Our evaluation and user study demonstrate that MAWSEO can effectively and efficiently generate adversarial Wiki edits that bypass state-of-the-art and built-in Wiki vandalism detectors, while still delivering promotional content to users without triggering alarms. We also explored potential defenses, including coherence-based detection and adversarial training of vandalism classifiers within the Wiki ecosystem. Overall, this work enables a better understanding of the real-world exploitation and abuse of LLMs and their associated systems by cybercriminals, offering insights into strategies to counteract the cybercrimes targeting and leveraging LLMs. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Information technology 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3241748517/abstract/embedded/IZYTEZ3DIR4FRXA2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3241748517/fulltextPDF/embedded/IZYTEZ3DIR4FRXA2?source=fedsrch