On the Real-World (In)Security of Large Language Model Systems: Threats and Detection
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract; Full Text - PDF |
Abstract:

Large Language Models (LLMs), trained on huge amounts of data sourced from the web, are driving a wide range of applications across diverse services. However, the exploitation of LLMs for malicious services is on the rise, amplifying the cyber threat landscape and raising questions about the trustworthiness of LLM technologies. Despite this growing risk, little effort has been made to understand the emerging cybercrime enabled by LLMs exploited for malicious services (Mallas) and their development pipelines in terms of magnitude, impact, and techniques. My research delves into the security implications of real-world malicious LLM applications and their development pipelines, which span from upstream components, including toxic datasets and uncensored LLMs, to downstream malicious applications.

Starting from the downstream, this dissertation presents the first systematic study of 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the growing Malla ecosystem and its substantial impact on today's public LLM services. In particular, we identified eight backend LLMs powering these Mallas, as well as 182 adversarial prompts designed to circumvent the safeguards of commercial LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs (ULLMs) and the exploitation of public LLMs via jailbreak prompts.

To better understand the ULLMs increasingly leveraged to power Mallas, this dissertation further presents the first systematic investigation of ULLMs. Identifying ULLMs among the vast number of open-source LLMs hosted on platforms like Hugging Face poses significant challenges. To overcome this, we model the relationships among LLMs and their associated data, such as fine-tuning, merging, compression, and toxic dataset usage, as a knowledge graph. Leveraging semi-supervised graph deep learning, we discovered over 11,000 ULLMs from a small set of labeled examples and uncensored datasets. A closer analysis of these ULLMs reveals their alarming scale and usage: some have been downloaded over a million times, with one exceeding 19 million installs. These models, created through fine-tuning, merging, or compression of other models, often in violation of terms of use, are capable of generating harmful content, including hate speech, violence, erotic material, and malicious code. Evidence shows their integration into hundreds of malicious applications offering services such as erotic role-play, child pornography, malicious code generation, and more. In addition, underground forums reveal criminals sharing ULLM-based techniques and scripts to build cheap alternatives to commercial malicious LLMs.
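To give a concrete picture of the graph-based discovery step described above, here is a minimal semi-supervised node-classification sketch in plain PyTorch: a toy knowledge graph whose nodes stand for models and datasets, a two-layer graph convolutional network, and a loss computed only on a couple of labeled seed nodes. The graph, node features, labels, and hyperparameters are all synthetic placeholders; the dissertation's actual knowledge graph construction and learning pipeline are not reproduced here.

```python
# Minimal semi-supervised GCN sketch (plain PyTorch). All data below is
# synthetic and for illustration only, not the dissertation's pipeline.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical knowledge graph: nodes are models/datasets, edges are relations
# such as "fine-tuned-from", "merged-from", or "trained-on".
num_nodes, feat_dim, num_classes = 8, 16, 2   # classes: censored vs. uncensored
x = torch.randn(num_nodes, feat_dim)          # placeholder node features (e.g., model-card text embeddings)
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]])

# Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2
A = torch.zeros(num_nodes, num_nodes)
A[edges[:, 0], edges[:, 1]] = 1.0
A = A + A.T + torch.eye(num_nodes)            # make undirected, add self-loops
d_inv_sqrt = A.sum(dim=1).pow(-0.5)
A_hat = d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

class GCN(nn.Module):
    """Two-layer graph convolutional network for node classification."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, a_hat, feats):
        h = F.relu(a_hat @ self.w1(feats))    # propagate neighbor features, then transform
        return a_hat @ self.w2(h)             # per-node class logits

# Semi-supervised setup: only a handful of nodes carry labels.
y = torch.zeros(num_nodes, dtype=torch.long)
y[7] = 1                                      # one known "uncensored" seed node
labeled = torch.tensor([0, 7])                # indices of the labeled nodes

model = GCN(feat_dim, 32, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    logits = model(A_hat, x)
    loss = F.cross_entropy(logits[labeled], y[labeled])  # loss only on labeled nodes
    loss.backward()
    opt.step()

# Predicted labels for every node, including the unlabeled ones.
print(model(A_hat, x).argmax(dim=1))
```

The point of the sketch is the semi-supervised setup: evidence from a few known uncensored models and datasets propagates along fine-tuning, merging, and dataset edges so that unlabeled nodes in the graph can be scored as well.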
Finally, this dissertation explores the poisoning of Wiki systems, a critical source of training data for LLMs. It introduces MAWSEO, a novel stealthy blackhat SEO technique that leverages adversarial revisions to subtly inject toxic or promotional content into Wiki articles. These attacks are designed to promote specific content, evade vandalism detectors, maintain semantic coherence, and avoid raising user suspicion. Our evaluation and user study demonstrate that MAWSEO can effectively and efficiently generate adversarial Wiki edits that bypass state-of-the-art and built-in Wiki vandalism detectors, while still delivering promotional content to users without triggering alarms. We also explored potential defenses, including coherence-based detection and adversarial training of vandalism classifiers within the Wiki ecosystem.
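As a toy illustration of the coherence-based detection direction mentioned above, the sketch below scores how well an added sentence fits its surrounding article text and flags low-similarity additions. It uses TF-IDF cosine similarity from scikit-learn as a lightweight stand-in for the semantic models a real detector would need; the article text, edits, and threshold are invented examples rather than anything from the MAWSEO evaluation.

```python
# Toy coherence check for a Wiki revision: flag added sentences whose
# similarity to the surrounding article text falls below a threshold.
# TF-IDF cosine similarity is a crude proxy for semantic coherence;
# texts and the threshold are made-up examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def coherence_score(context: str, added_sentence: str) -> float:
    """Cosine similarity between an added sentence and its surrounding text."""
    vec = TfidfVectorizer(stop_words="english").fit([context, added_sentence])
    m = vec.transform([context, added_sentence])
    return float(cosine_similarity(m[0], m[1])[0, 0])

article = ("The western honey bee is a species of bee found across the world. "
           "Colonies are kept by beekeepers for honey production and pollination.")
benign_edit = "Honey bee colonies are also used commercially to pollinate orchards."
suspicious_edit = "Visit example-pills.example.com for the best discount supplements."

THRESHOLD = 0.05  # arbitrary cut-off for this toy example
for edit in (benign_edit, suspicious_edit):
    score = coherence_score(article, edit)
    verdict = "looks coherent" if score >= THRESHOLD else "flag for review"
    print(f"{score:.3f}  {verdict}: {edit[:50]}...")
```

A deployed detector would presumably combine such a coherence signal with existing vandalism classifiers rather than replace them, since the abstract notes that adversarial revisions are crafted to preserve semantic coherence.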
Overall, this work enables a better understanding of the real-world exploitation and abuse of LLMs and their associated systems by cybercriminals, offering insights into strategies to counteract the cybercrimes targeting and leveraging LLMs.

| ISBN: | 9798291544549 |
|---|---|
| Source: | ProQuest Dissertations & Theses Global |