Understanding the Reverse Engineering Abilities of Large Language Models

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Geng, Jiayi
Published: ProQuest Dissertations & Theses
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: With the rapid development of large language models (LLMs), these models have demonstrated impressive capabilities not only in formal reasoning tasks but also in certain desirable behaviors similar to human thinking. The emergence of these cognitive-like patterns motivated me to leverage insights from cognitive science to better understand the remaining challenges and to explore the boundaries of LLM capabilities. In this thesis, I focus mainly on the challenges LLMs face when engaged in autonomous scientific processes. Using AI to create autonomous researchers has the potential to accelerate scientific discovery, but a prerequisite for this vision to become reality is evaluating how well an AI model can identify the underlying structure of a system from its behavior. I therefore explore whether an LLM can learn from passive observations and actively collect informative data to refine its own hypotheses. To answer this question, I investigate the ability of LLMs to reverse-engineer three types of black-box systems chosen to represent problems that might appear in different domains of research: list-mapping programs, formal languages, and mathematical equations. I use Bayesian models as a normative reference to quantify the gap between LLMs and optimal inference under a given observation space. Through extensive experiments, I show that while LLMs have difficulty reverse-engineering these systems from observations alone, data generated by LLM-driven interventions can effectively improve the models' own performance. By testing edge cases, the LLM is able to refine its own hypotheses and avoid failure modes such as overcomplication, where the LLM falsely assumes prior knowledge about the black box, and overlooking, where the LLM fails to incorporate observations. These insights provide practical guidance for helping LLMs reverse-engineer black-box systems more effectively, supporting their use in making new discoveries.
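The intervention-driven refinement loop the abstract describes can be sketched in miniature: an agent maintains candidate hypotheses about a black-box list-mapping program, actively picks the probe input on which its hypotheses disagree most, and discards hypotheses inconsistent with the observed output. The black box, hypothesis set, and disagreement heuristic below are illustrative stand-ins, not the thesis's actual experimental setup.

```python
def black_box(xs):
    # Hidden system the agent must reverse-engineer (hypothetical example).
    return [x * 2 for x in xs]

# Candidate hypotheses about the mapping rule (illustrative only).
HYPOTHESES = {
    "double":   lambda xs: [x * 2 for x in xs],
    "square":   lambda xs: [x * x for x in xs],
    "identity": lambda xs: list(xs),
}

def consistent(hyps, observations):
    """Keep only hypotheses that reproduce every (input, output) pair."""
    return {name: h for name, h in hyps.items()
            if all(h(i) == o for i, o in observations)}

def most_informative(hyps, candidates):
    """Pick the probe on which the hypotheses disagree the most --
    a crude stand-in for Bayesian expected information gain."""
    def spread(xs):
        return len({tuple(h(xs)) for h in hyps.values()})
    return max(candidates, key=spread)

def refine(hyps, probes, budget=3):
    """Actively query the black box until one hypothesis remains
    or the intervention budget is spent."""
    observations = []
    for _ in range(budget):
        if len(hyps) <= 1:
            break
        probe = most_informative(hyps, probes)
        observations.append((probe, black_box(probe)))
        hyps = consistent(hyps, observations)
    return hyps

remaining = refine(dict(HYPOTHESES), probes=[[0], [1], [2], [3]])
```

Note that a passive observer fed only uninformative inputs (e.g. `[0]`, where all three rules agree) could never separate the hypotheses; choosing the maximally discriminating probe is what lets the loop converge, which mirrors the abstract's finding that LLM-driven interventions outperform passive observation.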
ISBN:9798280750371
Source: ProQuest Dissertations & Theses Global