Understanding the Reverse Engineering Abilities of Large Language Models

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Geng, Jiayi
Published: ProQuest Dissertations & Theses
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: With the rapid development of large language models (LLMs), these models have demonstrated impressive capabilities not only in formal reasoning tasks but also in certain desirable behaviors similar to human thinking. The emergence of these cognitive-like patterns motivated me to leverage insights from cognitive science to better understand the remaining challenges and to explore the boundaries of LLM capabilities. In this thesis, I focus mainly on the challenges LLMs face when engaged in autonomous scientific processes. Using AI to create autonomous researchers has the potential to accelerate scientific discovery, but a prerequisite for this vision to become reality is evaluating how well an AI model can identify the underlying structure of a system from its behavior. I therefore explore whether an LLM can learn from passive observations and actively collect informative data to refine its own hypotheses. To answer this question, I investigate the ability of LLMs to reverse-engineer three types of black-box systems chosen to represent problems that might appear in different domains of research: list-mapping programs, formal languages, and mathematical equations. I use Bayesian models as a normative reference to quantify the gap between LLMs and optimal inference under a given observation space. Through extensive experiments, I show that while LLMs have difficulty reverse-engineering these systems from observations alone, data generated by LLM-driven interventions can effectively improve the models' own performance. By testing edge cases, the LLM is able to refine its own hypotheses and avoid failure modes such as overcomplication, where the LLM falsely assumes prior knowledge about the black box, and overlooking, where the LLM fails to incorporate observations. These insights provide practical guidance for helping LLMs reverse-engineer black-box systems more effectively, supporting their use in making new discoveries.
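The intervention-driven refinement loop the abstract describes can be sketched in miniature: an agent maintains candidate hypotheses about a black-box list-mapping program, actively picks the probe input on which its hypotheses disagree most, and discards hypotheses inconsistent with the observed output. The black box, hypothesis set, and disagreement heuristic below are illustrative stand-ins, not the thesis's actual experimental setup.

```python
def black_box(xs):
    # Hidden system the agent must reverse-engineer (hypothetical example).
    return [x * 2 for x in xs]

# Candidate hypotheses about the mapping rule (illustrative only).
HYPOTHESES = {
    "double":   lambda xs: [x * 2 for x in xs],
    "square":   lambda xs: [x * x for x in xs],
    "identity": lambda xs: list(xs),
}

def consistent(hyps, observations):
    """Keep only hypotheses that reproduce every (input, output) pair."""
    return {name: h for name, h in hyps.items()
            if all(h(i) == o for i, o in observations)}

def most_informative(hyps, candidates):
    """Pick the probe on which the hypotheses disagree the most --
    a crude stand-in for Bayesian expected information gain."""
    def spread(xs):
        return len({tuple(h(xs)) for h in hyps.values()})
    return max(candidates, key=spread)

def refine(hyps, probes, budget=3):
    """Actively query the black box until one hypothesis remains
    or the intervention budget is spent."""
    observations = []
    for _ in range(budget):
        if len(hyps) <= 1:
            break
        probe = most_informative(hyps, probes)
        observations.append((probe, black_box(probe)))
        hyps = consistent(hyps, observations)
    return hyps

remaining = refine(dict(HYPOTHESES), probes=[[0], [1], [2], [3]])
```

Note that a passive observer fed only uninformative inputs (e.g. `[0]`, where all three rules agree) could never separate the hypotheses; choosing the maximally discriminating probe is what lets the loop converge, which mirrors the abstract's finding that LLM-driven interventions outperform passive observation.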
ISBN:9798280750371
Source: ProQuest Dissertations & Theses Global