Building the Learning-From-Interaction Pipeline for Large Language Models

Saved in:
Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Murty, Shikhar
Published: ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: LLMs have demonstrated remarkable capabilities, and there is growing interest in using them as agents—systems that can translate complex human goals, expressed in natural language, into sequences of actions within digital environments like web browsers. Achieving this requires two core competencies: first, the ability to understand arbitrary and compositional language inputs; and second, the capacity to learn about unfamiliar environments so that language goals can be grounded in effective, multi-step decision-making. This thesis addresses both of these challenges. In the first part, I introduce Tree Projections, a framework for understanding how transformers build compositional structure. I then present a series of results based on Tree Projections that illuminate the mechanisms behind compositional generalization, grokking, and sample-efficient learning in transformers. While Tree Projections help explain successful generalization, prior work has shown that standard transformers struggle with deep recursion due to a lack of mechanisms for unbounded hierarchical depth. To address this, I propose Pushdown Layers, an architectural augmentation that adds a stack-based memory to transformers. Pushdown Layers improve sample efficiency and generalization on tasks requiring nested or recursive reasoning. In the second part, I introduce NNetNav and BAGEL, methods for unsupervised, open-ended exploration in web environments that enable models to automatically collect training data for new websites, without human supervision. Our best results come from fine-tuning LLMs with demonstrations collected via NNetNav, which uses the hierarchical structure of language to guide exploration policies. Using NNetNav, we collect 10,000 demonstrations from 20 real-world websites and fine-tune an 8B model, setting a new state-of-the-art among unsupervised methods and outperforming zero-shot GPT-4 on multiple browser benchmarks. Taken together, these contributions bring us closer to digital language agents that can both handle the complexity of language instructions and autonomously learn from interacting with their environments.
ISBN: 9798288815157
Source: ProQuest Dissertations & Theses Global