PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking

Bibliographic Details
Published in: NPJ Artificial Intelligence vol. 1, no. 1 (Dec 2025), p. 4
Main Author: Buehler, Markus J.
Published: Nature Publishing Group
Description
Abstract: We introduce PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning), a framework that integrates preference optimization with reinforcement learning (RL) concepts for self-improving scientific reasoning. PRefLexOR employs a recursive approach, refining intermediate steps before producing final outputs in training and inference. It optimizes log odds between preferred and non-preferred responses using an in-situ dataset generation algorithm. A dynamic knowledge graph contextualizes reasoning with retrieval-augmented data. Preference optimization enhances performance via rejection sampling, masking reasoning steps to focus on discovery. Recursive optimization, guided by feedback loops, refines reasoning. This process mirrors biological adaptation, enabling real-time learning. We find that even small models (3B parameters) self-teach deeper reasoning, solving open-domain problems effectively. Our method integrates into existing LLMs and demonstrates success in biological materials science, leveraging multi-agent self-improvement for enhanced reasoning depth and cross-domain adaptability, offering flexibility and integration into larger agentic systems.
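Illustrative note: the abstract states that the method "optimizes log odds between preferred and non-preferred responses" but this record does not give the loss itself. The following is a minimal sketch of one common way such an objective can be written, assuming an odds-ratio style term computed from length-averaged log-probabilities. The function names (log_odds, preference_loss) and the sample values are hypothetical and are not taken from the paper.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    Assumption: avg_logprob is a strictly negative, length-averaged
    log-probability of a response, so that 0 < p < 1.
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def preference_loss(logp_preferred: float, logp_rejected: float) -> float:
    """Return -log(sigmoid(margin)), where margin is the difference in
    log odds between the preferred and the non-preferred response.

    Illustrative objective only; the paper's exact loss may differ.
    """
    margin = log_odds(logp_preferred) - log_odds(logp_rejected)
    # softplus(-margin) equals -log(sigmoid(margin)); stable for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical average log-probabilities for two candidate responses.
    well_ordered = preference_loss(-0.5, -2.0)   # preferred is more likely
    badly_ordered = preference_loss(-2.0, -0.5)  # rejected is more likely
    print(well_ordered < badly_ordered)
```

Under these assumptions the loss shrinks as the model assigns higher odds to the preferred response than to the non-preferred one, which is the behavior the abstract's description implies.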
ISSN: 3005-1460
DOI: 10.1038/s44387-025-00003-z
Source: Engineering Database