PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking

Bibliographic Details
Published in: NPJ Artificial Intelligence vol. 1, no. 1 (Dec 2025), p. 4
Main Author: Buehler, Markus J.
Published:
Nature Publishing Group
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3227648399
003 UK-CbPIL
022 |a 3005-1460 
024 7 |a 10.1038/s44387-025-00003-z  |2 doi 
035 |a 3227648399 
045 2 |b d20251201  |b d20251231 
100 1 |a Buehler, Markus J.  |u Massachusetts Institute of Technology, Center for Computational Science and Engineering, Schwarzman College of Computing, Laboratory for Atomistic and Molecular Mechanics (LAMM), Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786) 
245 1 |a PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking 
260 |b Nature Publishing Group  |c Dec 2025 
513 |a Journal Article 
520 3 |a We introduce PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning), a framework that integrates preference optimization with reinforcement learning (RL) concepts for self-improving scientific reasoning. PRefLexOR employs a recursive approach, refining intermediate steps before producing final outputs in training and inference. It optimizes log odds between preferred and non-preferred responses using an in-situ dataset generation algorithm. A dynamic knowledge graph contextualizes reasoning with retrieval-augmented data. Preference optimization enhances performance via rejection sampling, masking reasoning steps to focus on discovery. Recursive optimization, guided by feedback loops, refines reasoning. This process mirrors biological adaptation, enabling real-time learning. We find that even small models (3B parameters) self-teach deeper reasoning, solving open-domain problems effectively. Our method integrates into existing LLMs and demonstrates success in biological materials science, leveraging multi-agent self-improvement for enhanced reasoning depth and cross-domain adaptability, offering flexibility and integration into larger agentic systems. 
653 |a Data augmentation 
653 |a Partial differential equations 
653 |a Materials science 
653 |a Datasets 
653 |a Biological materials 
653 |a Artificial intelligence 
653 |a Modelling 
653 |a Optimization techniques 
653 |a Knowledge 
653 |a Feedback loops 
653 |a Interdisciplinary aspects 
653 |a Reasoning 
653 |a Optimization 
653 |a Preferences 
653 |a Biomedical materials 
653 |a Multiagent systems 
653 |a Machine learning 
653 |a Biological activity 
653 |a Informatics 
653 |a Real time 
653 |a Large language models 
653 |a Knowledge representation 
653 |a Natural language 
653 |a Recursive methods 
773 0 |t NPJ Artificial Intelligence  |g vol. 1, no. 1 (Dec 2025), p. 4 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3227648399/abstract/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3227648399/fulltext/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3227648399/fulltextPDF/embedded/J7RWLIQ9I3C9JK51?source=fedsrch
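
Illustrative note on the abstract (field 520): the abstract states that the method "optimizes log odds between preferred and non-preferred responses". The record does not give the loss itself, so the following is only a minimal sketch of one common odds-ratio style preference term, not the paper's implementation. The function names (log_odds, softplus, odds_ratio_loss) and the choice of a single length-normalized log-probability per response are assumptions made for illustration.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given logp = log(p), with 0 < p < 1."""
    # log(1 - p) is computed as log1p(-exp(logp)) for better accuracy.
    return logp - math.log1p(-math.exp(logp))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def odds_ratio_loss(logp_preferred: float, logp_rejected: float) -> float:
    """Hypothetical preference term: -log(sigmoid(log-odds margin)).

    Both arguments are assumed to be (for example, length-normalized)
    log-probabilities that a model assigns to the preferred and the
    non-preferred response for the same prompt.
    """
    margin = log_odds(logp_preferred) - log_odds(logp_rejected)
    # -log(sigmoid(m)) equals softplus(-m).
    return softplus(-margin)


if __name__ == "__main__":
    # The loss shrinks as the preferred response becomes more likely
    # than the rejected one.
    print(odds_ratio_loss(math.log(0.5), math.log(0.5)))
    print(odds_ratio_loss(math.log(0.8), math.log(0.2)))
```

Under these assumptions the term equals log 2 when the model rates both responses equally, falls toward zero as the preferred response gains probability, and grows when the ranking is inverted. In a training setup it would typically be added to a standard language-modeling loss; how the paper weights or combines such terms is not stated in this record.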