PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking
| Published in: | NPJ Artificial Intelligence vol. 1, no. 1 (Dec 2025), p. 4 |
|---|---|
| Main author: | Buehler, Markus J. |
| Published: | Nature Publishing Group |
| Subjects: | |
| Online access: | Citation/Abstract, Full Text, Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3227648399 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 3005-1460 | ||
| 024 | 7 | |a 10.1038/s44387-025-00003-z |2 doi | |
| 035 | |a 3227648399 | ||
| 045 | 2 | |b d20251201 |b d20251231 | |
| 100 | 1 | |a Buehler, Markus J. |u Massachusetts Institute of Technology, Center for Computational Science and Engineering, Schwarzman College of Computing, Laboratory for Atomistic and Molecular Mechanics (LAMM), Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786) | |
| 245 | 1 | |a PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking | |
| 260 | |b Nature Publishing Group |c Dec 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a We introduce PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning), a framework that integrates preference optimization with reinforcement learning (RL) concepts for self-improving scientific reasoning. PRefLexOR employs a recursive approach, refining intermediate steps before producing final outputs in training and inference. It optimizes log odds between preferred and non-preferred responses using an in-situ dataset generation algorithm. A dynamic knowledge graph contextualizes reasoning with retrieval-augmented data. Preference optimization enhances performance via rejection sampling, masking reasoning steps to focus on discovery. Recursive optimization, guided by feedback loops, refines reasoning. This process mirrors biological adaptation, enabling real-time learning. We find that even small models (3B parameters) self-teach deeper reasoning, solving open-domain problems effectively. Our method integrates into existing LLMs and demonstrates success in biological materials science, leveraging multi-agent self-improvement for enhanced reasoning depth and cross-domain adaptability, offering flexibility and integration into larger agentic systems. | |
| 653 | |a Data augmentation | ||
| 653 | |a Partial differential equations | ||
| 653 | |a Materials science | ||
| 653 | |a Datasets | ||
| 653 | |a Biological materials | ||
| 653 | |a Artificial intelligence | ||
| 653 | |a Modelling | ||
| 653 | |a Optimization techniques | ||
| 653 | |a Knowledge | ||
| 653 | |a Feedback loops | ||
| 653 | |a Interdisciplinary aspects | ||
| 653 | |a Reasoning | ||
| 653 | |a Optimization | ||
| 653 | |a Preferences | ||
| 653 | |a Biomedical materials | ||
| 653 | |a Multiagent systems | ||
| 653 | |a Machine learning | ||
| 653 | |a Biological activity | ||
| 653 | |a Informatics | ||
| 653 | |a Real time | ||
| 653 | |a Large language models | ||
| 653 | |a Knowledge representation | ||
| 653 | |a Natural language | ||
| 653 | |a Recursive methods | ||
| 773 | 0 | |t NPJ Artificial Intelligence |g vol. 1, no. 1 (Dec 2025), p. 4 | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3227648399/abstract/embedded/J7RWLIQ9I3C9JK51?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/3227648399/fulltext/embedded/J7RWLIQ9I3C9JK51?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3227648399/fulltextPDF/embedded/J7RWLIQ9I3C9JK51?source=fedsrch |