Describe: PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking