MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Bibliographic details
Published in: Scientific Data vol. 12, no. 1 (2025), p. 1392-1400
Main author: Fang, Meng
Other authors: Wan, Xiangpeng; Lu, Fei; Xing, Fei; Zou, Kai
Published: Nature Publishing Group, 2025
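Subjects: Problem solving; Datasets; Artificial intelligence; Mathematical problems; Large language models; Language; Mathematical models; Prime numbers
Links: Citation/Abstract; Full Text; Full Text - PDF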

MARC

LEADER 00000nab a2200000uu 4500
001 3237859299
003 UK-CbPIL
022 |a 2052-4463 
024 7 |a 10.1038/s41597-025-05283-3  |2 doi 
035 |a 3237859299 
045 2 |b d20250101  |b d20251231 
084 |a 274935  |2 nlm 
100 1 |a Fang, Meng  |u Department of Computer Science, University of Liverpool, Liverpool, UK (ROR: https://ror.org/04xs57h96) (GRID: grid.10025.36) (ISNI: 0000 0004 1936 8470) 
245 1 |a MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data 
260 |b Nature Publishing Group  |c 2025 
513 |a Journal Article 
520 3 |a Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. To support rigorous evaluation of mathematical reasoning in LLMs, we introduce the “MathOdyssey” dataset - a curated collection of 387 expert-generated mathematical problems spanning high school, university, and Olympiad-level topics. Each problem is accompanied by a detailed solution and categorized by difficulty level, subject area, and answer type. The dataset was developed through a rigorous multi-stage process involving contributions from subject experts, peer review, and standardized formatting. We provide detailed metadata and a standardized schema to facilitate consistent use in downstream applications. To demonstrate the dataset’s utility, we evaluate several representative LLMs and report their performance across problem types. We release MathOdyssey as an open-access resource to enable reproducible and fine-grained assessment of mathematical capabilities in LLMs and to foster further research in mathematical reasoning and education. 
653 |a Problem solving 
653 |a Datasets 
653 |a Artificial intelligence 
653 |a Mathematical problems 
653 |a Large language models 
653 |a Language 
653 |a Mathematical models 
653 |a Prime numbers 
700 1 |a Wan, Xiangpeng  |u NetMind.AI, London, UK (ROR: https://ror.org/03knd6b36) (GRID: grid.497885.f) (ISNI: 0000 0000 9934 3724) 
700 1 |a Lu, Fei  |u Department of Mathematics, Johns Hopkins University, Baltimore, MD, USA (ROR: https://ror.org/00za53h95) (GRID: grid.21107.35) (ISNI: 0000 0001 2171 9311) 
700 1 |a Xing, Fei  |u Mathematica Policy Research, Princeton, New Jersey, USA (ROR: https://ror.org/02403vr89) (GRID: grid.419482.2) (ISNI: 0000 0004 0618 1906) 
700 1 |a Zou, Kai  |u NetMind.AI, London, UK (ROR: https://ror.org/03knd6b36) (GRID: grid.497885.f) (ISNI: 0000 0000 9934 3724) 
773 0 |t Scientific Data  |g vol. 12, no. 1 (2025), p. 1392-1400 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3237859299/abstract/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3237859299/fulltext/embedded/J7RWLIQ9I3C9JK51?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3237859299/fulltextPDF/embedded/J7RWLIQ9I3C9JK51?source=fedsrch