MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
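| Published in: | Scientific Data vol. 12, no. 1 (2025), p. 1392-1400 |
|---|---|
| Main author: | Fang, Meng |
| Other authors: | Wan, Xiangpeng; Lu, Fei; Xing, Fei; Zou, Kai |
| Published: | Nature Publishing Group |
| Subjects: | Problem solving; Datasets; Artificial intelligence; Mathematical problems; Large language models; Language; Mathematical models; Prime numbers |
| Links: | Citation/Abstract; Full Text; Full Text - PDF |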
MARC
| LEADER | | | 00000nab a2200000uu 4500 |
|---|---|---|---|
| 001 | | | 3237859299 |
| 003 | | | UK-CbPIL |
| 022 | | | $a 2052-4463 |
| 024 | 7 | | $a 10.1038/s41597-025-05283-3 $2 doi |
| 035 | | | $a 3237859299 |
| 045 | 2 | | $b d20250101 $b d20251231 |
| 084 | | | $a 274935 $2 nlm |
| 100 | 1 | | $a Fang, Meng $u Department of Computer Science, University of Liverpool, Liverpool, UK (ROR: https://ror.org/04xs57h96) (GRID: grid.10025.36) (ISNI: 0000 0004 1936 8470) |
| 245 | 1 | | $a MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data |
| 260 | | | $b Nature Publishing Group $c 2025 |
| 513 | | | $a Journal Article |
| 520 | 3 | | $a Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. To support rigorous evaluation of mathematical reasoning in LLMs, we introduce the “MathOdyssey” dataset - a curated collection of 387 expert-generated mathematical problems spanning high school, university, and Olympiad-level topics. Each problem is accompanied by a detailed solution and categorized by difficulty level, subject area, and answer type. The dataset was developed through a rigorous multi-stage process involving contributions from subject experts, peer review, and standardized formatting. We provide detailed metadata and a standardized schema to facilitate consistent use in downstream applications. To demonstrate the dataset’s utility, we evaluate several representative LLMs and report their performance across problem types. We release MathOdyssey as an open-access resource to enable reproducible and fine-grained assessment of mathematical capabilities in LLMs and to foster further research in mathematical reasoning and education. |
| 653 | | | $a Problem solving |
| 653 | | | $a Datasets |
| 653 | | | $a Artificial intelligence |
| 653 | | | $a Mathematical problems |
| 653 | | | $a Large language models |
| 653 | | | $a Language |
| 653 | | | $a Mathematical models |
| 653 | | | $a Prime numbers |
| 700 | 1 | | $a Wan, Xiangpeng $u NetMind.AI, London, UK (ROR: https://ror.org/03knd6b36) (GRID: grid.497885.f) (ISNI: 0000 0000 9934 3724) |
| 700 | 1 | | $a Lu, Fei $u Department of Mathematics, Johns Hopkins University, Baltimore, MD, USA (ROR: https://ror.org/00za53h95) (GRID: grid.21107.35) (ISNI: 0000 0001 2171 9311) |
| 700 | 1 | | $a Xing, Fei $u Mathematica Policy Research, Princeton, New Jersey, USA (ROR: https://ror.org/02403vr89) (GRID: grid.419482.2) (ISNI: 0000 0004 0618 1906) |
| 700 | 1 | | $a Zou, Kai $u NetMind.AI, London, UK (ROR: https://ror.org/03knd6b36) (GRID: grid.497885.f) (ISNI: 0000 0000 9934 3724) |
| 773 | 0 | | $t Scientific Data $g vol. 12, no. 1 (2025), p. 1392-1400 |
| 786 | 0 | | $d ProQuest $t Health & Medical Collection |
| 856 | 4 | 1 | $3 Citation/Abstract $u https://www.proquest.com/docview/3237859299/abstract/embedded/J7RWLIQ9I3C9JK51?source=fedsrch |
| 856 | 4 | 0 | $3 Full Text $u https://www.proquest.com/docview/3237859299/fulltext/embedded/J7RWLIQ9I3C9JK51?source=fedsrch |
| 856 | 4 | 0 | $3 Full Text - PDF $u https://www.proquest.com/docview/3237859299/fulltextPDF/embedded/J7RWLIQ9I3C9JK51?source=fedsrch |