Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Published in: arXiv.org (Dec 20, 2024), p. n/a
Main author: Park, Sungjin
Other authors: Liu, Xiao; Gong, Yeyun; Choi, Edward
Published by: Cornell University Library, arXiv.org
Subjects: Language; Algorithms; Large language models; Decoding; Task complexity; Searching; Reasoning
Online access: Citation/Abstract; Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3148681739
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3148681739 
045 0 |b d20241220 
100 1 |a Park, Sungjin 
245 1 |a Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning 
260 |b Cornell University Library, arXiv.org  |c Dec 20, 2024 
513 |a Working Paper 
520 3 |a Despite recent advances in large language models, open-source models often struggle to consistently perform well on complex reasoning tasks. Existing ensemble methods, whether applied at the token or output levels, fail to address these challenges. In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. LE-MCTS formulates step-by-step reasoning with an ensemble of language models as a Markov decision process. In this framework, states represent intermediate reasoning paths, while actions consist of generating the next reasoning step using one of the language models selected from a predefined pool. Guided by a process-based reward model, LE-MCTS performs a tree search over the reasoning steps generated by different language models, identifying the most accurate reasoning chain. Experimental results on five mathematical reasoning benchmarks demonstrate that our approach outperforms both single language model decoding algorithms and language model ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the MATH and MQA datasets, respectively, highlighting its effectiveness in solving complex reasoning problems. 
653 |a Language 
653 |a Algorithms 
653 |a Large language models 
653 |a Decoding 
653 |a Task complexity 
653 |a Searching 
653 |a Reasoning 
700 1 |a Liu, Xiao 
700 1 |a Gong, Yeyun 
700 1 |a Choi, Edward 
773 0 |t arXiv.org  |g (Dec 20, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3148681739/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.15797
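
The abstract in field 520 outlines the LE-MCTS algorithm: states are partial reasoning chains, actions generate the next reasoning step with one model drawn from a predefined pool, and a process reward model (PRM) scores each partial chain to guide the tree search. The Python sketch below illustrates that loop in miniature. It is not the authors' implementation: the names (Node, le_mcts, pool, prm) and every design detail (the UCB constant, the depth cap, the expansion policy) are hypothetical stand-ins chosen only to keep the example self-contained and runnable.

import math
import random
from typing import Callable, List, Optional

# Hypothetical stand-ins: in the paper these would be LLMs and a trained
# process reward model (PRM); here they are plain callables so the sketch
# runs on its own.
Generator = Callable[[str], str]   # partial reasoning path -> next step
Rewarder = Callable[[str], float]  # partial reasoning path -> score

class Node:
    """A state: the reasoning path accumulated so far."""
    def __init__(self, path: str, parent: Optional["Node"] = None):
        self.path = path
        self.parent = parent
        self.children: List["Node"] = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c: float = 1.4) -> float:
        # Unvisited nodes are explored first.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def le_mcts(question: str, pool: List[Generator], prm: Rewarder,
            iters: int = 50, max_depth: int = 6) -> str:
    """Process-level ensemble search: each expansion draws the next
    reasoning step from every model in the pool, and the PRM's score
    for the resulting partial path is backed up the tree."""
    root = Node(question)
    for _ in range(iters):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: one child per model in the pool, so the tree can
        # splice steps from different models into one reasoning chain.
        if node.path.count("\n") < max_depth:
            for gen in pool:
                node.children.append(Node(node.path + "\n" + gen(node.path), node))
            node = random.choice(node.children)
        # Evaluation: the PRM scores the partial reasoning path.
        reward = prm(node.path)
        # Backpropagation: propagate the PRM score up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the reasoning path with the best average PRM score.
    best = max(_leaves(root), key=lambda n: n.value / max(n.visits, 1))
    return best.path

def _leaves(node: Node):
    if not node.children:
        yield node
    for child in node.children:
        yield from _leaves(child)

# Toy run with dummy "models" and a random "PRM" (illustrative only).
if __name__ == "__main__":
    pool = [lambda p: "step from model A", lambda p: "step from model B"]
    print(le_mcts("Q: what is 2 + 2?", pool, prm=lambda path: random.random()))

The key departure from single-model MCTS is the expansion step: rather than sampling several continuations from one model, each child comes from a different model in the pool, so the search can combine steps from different LLMs into a single reasoning chain, which is the process-level ensembling the abstract describes.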