Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Published in: arXiv.org (Dec 20, 2024), p. n/a
Main author: Park, Sungjin
Other authors: Liu, Xiao; Gong, Yeyun; Choi, Edward
Published by: Cornell University Library, arXiv.org
Subjects: Language; Algorithms; Large language models; Decoding; Task complexity; Searching; Reasoning
Online access: Citation/Abstract; Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3148681739
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3148681739 
045 0 |b d20241220 
100 1 |a Park, Sungjin 
245 1 |a Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning 
260 |b Cornell University Library, arXiv.org  |c Dec 20, 2024 
513 |a Working Paper 
520 3 |a Despite recent advances in large language models, open-source models often struggle to consistently perform well on complex reasoning tasks. Existing ensemble methods, whether applied at the token or output levels, fail to address these challenges. In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. LE-MCTS formulates step-by-step reasoning with an ensemble of language models as a Markov decision process. In this framework, states represent intermediate reasoning paths, while actions consist of generating the next reasoning step using one of the language models selected from a predefined pool. Guided by a process-based reward model, LE-MCTS performs a tree search over the reasoning steps generated by different language models, identifying the most accurate reasoning chain. Experimental results on five mathematical reasoning benchmarks demonstrate that our approach outperforms both single language model decoding algorithms and language model ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the MATH and MQA datasets, respectively, highlighting its effectiveness in solving complex reasoning problems. 
653 |a Language 
653 |a Algorithms 
653 |a Large language models 
653 |a Decoding 
653 |a Task complexity 
653 |a Searching 
653 |a Reasoning 
700 1 |a Liu, Xiao 
700 1 |a Gong, Yeyun 
700 1 |a Choi, Edward 
773 0 |t arXiv.org  |g (Dec 20, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3148681739/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.15797
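
The abstract in field 520 outlines the LE-MCTS algorithm: states are partial reasoning chains, actions generate the next reasoning step with one model drawn from a predefined pool, and a process reward model (PRM) scores each partial chain to guide the tree search. The Python sketch below illustrates that loop in miniature. It is not the authors' implementation: the names (Node, le_mcts, pool, prm) and every design detail (the UCB constant, the depth cap, the expansion policy) are hypothetical stand-ins chosen only to keep the example self-contained and runnable.

import math
import random
from typing import Callable, List, Optional

# Hypothetical stand-ins: in the paper these would be LLMs and a trained
# process reward model (PRM); here they are plain callables so the sketch
# runs on its own.
Generator = Callable[[str], str]   # partial reasoning path -> next step
Rewarder = Callable[[str], float]  # partial reasoning path -> score

class Node:
    """A state: the reasoning path accumulated so far."""
    def __init__(self, path: str, parent: Optional["Node"] = None):
        self.path = path
        self.parent = parent
        self.children: List["Node"] = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c: float = 1.4) -> float:
        # Unvisited nodes are explored first.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def le_mcts(question: str, pool: List[Generator], prm: Rewarder,
            iters: int = 50, max_depth: int = 6) -> str:
    """Process-level ensemble search: each expansion draws the next
    reasoning step from every model in the pool, and the PRM's score
    for the resulting partial path is backed up the tree."""
    root = Node(question)
    for _ in range(iters):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: one child per model in the pool, so the tree can
        # splice steps from different models into one reasoning chain.
        if node.path.count("\n") < max_depth:
            for gen in pool:
                node.children.append(Node(node.path + "\n" + gen(node.path), node))
            node = random.choice(node.children)
        # Evaluation: the PRM scores the partial reasoning path.
        reward = prm(node.path)
        # Backpropagation: propagate the PRM score up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the reasoning path with the best average PRM score.
    best = max(_leaves(root), key=lambda n: n.value / max(n.visits, 1))
    return best.path

def _leaves(node: Node):
    if not node.children:
        yield node
    for child in node.children:
        yield from _leaves(child)

# Toy run with dummy "models" and a random "PRM" (illustrative only).
if __name__ == "__main__":
    pool = [lambda p: "step from model A", lambda p: "step from model B"]
    print(le_mcts("Q: what is 2 + 2?", pool, prm=lambda path: random.random()))

The key departure from single-model MCTS is the expansion step: rather than sampling several continuations from one model, each child comes from a different model in the pool, so the search can combine steps from different LLMs into a single reasoning chain, which is the process-level ensembling the abstract describes.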