Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering

Bibliographic details
Published in: arXiv.org (Oct 31, 2024), p. n/a
Main author: Pusch, Larissa
Other authors: Conrad, Tim O F
Published by:
Cornell University Library, arXiv.org
Subjects:
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3123151613
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3123151613 
045 0 |b d20241031 
100 1 |a Pusch, Larissa 
245 1 |a Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering 
260 |b Cornell University Library, arXiv.org  |c Oct 31, 2024 
513 |a Working Paper 
520 3 |a Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLMs) and Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems, demonstrated on a biomedical KG. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactic and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors such as hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question-answering systems. The source code for generating the results of this paper and for the user interface can be found in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui 
653 |a Questions 
653 |a Accessibility 
653 |a Accuracy 
653 |a Source code 
653 |a Graphs 
653 |a System reliability 
653 |a Information systems 
653 |a Large language models 
653 |a Queries 
653 |a Prompt engineering 
653 |a Error reduction 
653 |a Biomedical data 
653 |a Natural language processing 
653 |a Knowledge representation 
653 |a Natural language 
700 1 |a Conrad, Tim O F 
773 0 |t arXiv.org  |g (Oct 31, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3123151613/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2409.04181
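The abstract describes a pipeline in which an LLM translates a natural-language question into a Cypher query, and a query checker validates the query before it is run against the Knowledge Graph. The following is a minimal, hypothetical sketch of that kind of pipeline, not the authors' implementation: the LLM call is stubbed out, the schema labels and the checker's rules are assumptions for illustration only.

```python
import re

# Assumed toy KG schema; the paper's biomedical KG schema is not given here.
KNOWN_LABELS = {"Drug", "Disease", "Gene"}


def llm_generate_cypher(question: str) -> str:
    """Stand-in for an LLM call (the paper tests GPT-4 Turbo and llama3:70b).
    A real system would prompt the model with the KG schema and the question."""
    return "MATCH (d:Drug)-[:TREATS]->(x:Disease {name: 'asthma'}) RETURN d.name"


def check_query(query: str) -> list[str]:
    """Lightweight checker in the spirit of the described query checker:
    flags node labels missing from the schema and non-read-only queries."""
    problems = []
    for label in re.findall(r":([A-Za-z_]\w*)", query):
        if label.isupper():  # skip relationship types such as TREATS
            continue
        if label not in KNOWN_LABELS:
            problems.append(f"unknown label: {label}")
    if re.search(r"\b(CREATE|DELETE|MERGE|SET)\b", query, re.IGNORECASE):
        problems.append("query is not read-only")
    return problems


query = llm_generate_cypher("Which drugs treat asthma?")
issues = check_query(query)
# Only if the checker reports no issues would the query be sent to the KG;
# otherwise the system would ask the LLM to correct and regenerate it.
```

In the paper's setup this loop is built on LangChain, and a rejected query would be fed back to the model for correction; the checker above only illustrates the validation step.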