Chat-rgie: precision extraction of rice germplasm data using large language models and prompt engineering

Bibliographic details
Published in: Journal of Big Data vol. 12, no. 1 (Aug 2025), p. 202
Main author: Wei, Yijin
Other authors: Fan, Jingchao
Published:
Springer Nature B.V.
Subjects:
Links: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3241438213
003 UK-CbPIL
022 |a 2196-1115 
024 7 |a 10.1186/s40537-025-01236-0  |2 doi 
035 |a 3241438213 
045 2 |b d20250801  |b d20250831 
100 1 |a Wei, Yijin  |u Chinese Academy of Agricultural Sciences, Agriculture Information Institution, Beijing, China (GRID:grid.410727.7) (ISNI:0000 0001 0526 1937); National Agriculture Science Data Center, Beijing, China (GRID:grid.410727.7) 
245 1 |a Chat-rgie: precision extraction of rice germplasm data using large language models and prompt engineering 
260 |b Springer Nature B.V.  |c Aug 2025 
513 |a Journal Article 
520 3 |a Varietal improvement is a key aspect of breeding, and as this work progresses, crop varietal data becomes more complex and requires more resources to extract. We therefore developed Chat-RGIE, a rice germplasm data extraction strategy based on conversational large language models (LLMs) and prompt engineering, to extract rice germplasm data in a zero-shot manner. The technique employs multi-response voting to limit the chance of hallucinations, as well as an additional calibration component to choose the best data extraction results. We evaluated Chat-RGIE in both a performance evaluation and a real-life data extraction evaluation. The scheme obtained 0.9102 precision, 0.9941 recall, and 0.9554 accuracy in the performance evaluation, and 0.6351 precision, 1.0 recall, and 0.8225 accuracy in the real-life data extraction evaluation, which supports the effectiveness of the scheme. The data extraction procedure also mitigates, to some extent, the risk that bias from a single large model leads to hallucinations: the incidence of hallucinations in the two evaluations was 0.0015 and 0.005, respectively, a very minor influence. We additionally used Restraint Rate, a statistic that quantifies how strongly the prompt constrains LLM replies; its values of 0.9265 and 0.911 in the two evaluations indicate that responses were well constrained. Finally, when examining the data extraction results, we found that when confronted with an unanswerable question, the LLM is affected by the stress imposed by the prompt: the higher the stress, the more likely it is to violate constraints, which is similar to what humans do when stressed. We therefore believe that some countermeasures used for this human behavior also have the potential to help improve LLM performance. 
653 |a Language 
653 |a Recall 
653 |a Accuracy 
653 |a Performance evaluation 
653 |a Large language models 
653 |a Knowledge 
653 |a Rice 
653 |a Databases 
653 |a Extraction procedures 
653 |a Germplasm 
653 |a Natural language processing 
653 |a Prompt engineering 
653 |a Polymerase chain reaction 
653 |a Human behavior 
653 |a Big Data 
653 |a Extraction 
653 |a Conversation 
653 |a Models 
653 |a Chat 
653 |a Conversational strategies 
653 |a Engineering 
653 |a Evaluation 
653 |a Hallucinations 
653 |a Data 
653 |a Stress 
653 |a Behavior 
653 |a Language modeling 
700 1 |a Fan, Jingchao  |u Chinese Academy of Agricultural Sciences, Agriculture Information Institution, Beijing, China (GRID:grid.410727.7) (ISNI:0000 0001 0526 1937); National Agriculture Science Data Center, Beijing, China (GRID:grid.410727.7) 
773 0 |t Journal of Big Data  |g vol. 12, no. 1 (Aug 2025), p. 202 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3241438213/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3241438213/fulltext/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3241438213/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch