GeneGPT: Teaching Large Language Models to Use NCBI Web APIs

AI-generated keywords: GeneGPT

AI-generated Key Points

GeneGPT is a novel method for teaching large language models (LLMs) to use the Web APIs of the National Center for Biotechnology Information (NCBI) to answer genomics questions.
Codex (code-davinci-002) is prompted with few-shot demonstrations of NCBI API calls, written as URL requests, for in-context learning. During inference, decoding stops when a call request is detected, the API call is made with the generated URL, and the raw result is appended before generation continues.
GeneGPT achieves state-of-the-art results on seven of the nine GeneTuring tasks evaluated (three of four one-shot tasks and four of five zero-shot tasks), ahead of other LLMs such as New Bing.
GeneGPT achieves a macro-average score of 0.76, much higher than other LLMs such as New Bing, BioMedLM, BioGPT, GPT-3, and ChatGPT.
The results suggest that external tools such as NCBI Web APIs may be more helpful than relevant web pages for improving LLMs on genomics question answering.
Future research directions include fine-tuning LLMs using NCBI API calls instead of in-context learning and exploring multi-hop genomics question answering along with chain-of-thought prompting techniques.


Authors: Qiao Jin, Yifan Yang, Qingyu Chen, Zhiyong Lu

arXiv: 2304.09667v1 (cs.CL)

Work in progress

License: CC BY 4.0

Abstract: In this paper, we present GeneGPT, a novel method for teaching large language models (LLMs) to use the Web Application Programming Interfaces (APIs) of the National Center for Biotechnology Information (NCBI) and answer genomics questions. Specifically, we prompt Codex (code-davinci-002) to solve the GeneTuring tests with few-shot URL requests of NCBI API calls as demonstrations for in-context learning. During inference, we stop the decoding once a call request is detected and make the API call with the generated URL. We then append the raw execution results returned by NCBI APIs to the generated texts and continue the generation until the answer is found or another API call is detected. Our preliminary results show that GeneGPT achieves state-of-the-art results on three out of four one-shot tasks and four out of five zero-shot tasks in the GeneTuring dataset. Overall, GeneGPT achieves a macro-average score of 0.76, which is much higher than retrieval-augmented LLMs such as the New Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as other LLMs such as GPT-3 (0.16) and ChatGPT (0.12).
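The snippet below is a minimal sketch of what a URL-style NCBI API call looks like. It builds two NCBI E-utilities requests using only Python's standard library. The base URL, the `esearch` and `esummary` tools, and the `db`, `term`, `id`, `retmax`, and `retmode` parameters belong to NCBI's public E-utilities. The helper name `eutils_url`, the query `BRCA1`, and the Gene ID are illustrative assumptions and are not taken from GeneGPT's actual demonstrations.

```python
from urllib.parse import urlencode

# Public base URL of NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool: str, **params: object) -> str:
    """Build an E-utilities request URL (hypothetical helper), e.g. esearch.fcgi?db=gene&..."""
    return f"{EUTILS_BASE}{tool}.fcgi?{urlencode(params)}"


# Step 1: search the Gene database for a gene symbol; the response lists matching Gene IDs.
search = eutils_url("esearch", db="gene", term="BRCA1", retmax=5, retmode="json")

# Step 2: request summaries for a Gene ID assumed to come from step 1.
summary = eutils_url("esummary", db="gene", id="672", retmode="json")

print(search)
print(summary)
```

The snippet only builds the URL strings, so it runs offline. Fetching either URL returns the raw JSON that, in a setup like GeneGPT's, would be appended to the generated text.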

Submitted to arXiv on 19 Apr. 2023


Results of the summarizing process for the arXiv paper: 2304.09667v1

Comprehensive Summary

In their paper "GeneGPT: Teaching Large Language Models to Use NCBI Web APIs," Qiao Jin, Yifan Yang, Qingyu Chen, and Zhiyong Lu introduce GeneGPT, a method for teaching large language models (LLMs) to use the Web Application Programming Interfaces (APIs) of the National Center for Biotechnology Information (NCBI) to answer genomics questions. The approach prompts Codex (code-davinci-002) with few-shot demonstrations of NCBI API calls, written as URL requests, for in-context learning. During inference, decoding stops as soon as a call request is detected, and the API call is made with the generated URL. The raw results returned by the NCBI APIs are appended to the generated text, and generation continues until an answer is found or another API call is detected.

Preliminary results show that GeneGPT achieves state-of-the-art results on seven of the nine GeneTuring tasks evaluated: three of four one-shot tasks and four of five zero-shot tasks. Its macro-average score of 0.76 is much higher than the scores of retrieval-augmented LLMs such as New Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), and general-purpose LLMs such as GPT-3 (0.16) and ChatGPT (0.12). The study concludes that external tools may be more helpful than relevant web pages for improving LLMs on genomics question answering. Future directions include fine-tuning LLMs on NCBI API calls instead of relying on in-context learning. The authors also propose exploring multi-hop genomics question answering with chain-of-thought prompting, to better address real-world genomics information needs.
Overall, this work showcases the effectiveness of GeneGPT in leveraging NCBI Web APIs for genomic inquiries and sets a new benchmark in performance compared to existing large language models like New Bing across various genomics-related tasks.

Layman's Summary

GeneGPT is a new way to help large language models answer questions about genes. It lets them use the online tools of NCBI, a U.S. government center for biology data. Codex, the underlying model, learns from a few worked examples how to ask NCBI's tools for information when it needs it. GeneGPT does better than other models on most of the gene-related tasks tested. It scored 0.76 overall, well ahead of models such as BioMedLM and GPT-3. Using outside tools like these may make language models better at answering gene questions.

Definitions
- Large Language Models (LLMs): Programs that understand and generate human language.
- Genomics: The study of genes and their functions.
- National Center for Biotechnology Information (NCBI): A U.S. government center that runs public databases and web services about genes and biology.
- Inference: Using a trained model to produce answers; here, the stage where GeneGPT answers questions.
- Macro-average score: The simple average of a model's scores across tasks, so that every task counts equally.

Introduction

In recent years, large language models (LLMs) have made significant advances in natural language processing tasks such as text generation and question answering. However, these models often lack domain-specific knowledge and cannot access external resources to retrieve information. This limits their performance on specialized tasks such as answering genomics questions. To address this, Qiao Jin, Yifan Yang, Qingyu Chen, and Zhiyong Lu developed GeneGPT, a method for teaching LLMs to use NCBI Web APIs to answer genomics questions.

The Problem

The authors highlight the limitations of existing LLMs in addressing genomics-related questions due to their lack of domain-specific knowledge and inability to access external resources. They also note that while biomedical LLMs have been trained on large-scale biomedical data, they still struggle with answering complex genomics questions accurately.

Introducing GeneGPT

GeneGPT is an approach that involves prompting Codex (code-davinci-002) with few-shot URL requests of NCBI API calls as demonstrations for in-context learning. During inference, decoding is halted upon detecting a call request, followed by making the API call with the generated URL. The raw execution results from NCBI APIs are then appended to the generated texts, allowing for continued generation until an answer is found or another API call is identified.
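The inference loop described above can be sketched as follows. The sketch assumes a hypothetical `generate` function that returns the model's next chunk of text and stops right after a call marker, and a hypothetical `call_api` function that performs the HTTP request. The `[URL]->[result]` marker convention is an assumption for illustration, and GeneGPT's exact prompt format may differ. Toy stand-ins replace Codex and NCBI so that the example runs offline.

```python
import re
from typing import Callable

# Assumed convention: the model writes "[URL]->" to request a call, and the raw
# API response is appended as "[result]" before decoding resumes.
CALL_AT_END = re.compile(r"\[(https?://[^\]]+)\]->$")


def answer_with_tools(
    prompt: str,
    generate: Callable[[str], str],  # returns new text, stopping right after "->"
    call_api: Callable[[str], str],  # fetches a URL and returns the raw response body
    max_calls: int = 10,
) -> str:
    """Alternate between decoding and API calls until no call is pending."""
    text = prompt
    for _ in range(max_calls):
        text += generate(text)
        match = CALL_AT_END.search(text)
        if match is None:
            return text  # the model finished without requesting another call
        raw_result = call_api(match.group(1))
        text += f"[{raw_result}]"  # append the raw result, then keep generating
    return text  # call budget exhausted


# Toy stand-ins for Codex and the NCBI API (hypothetical, for an offline demo).
def fake_generate(text: str) -> str:
    if "->[" not in text:
        return ("Search: [https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
                "esearch.fcgi?db=gene&term=BRCA1&retmode=json]->")
    return " Answer: 672"


def fake_call_api(url: str) -> str:
    return '{"esearchresult": {"idlist": ["672"]}}'


transcript = answer_with_tools("Q: What is the NCBI Gene ID of BRCA1?\n", fake_generate, fake_call_api)
print(transcript)
```

With real services, `call_api` could be as simple as `lambda url: urllib.request.urlopen(url).read().decode()`. The `max_calls` cap prevents a model that keeps issuing calls from looping forever. NCBI also asks clients to limit their E-utilities request rate, so a production loop would need throttling.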

Evaluation Metrics

To evaluate GeneGPT, the researchers used nine genomics tasks from the GeneTuring dataset, covering areas such as gene nomenclature, genomic location, functional analysis, and sequence alignment. They compared GeneGPT with other LLMs, including New Bing, BioMedLM, BioGPT, GPT-3, and ChatGPT, using the macro-average of the per-task scores.
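For readers unfamiliar with macro-averaging, here is a small sketch that assumes each task already has a score between 0 and 1. The function name `macro_average` and the task scores are hypothetical and do not reproduce the paper's per-task results.

```python
from statistics import mean


def macro_average(task_scores: dict[str, float]) -> float:
    """Unweighted mean of per-task scores, so every task counts equally."""
    if not task_scores:
        raise ValueError("at least one task score is required")
    return mean(task_scores.values())


# Hypothetical scores for three tasks (not results from the paper).
scores = {"task_a": 1.00, "task_b": 0.50, "task_c": 0.75}
print(macro_average(scores))
```

Because every task is weighted equally, a model cannot raise its macro-average just by doing well on the tasks that contain the most questions.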

Results

The results show that GeneGPT achieves state-of-the-art results on seven of the nine GeneTuring tasks. Its macro-average score of 0.76 is much higher than the scores of retrieval-augmented LLMs such as New Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), and general-purpose LLMs such as GPT-3 (0.16) and ChatGPT (0.12).

Comparison with Existing Models

Broken down by evaluation setting, GeneGPT achieves state-of-the-art results on three of the four one-shot tasks and four of the five zero-shot tasks. This places it ahead of strong baselines such as the retrieval-augmented New Bing.

Conclusion

The authors conclude that external tools such as NCBI Web APIs may be more helpful than relevant web pages for improving LLMs on genomics question answering. They suggest several future research directions. One is fine-tuning LLMs on NCBI API calls instead of relying on in-context learning. Another is exploring multi-hop genomics question answering with chain-of-thought prompting. In summary, GeneGPT is a new approach for teaching large language models to use NCBI Web APIs for genomics questions, and it sets a new state of the art on most of the GeneTuring tasks evaluated.

Implications

The development of GeneGPT has significant implications for improving the accuracy and efficiency of large language models on domain-specific knowledge-based tasks such as genomics question-answering. This can have practical applications in fields such as healthcare, where quick access to accurate information is crucial for decision-making processes. Furthermore, this work highlights the potential benefits of incorporating external resources into machine learning models to enhance their capabilities beyond what they can learn from training data alone. This approach could be applied to other domains and tasks, opening up new avenues for research in natural language processing.

Conclusion

In conclusion, the paper "GeneGPT: Teaching Large Language Models to Use NCBI Web APIs" introduces a method for teaching LLMs to use NCBI Web APIs to answer genomics questions. GeneGPT outperforms existing models on most of the genomics tasks evaluated, setting a new benchmark. The work points to ways of improving the accuracy of large language models on domain-specific, knowledge-intensive tasks and highlights the benefits of connecting models to external resources.

Created on 12 Sep. 2024
