In their paper titled "GeneGPT: Teaching Large Language Models to Use NCBI Web APIs," authors Qiao Jin, Yifan Yang, Qingyu Chen, and Zhiyong Lu introduce GeneGPT, a novel method for teaching large language models (LLMs) to use the Web Application Programming Interfaces (APIs) of the National Center for Biotechnology Information (NCBI) to answer genomics-related questions. The approach prompts Codex (code-davinci-002) with a few example URL requests to NCBI APIs, which serve as demonstrations for in-context learning. During inference, decoding is halted whenever a call request is detected, and the API call is made with the generated URL. The raw results returned by the NCBI API are then appended to the generated text, and generation continues until an answer is found or another API call is detected.
Preliminary findings show that GeneGPT surpasses the previous state of the art on seven of the nine tasks in the GeneTuring dataset: it outperforms other LLMs such as New Bing on three of the four one-shot tasks and four of the five zero-shot tasks. GeneGPT's macro-average score is 0.76, well above retrieval-augmented LLMs like New Bing (0.44), biomedical LLMs including BioMedLM (0.08) and BioGPT (0.04), and general-purpose LLMs like GPT-3 (0.16) and ChatGPT (0.12). The study concludes that external tools may support LLMs better than relevant web pages do when solving genomics questions. Future research directions include fine-tuning LLMs on NCBI API calls instead of relying on in-context learning, and exploring multi-hop genomics question answering with chain-of-thought prompting to better address real-world genomics information needs.
Overall, this work showcases the effectiveness of GeneGPT in leveraging NCBI Web APIs for genomic inquiries and sets a new benchmark in performance compared to existing large language models like New Bing across various genomics-related tasks.
- GeneGPT is a novel method for teaching large language models (LLMs) to use the Web APIs of the National Center for Biotechnology Information (NCBI) to answer genomics-related questions.
- Codex is prompted with few-shot URL requests of NCBI API calls for in-context learning. During inference, decoding is halted when a call request is detected, and the API call is made with the generated URL.
- GeneGPT surpasses the previous state of the art on seven of the nine tasks in the GeneTuring dataset, outperforming other LLMs like New Bing on one-shot and zero-shot tasks.
- GeneGPT's macro-average score is 0.76, well above other LLMs such as New Bing, BioMedLM, BioGPT, GPT-3, and ChatGPT.
- External tools may support LLMs better than relevant web pages do when solving genomics questions.
- Future research directions include fine-tuning LLMs on NCBI API calls instead of relying on in-context learning, and exploring multi-hop genomics question answering with chain-of-thought prompting.
Summary
GeneGPT is a new way to teach big language models about genes using a special website called NCBI. Codex learns from NCBI's website by looking at examples and asking for help when needed. GeneGPT does better than other models on most gene-related tasks. It got a high score of 0.76, beating other models like BioMedLM and GPT-3. Using extra tools can make these models even better at solving gene questions.
Definitions
- Large Language Models (LLMs): Programs that understand and generate human language.
- Genomics: The study of genes and their functions.
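- National Center for Biotechnology Information (NCBI): A center within the U.S. National Library of Medicine that hosts databases and tools with information about genes and biology.
- Inference: The stage at which a model produces output for a new input, here generating an answer to a question.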
- Macro-average score: An overall performance measure calculated across different tasks or categories.
Introduction
In recent years, large language models (LLMs) have made significant advances in natural language processing tasks such as text generation and question answering. However, these models often struggle with domain-specific knowledge and cannot access external resources to retrieve information. This limitation hinders their performance on tasks that require specialized knowledge, such as genomics-related questions. In response to this challenge, a team of researchers from Tsinghua University and the National Center for Biotechnology Information (NCBI) has developed GeneGPT, a novel method for teaching LLMs to use NCBI Web APIs for genomic inquiries.
The Problem
The authors highlight the limitations of existing LLMs in addressing genomics-related questions due to their lack of domain-specific knowledge and inability to access external resources. They also note that while biomedical LLMs have been trained on large-scale biomedical data, they still struggle with answering complex genomics questions accurately.
Introducing GeneGPT
GeneGPT is an approach that involves prompting Codex (code-davinci-002) with few-shot URL requests of NCBI API calls as demonstrations for in-context learning. During inference, decoding is halted upon detecting a call request, followed by making the API call with the generated URL. The raw execution results from NCBI APIs are then appended to the generated texts, allowing for continued generation until an answer is found or another API call is identified.
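The generate, call, append, and continue loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `[URL]->` call marker, the function name `answer_with_api_calls`, the `max_calls` limit, and the injected `generate` and `fetch` callables are all hypothetical. `generate` stands in for an LLM that stops decoding right after emitting a call request, and `fetch` stands in for an HTTP GET against the generated URL.

```python
import re
from typing import Callable

# Assumed call syntax: the model writes "[<url>]->" and stops decoding there.
CALL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]->\s*$")


def answer_with_api_calls(
    prompt: str,
    generate: Callable[[str], str],
    fetch: Callable[[str], str],
    max_calls: int = 5,
) -> str:
    """Alternate between text generation and API calls until no call is pending."""
    text = prompt
    for _ in range(max_calls):
        # Continue the text; decoding halts at a call request or at the answer.
        text += generate(text)
        match = CALL_PATTERN.search(text)
        if match is None:
            # The text does not end with a call request, so treat it as final.
            return text
        # Execute the call with the generated URL and append the raw result.
        result = fetch(match.group(1))
        text += f"[{result}]\n"
    return text


if __name__ == "__main__":
    # Scripted stand-ins for the model and the API (placeholder values only).
    scripted = iter([
        " [https://example.org/api?term=ALIAS1]->",
        " Answer: GENE_X",
    ])
    output = answer_with_api_calls(
        "Question: What is the official symbol of ALIAS1?",
        generate=lambda text: next(scripted),
        fetch=lambda url: "GENE_X",
    )
    print(output)
```

Passing `generate` and `fetch` in as arguments keeps the loop testable without network access or model credentials; a real `fetch` could wrap `urllib.request.urlopen`, and `max_calls` is a guard against a model that never stops requesting calls.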
Evaluation Metrics
To evaluate GeneGPT's performance, the researchers used the GeneTuring dataset, which contains nine genomics-related tasks, including gene annotation and variant interpretation. They compared GeneGPT against other state-of-the-art LLMs (New Bing, BioMedLM, BioGPT, GPT-3, and ChatGPT) using macro-average scores.
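As a brief illustration of the metric, a macro-average is the unweighted mean of the per-task scores, so every task counts equally regardless of how many questions it contains. The sketch below assumes each task has already been reduced to a single score between 0 and 1; the function name and the task names and scores in the usage example are hypothetical placeholders, not results from the paper.

```python
def macro_average(task_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-task scores."""
    if not task_scores:
        raise ValueError("at least one task score is required")
    return sum(task_scores.values()) / len(task_scores)


if __name__ == "__main__":
    # Hypothetical per-task scores, for illustration only.
    example_scores = {"task_a": 1.0, "task_b": 0.5, "task_c": 0.0}
    print(macro_average(example_scores))
```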
Results
The results of the study demonstrate that GeneGPT outperforms other LLMs in seven out of nine tasks within the GeneTuring dataset. It achieves a macro-average score of 0.76, significantly higher than retrieval-augmented LLMs like New Bing (0.44), biomedical LLMs including BioMedLM (0.08) and BioGPT (0.04), as well as general-purpose LLMs like GPT-3 (0.16) and ChatGPT (0.12).
Comparison with Existing Models
GeneGPT's performance is particularly impressive in one-shot and zero-shot tasks, where it outperforms New Bing in three out of four one-shot tasks and four out of five zero-shot tasks.
Conclusion
The authors conclude that external tools such as NCBI Web APIs offer superior support compared to relevant web pages when enhancing LLM capabilities for genomics question-solving tasks. They also suggest future research directions, including fine-tuning LLMs using NCBI API calls instead of in-context learning and exploring multi-hop genomics question answering along with chain-of-thought prompting techniques.
In summary, GeneGPT presents a novel approach for teaching large language models to use NCBI Web APIs for genomic inquiries, setting a new performance benchmark relative to existing models across various genomics-related tasks.
Implications
The development of GeneGPT has significant implications for improving the accuracy and efficiency of large language models on domain-specific knowledge-based tasks such as genomics question-answering. This can have practical applications in fields such as healthcare, where quick access to accurate information is crucial for decision-making processes.
Furthermore, this work highlights the potential benefits of incorporating external resources into machine learning models to enhance their capabilities beyond what they can learn from training data alone. This approach could be applied to other domains and tasks, opening up new avenues for research in natural language processing.
Conclusion
In conclusion, the paper "GeneGPT: Teaching Large Language Models to Use NCBI Web APIs" introduces a novel method for teaching LLMs to use NCBI Web APIs for genomic inquiries. The results show that it outperforms existing models on most of the genomics-related tasks evaluated, setting a new performance benchmark. This work has significant implications for improving the accuracy and efficiency of large language models on domain-specific knowledge-based tasks and highlights the potential benefits of incorporating external resources into machine learning models.