Comparison of biomedical relationship extraction methods and models for knowledge graph creation

AI-generated keywords: Biomedical Research, Knowledge Graphs, Machine Learning, Transformers, Drug Discovery

AI-generated Key Points

- Biomedical research is expanding rapidly
- Knowledge graphs offer a framework for organizing and validating biomedical knowledge from literature
- Rule-based and machine learning-based methods are compared for relationship extraction from biomedical literature
- Transformer-based models perform well on small and unbalanced datasets
- PubMedBERT-based model achieves the highest F1-score of 0.92, followed closely by DistilBERT with an F1-score of 0.89
- BERT-based models outperform T5-based generative models in this context
- Researchers struggle to cope with the volume of biomedical literature and need tools to find relevant articles and validate claims
- Information retrieval approaches like PubMed and Quertle provide a list of relevant articles but do not validate hypotheses or claims
- Extracting named relationships from biomedical literature can contribute to building a large knowledge graph connecting entities through various relationships
- Future work involves creating a comprehensive biomedical knowledge graph for target identification, indication expansion, and drug discovery.

Authors: Nikola Milosevic, Wolfgang Thielemann

Nikola Milosevic, Wolfgang Thielemann, Comparison of biomedical relationship extraction methods and models for knowledge graph creation, Journal of Web Semantics, 2022, 100756, ISSN 1570-8268.

arXiv: 2201.01647v4 (cs.AI)

Paper submitted to the Journal of Web Semantics

License: CC BY-SA 4.0

Abstract: Biomedical research is growing at such an exponential pace that scientists, researchers, and practitioners are no longer able to cope with the amount of published literature in the domain. The knowledge presented in the literature needs to be systematized in such a way that claims and hypotheses can be easily found, accessed, and validated. Knowledge graphs can provide such a framework for semantic knowledge representation from literature. However, in order to build a knowledge graph, it is necessary to extract knowledge as relationships between biomedical entities and normalize both entities and relationship types. In this paper, we present and compare a few rule-based and machine learning-based (Naive Bayes, Random Forests as examples of traditional machine learning methods and DistilBERT, PubMedBERT, T5 and SciFive-based models as examples of modern deep learning transformers) methods for scalable relationship extraction from biomedical literature, and for integration into knowledge graphs. We examine how resilient these various methods are to unbalanced and fairly small datasets. Our experiments show that transformer-based models handle both small (due to pre-training on a large dataset) and unbalanced datasets well. The best-performing model was the PubMedBERT-based model fine-tuned on balanced data, with a reported F1-score of 0.92. The DistilBERT-based model followed with an F1-score of 0.89, performing faster and with lower resource requirements. BERT-based models performed better than T5-based generative models.
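Normalizing entities and relationship types, as the abstract requires, means mapping different surface forms of the same thing to one canonical name before a fact enters the graph. The minimal sketch below assumes tiny hand-written synonym tables; `ENTITY_SYNONYMS`, `RELATION_SYNONYMS`, and `normalize_triple` are hypothetical names for illustration, not the vocabularies or code used in the paper.

```python
from typing import Optional, Tuple

# Hypothetical synonym table: lowercased surface form -> canonical entity name.
ENTITY_SYNONYMS = {
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
    "asa": "Aspirin",
    "pain": "Pain",
    "ache": "Pain",
}

# Hypothetical table: lowercased relation phrase -> canonical relation type.
RELATION_SYNONYMS = {
    "treats": "TREATS",
    "relieves": "TREATS",
    "is used to treat": "TREATS",
    "causes": "CAUSES",
    "induces": "CAUSES",
}

Triple = Tuple[str, str, str]


def normalize_triple(subject: str, relation: str, obj: str) -> Optional[Triple]:
    """Map a raw (subject, relation, object) triple to canonical names.

    Returns None if any element is missing from the tables, so that
    un-normalized facts are kept out of the knowledge graph.
    """
    s = ENTITY_SYNONYMS.get(subject.strip().lower())
    r = RELATION_SYNONYMS.get(relation.strip().lower())
    o = ENTITY_SYNONYMS.get(obj.strip().lower())
    if s is None or r is None or o is None:
        return None
    return (s, r, o)


if __name__ == "__main__":
    # Two differently worded facts collapse to the same normalized triple.
    print(normalize_triple("Acetylsalicylic acid", "relieves", "ache"))  # ('Aspirin', 'TREATS', 'Pain')
    print(normalize_triple("Aspirin", "treats", "pain"))  # ('Aspirin', 'TREATS', 'Pain')
```

Because both sentences end up as the same triple, facts extracted from separate articles can reinforce a single edge instead of creating near-duplicate nodes.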

Submitted to arXiv on 05 Jan. 2022

Results of the summarizing process for the arXiv paper: 2201.01647v4

Comprehensive Summary

Biomedical research is expanding rapidly, producing an overwhelming amount of published literature. To make effective use of this knowledge, it needs to be organized and validated. Knowledge graphs offer a framework for representing semantic knowledge from literature; building one requires extracting relationships between biomedical entities and normalizing both the entities and the relationship types. This paper compares rule-based and machine learning-based methods for scalable relationship extraction from biomedical literature, including traditional machine learning methods (Naive Bayes, Random Forests) and modern deep learning transformers (DistilBERT, PubMedBERT, T5, SciFive). The study examines how resilient these methods are to unbalanced and small datasets. Experimental results show that transformer-based models perform well on both small and unbalanced datasets. The PubMedBERT-based model fine-tuned on balanced data achieves the highest F1-score, 0.92, while the DistilBERT-based model follows closely with an F1-score of 0.89. BERT-based models outperform T5-based generative models in this context.

The introduction highlights the exponential growth of biomedical literature, with over 950,000 articles added to Medline in 2020 alone. Researchers struggle to cope with this volume of information and require tools to find relevant articles and to validate claims and hypotheses. Information retrieval approaches such as PubMed and Quertle have been developed for biomedicine, but they only provide a list of relevant articles without validating hypotheses or claims. To validate a hypothesis or claim, researchers need to read through significant amounts of literature manually. However, such hypotheses can often be summarized as relationships between concepts in simple sentences (e.g., "Aspirin treats pain"). Extracting these named relationships from biomedical literature can contribute to building a large knowledge graph in which entities are connected through various relationships.

The future work section discusses the long-term task of creating a comprehensive biomedical knowledge graph for target identification, indication expansion, and drug discovery.
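The best model was fine-tuned on balanced data, but the exact balancing procedure is not described in this summary. The sketch below assumes simple random oversampling of the smaller classes, one common technique, applied to made-up (sentence, label) pairs; `oversample` is a hypothetical helper, not the paper's code.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (sentence, relation label)


def oversample(examples: Sequence[Example], seed: int = 0) -> List[Example]:
    """Random oversampling: duplicate examples of smaller classes (drawn with
    replacement) until every class is as large as the largest one."""
    if not examples:
        return []
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    target = max(len(group) for group in by_label.values())
    balanced: List[Example] = []
    for group in by_label.values():
        balanced.extend(group)
        # Extra copies needed to bring this class up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    train = [
        ("Aspirin treats pain.", "TREATS"),
        ("Metformin treats diabetes.", "TREATS"),
        ("Ibuprofen relieves headache.", "TREATS"),
        ("Smoking causes lung cancer.", "CAUSES"),
    ]
    balanced = oversample(train)
    print(Counter(label for _, label in balanced))  # each label now appears 3 times
```

Oversampling should be applied to the training split only: duplicating examples before splitting would leak copies of training sentences into the evaluation set and inflate the reported scores.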

Layman's Summary

Biomedical research is growing quickly. This means that scientists are learning more about how our bodies work and finding new ways to help people stay healthy. Knowledge graphs are like big maps that organize and check what scientists have learned from books and articles. Scientists use different methods to find important information in these books and articles: some follow hand-written rules, and others let computers learn from examples. Some computer programs, called Transformer-based models, are really good at finding information even when they only have a small or unbalanced set of examples to learn from. One program, based on PubMedBERT, is the best at finding important information, followed closely by another program based on DistilBERT. These programs do better than programs based on T5, which work by writing out their answers as new text. Scientists need tools that help them find the right books and articles for their research and also check whether the things they read are true. One way to do this is to make a big map of everything they learn, connecting the pieces together like a puzzle. In the future, scientists want to make an even bigger map of everything they know about medicine so they can find new treatments for diseases.

Definitions
- Biomedical research: The study of how our bodies work and finding new ways to help people stay healthy.
- Knowledge graphs: Big maps that organize and check what scientists have learned from books and articles.
- Rule-based methods: Using specific rules or instructions to find important information in books and articles.
- Machine learning-based methods: Letting computers learn from examples how to find important information in books and articles.
- Transformer-based models: Computer programs that are really good at finding important information, even when they only have a small or unbalanced set of examples to learn from.

Understanding the Growing Need for Knowledge Graphs in Biomedical Research

Biomedical research is growing at an exponential rate, with over 950,000 articles added to Medline in 2020 alone. This overwhelming amount of published literature makes it difficult for researchers to use this knowledge effectively. Addressing this challenge requires a framework to organize and validate the information in biomedical literature. Knowledge graphs offer a promising solution: relationships between biomedical entities are extracted from the literature, normalized, and stored in a structured format.
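The structured format behind such a graph can be pictured as a set of (subject, relation, object) triples indexed by subject. The sketch below is a minimal in-memory illustration under that assumption; the `KnowledgeGraph` class, the relation names, and the toy facts are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Minimal in-memory knowledge graph of (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        # Outgoing edges indexed by subject for fast neighbour lookups.
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        triple = (subject, relation, obj)
        if triple in self.triples:
            return  # the same fact extracted twice is stored only once
        self.triples.add(triple)
        self.out_edges[subject].append((relation, obj))

    def neighbours(self, subject: str, relation: Optional[str] = None) -> List[str]:
        """Objects linked from `subject`, optionally filtered by relation type."""
        return [
            obj
            for rel, obj in self.out_edges.get(subject, [])
            if relation is None or rel == relation
        ]


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add("Aspirin", "TREATS", "Pain")
    kg.add("Aspirin", "TREATS", "Pain")  # duplicate, ignored
    kg.add("Aspirin", "TREATS", "Fever")
    kg.add("Aspirin", "CAUSES", "Bleeding")
    print(kg.neighbours("Aspirin", "TREATS"))  # ['Pain', 'Fever']
    print(len(kg.triples))  # 3
```

Storing triples in a set drops facts that are extracted more than once, while the subject index keeps each neighbour lookup proportional to a node's number of outgoing edges rather than to the size of the whole graph.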

Comparing Rule-Based and Machine Learning-Based Methods for Relationship Extraction

This paper compares rule-based and machine learning-based methods for scalable relationship extraction from biomedical literature. Traditional machine learning methods such as Naive Bayes and Random Forests are compared against modern deep learning transformers such as DistilBERT, PubMedBERT, T5, and SciFive. The study examines how resilient these methods are to unbalanced and small datasets.
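Naive Bayes is one of the traditional baselines in this comparison. As a minimal sketch of how such a classifier can label the relation expressed in a sentence, here is a multinomial Naive Bayes over bag-of-words features with add-one (Laplace) smoothing in plain Python. The tokenizer, the relation labels (including a `NONE` class), and the toy training sentences are assumptions for illustration, not the paper's features or data.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase bag-of-words tokens (letters, digits, internal hyphens)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class NaiveBayesRelationClassifier:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, sentences: Sequence[str], labels: Sequence[str]) -> "NaiveBayesRelationClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {word for counts in self.word_counts.values() for word in counts}
        self.total_words = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, sentence: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over known words of log P(word | label).
            score = math.log(count / n_docs)
            for word in tokenize(sentence):
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                numerator = self.word_counts[label][word] + 1
                score += math.log(numerator / (self.total_words.get(label, 0) + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    sentences = [
        "Aspirin treats pain",
        "Metformin treats diabetes",
        "Ibuprofen relieves headache",
        "Smoking causes lung cancer",
        "Alcohol causes liver damage",
        "Aspirin and warfarin were both given",
    ]
    labels = ["TREATS", "TREATS", "TREATS", "CAUSES", "CAUSES", "NONE"]
    clf = NaiveBayesRelationClassifier().fit(sentences, labels)
    print(clf.predict("Paracetamol treats fever"))  # TREATS
    print(clf.predict("Tobacco causes cancer"))  # CAUSES
```

Because each word contributes independently, this model ignores word order and cannot tell which entity is the subject; that kind of missing context is one of the limitations pre-trained transformer models are designed to address.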

Experimental Results

The experimental results show that transformer-based models perform well on both small and unbalanced datasets. The PubMedBERT-based model fine-tuned on balanced data achieves the highest F1-score, 0.92, while the DistilBERT-based model follows closely with an F1-score of 0.89. BERT-based models also outperform the T5-based generative models. Taken together, these results suggest that transformer-based approaches are better suited than traditional ML approaches when dealing with unbalanced or small datasets in biomedical research applications.
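The results are reported as F1-scores, but this summary does not state how per-class scores were averaged. The sketch below, a hypothetical helper named `f1_scores`, computes per-class F1 together with macro and support-weighted averages; on unbalanced relation datasets the two averages can differ, which is why the averaging scheme matters when comparing models.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def f1_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[Dict[str, float], float, float]:
    """Return (per-class F1, macro-averaged F1, support-weighted F1).

    Per class: precision = TP / predicted, recall = TP / actual,
    F1 = 2 * precision * recall / (precision + recall); any zero
    denominator yields 0.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    per_class: Dict[str, float] = {}
    for label in labels:
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        total = precision + recall
        per_class[label] = 2 * precision * recall / total if total else 0.0
    macro = sum(per_class.values()) / len(labels)
    # Weighting by the number of true instances favours the majority classes.
    weighted = sum(per_class[label] * actual[label] for label in labels) / len(y_true)
    return per_class, macro, weighted


if __name__ == "__main__":
    gold = ["TREATS", "TREATS", "TREATS", "CAUSES"]
    pred = ["TREATS", "TREATS", "CAUSES", "CAUSES"]
    per_class, macro, weighted = f1_scores(gold, pred)
    print({label: round(score, 3) for label, score in per_class.items()})  # {'CAUSES': 0.667, 'TREATS': 0.8}
    print(round(macro, 3), round(weighted, 3))  # 0.733 0.767
```

In this toy example the majority class TREATS scores higher than the minority class CAUSES, so the support-weighted average (0.767) comes out above the macro average (0.733).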

Applications of Knowledge Graphs in Biomedical Research

The introduction highlights the need for tools that do more than list relevant articles: today, validating a hypothesis or claim means manually reading through significant amounts of literature. However, such hypotheses can often be summarized as relationships between concepts in simple sentences (e.g., "Aspirin treats pain"). Extracting these named relationships from biomedical literature can contribute to building a large knowledge graph in which entities are connected through various relationships, which can then be used for target identification, indication expansion, and drug discovery, among other applications.
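Simple sentences such as "Aspirin treats pain" are where a rule-based extractor is easiest to picture. The minimal sketch below matches a hand-written list of trigger phrases; `RELATION_PATTERNS`, `MAX_ARGUMENT_WORDS`, and `extract_triple` are hypothetical, and the rules are deliberately far simpler than a practical rule-based extractor would need to be.

```python
import re
from typing import Optional, Tuple

# Hypothetical trigger phrases -> relation types, most specific phrases first.
RELATION_PATTERNS = [
    ("is used to treat", "TREATS"),
    ("treats", "TREATS"),
    ("causes", "CAUSES"),
    ("inhibits", "INHIBITS"),
]

MAX_ARGUMENT_WORDS = 3  # assumed limit to keep subject/object noun-phrase sized


def extract_triple(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Rule-based extraction of one (subject, relation, object) triple from a
    simple sentence of the form "<subject> <trigger phrase> <object>"."""
    text = sentence.strip().rstrip(".")
    for phrase, relation in RELATION_PATTERNS:
        # Whitespace on both sides stops "treats" matching inside "retreats".
        pattern = rf"(.+?)\s+{re.escape(phrase)}\s+(.+)"
        match = re.fullmatch(pattern, text, flags=re.IGNORECASE)
        if match is None:
            continue
        subject, obj = match.group(1).strip(), match.group(2).strip()
        if len(subject.split()) > MAX_ARGUMENT_WORDS or len(obj.split()) > MAX_ARGUMENT_WORDS:
            return None  # too complex for this simple rule
        return (subject, relation, obj)
    return None


if __name__ == "__main__":
    for s in [
        "Aspirin treats pain.",
        "Metformin is used to treat type 2 diabetes.",
        "Smoking causes lung cancer in many patients.",
    ]:
        print(s, "->", extract_triple(s))
```

The third sentence is rejected because its object is longer than the rule allows, which illustrates how quickly hand-written rules break down on real biomedical prose.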

Future Work

The future work section discusses the long-term task of creating a comprehensive biomedical knowledge graph, which could enable automated hypothesis validation using natural language processing (NLP) techniques, such as question answering systems or summarization algorithms, applied on top of existing knowledge graphs.

In conclusion, this paper provides valuable insights into how rule-based and machine learning-based methods compare for extracting meaningful relationships from biomedical literature efficiently, and how resilient they are to unbalanced or small datasets. Transformer-based models outperformed the traditional ML approaches, making them better suited to such challenges. Further research should build on these results to create comprehensive knowledge graphs that support automated hypothesis validation.
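Automated hypothesis validation on top of a knowledge graph can be pictured, in its simplest form, by assuming that each extracted triple keeps track of the documents it came from and counting the distinct sources that support a claim. `EvidenceGraph`, the `min_sources` threshold, and the document identifiers are illustrative assumptions, not a method described in the paper.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class EvidenceGraph:
    """Knowledge graph that records which documents assert each triple."""

    def __init__(self) -> None:
        self.evidence: Dict[Triple, Set[str]] = defaultdict(set)

    def add(self, triple: Triple, source_id: str) -> None:
        # A set ensures the same document is counted only once per triple.
        self.evidence[triple].add(source_id)

    def validate(self, claim: Triple, min_sources: int = 2) -> Tuple[bool, int]:
        """Return (supported, number of distinct supporting documents).

        The claim counts as supported when at least `min_sources`
        documents assert it; the default threshold is an arbitrary choice.
        """
        n_sources = len(self.evidence.get(claim, set()))
        return n_sources >= min_sources, n_sources


if __name__ == "__main__":
    kg = EvidenceGraph()
    kg.add(("Aspirin", "TREATS", "Pain"), "doc-1")
    kg.add(("Aspirin", "TREATS", "Pain"), "doc-2")
    kg.add(("Aspirin", "TREATS", "Pain"), "doc-2")  # same document again
    kg.add(("Aspirin", "CAUSES", "Bleeding"), "doc-3")
    print(kg.validate(("Aspirin", "TREATS", "Pain")))  # (True, 2)
    print(kg.validate(("Aspirin", "CAUSES", "Bleeding")))  # (False, 1)
    print(kg.validate(("Aspirin", "TREATS", "Fever")))  # (False, 0)
```

Counting distinct documents rather than raw mentions keeps one heavily repeated article from making a claim look better supported than it is.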

Created on 10 Sep. 2023

Similar papers summarized with our AI tools

- Structured information extraction from complex scientific text with fine-tune… (cs.CL, 57.7%)
- Spark NLP: Natural Language Understanding at Scale (cs.CL, 57.5%)
- Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financia… (cs.CL, 56.7%)
- Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matri… (cs.CL, 56.6%)
- Towards Expert-Level Medical Question Answering with Large Language Models (cs.CL, 56.2%)
- BERT: A Review of Applications in Natural Language Processing and Understandi… (cs.CL, 56.1%)
- Common human diseases prediction using machine learning based on survey data (cs.LG, 55.6%)

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.