Comparison of biomedical relationship extraction methods and models for knowledge graph creation
AI-generated Key Points
- Biomedical research is expanding rapidly
- Knowledge graphs offer a framework for organizing and validating biomedical knowledge from literature
- Rule-based and machine learning-based methods are compared for relationship extraction from biomedical literature
- Transformer-based models perform well on small and unbalanced datasets
- The PubMedBERT-based model achieves the highest F1-score (0.92), followed closely by the DistilBERT-based model (0.89)
- BERT-based models outperform T5-based generative models in this context
- Researchers struggle to cope with the volume of biomedical literature and need tools to find relevant articles and validate claims
- Information retrieval approaches like PubMed and Quertle provide a list of relevant articles but do not validate hypotheses or claims
- Extracting named relationships from biomedical literature can contribute to building a large knowledge graph connecting entities through various relationships
- Future work involves creating a comprehensive biomedical knowledge graph for target identification, indication expansion, and drug discovery
Authors: Nikola Milosevic, Wolfgang Thielemann
Abstract: Biomedical research is growing at such an exponential pace that scientists, researchers, and practitioners are no longer able to cope with the amount of published literature in the domain. The knowledge presented in the literature needs to be systematized in such a way that claims and hypotheses can be easily found, accessed, and validated. Knowledge graphs can provide such a framework for semantic knowledge representation from literature. However, in order to build a knowledge graph, it is necessary to extract knowledge as relationships between biomedical entities and to normalize both the entities and the relationship types. In this paper, we present and compare several rule-based and machine learning-based methods for scalable relationship extraction from biomedical literature and for integration into knowledge graphs. Naive Bayes and Random Forests serve as examples of traditional machine learning methods, and DistilBERT, PubMedBERT, T5, and SciFive-based models serve as examples of modern deep learning transformers. We examine how resilient these methods are to unbalanced and fairly small datasets. Our experiments show that transformer-based models handle both small datasets (due to pre-training on a large dataset) and unbalanced datasets well. The best-performing model was the PubMedBERT-based model fine-tuned on balanced data, with a reported F1-score of 0.92. The DistilBERT-based model followed with an F1-score of 0.89, while running faster and requiring fewer resources. BERT-based models performed better than T5-based generative models.
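The abstract describes extracting relationships between entities and normalizing both entities and relationship types before adding them to a knowledge graph. The following is a minimal sketch of that idea, not the authors' method: the single regular-expression rule, the synonym tables, and the function names `normalize_entity`, `extract_triples`, and `build_graph` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lookup tables (illustrative only): surface forms are mapped
# to one canonical name so that synonyms collapse into a single graph node.
ENTITY_SYNONYMS = {
    "asa": "aspirin",
    "acetylsalicylic acid": "aspirin",
    "cox-1": "PTGS1",
}
RELATION_SYNONYMS = {
    "inhibits": "INHIBITS",
    "blocks": "INHIBITS",
    "activates": "ACTIVATES",
}

# One toy rule of the form "<entity> <trigger verb> <entity>".
# A real rule-based system would need many more patterns than this.
PATTERN = re.compile(
    r"(?P<subj>[\w\- ]+?)\s+(?P<rel>inhibits|blocks|activates)\s+(?P<obj>[\w\-]+)",
    re.IGNORECASE,
)


def normalize_entity(text):
    """Lower-case the mention and map it to a canonical name if one is known."""
    key = text.strip().lower()
    return ENTITY_SYNONYMS.get(key, key)


def extract_triples(sentence):
    """Return a set of normalized (subject, relation, object) triples."""
    triples = set()
    for match in PATTERN.finditer(sentence):
        relation = RELATION_SYNONYMS[match.group("rel").lower()]
        triples.add(
            (
                normalize_entity(match.group("subj")),
                relation,
                normalize_entity(match.group("obj")),
            )
        )
    return triples


def build_graph(triples):
    """Store triples as an adjacency mapping: subject -> [(relation, object)]."""
    graph = {}
    for subj, rel, obj in sorted(triples):
        graph.setdefault(subj, []).append((rel, obj))
    return graph


if __name__ == "__main__":
    sentences = [
        "Acetylsalicylic acid blocks COX-1.",
        "ASA inhibits COX-1 in platelets.",
    ]
    all_triples = set()
    for sentence in sentences:
        all_triples |= extract_triples(sentence)
    print(build_graph(all_triples))
```

Because both example sentences normalize to the same triple, the graph ends up with a single edge. This is the practical reason normalization matters: without it, each synonym would create a separate, disconnected node.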