An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

AI-generated keywords: Rare diseases

AI-generated Key Points

Rare diseases affect over 300 million individuals globally and are hard to diagnose because of clinical heterogeneity, low individual prevalence, and limited clinician familiarity.
DeepRare is presented as the first LLM-powered agentic system for rare disease diagnosis, built to process heterogeneous clinical inputs.
The system generates ranked diagnostic hypotheses, each backed by a transparent reasoning chain, to support collaboration between healthcare professionals and AI systems.
DeepRare comprises a central host with long-term memory, specialized agent servers integrating over 40 tools, and web-scale medical knowledge sources.
Evaluations on eight datasets covering 2,919 diseases show 100% accuracy on 1,013 diseases and, in HPO-based evaluations, an average Recall@1 of 57.18%, which is 23.79 percentage points above the second-best method.
Manual verification of reasoning chains by clinical experts reaches 95.40% agreement, supporting the system's traceability and its potential as a decision support tool in rare disease diagnostics.
DeepRare is available to healthcare professionals as a web application at http://raredx.cn/doctor.

Authors: Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie

arXiv: 2506.20430v1 (cs.CL)

License: CC BY-NC-SA 4.0

Abstract: Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, integrating over 40 specialized tools; and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, including traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application at http://raredx.cn/doctor.

Submitted to arXiv on 25 Jun. 2025

Results of the summarizing process for the arXiv paper: 2506.20430v1

Comprehensive Summary

Rare diseases collectively affect over 300 million individuals globally, yet timely and accurate diagnosis remains difficult because of their clinical heterogeneity, low individual prevalence, and limited familiarity among clinicians. To address this, the authors introduce DeepRare, which they describe as the first rare disease diagnosis agentic system powered by a large language model (LLM). The system is designed to process diverse clinical inputs, including free-text descriptions, structured Human Phenotype Ontology (HPO) terms, and genetic testing results in variant call format (VCF). DeepRare generates ranked diagnostic hypotheses for rare diseases, each supported by a transparent chain of reasoning that connects intermediate analytic steps to verifiable medical evidence. This emphasis on interpretability matters for collaboration between healthcare professionals and AI systems in diagnostic workflows. The system comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, which integrate over 40 specialized tools; and web-scale medical knowledge sources that provide access to current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. Evaluations on eight datasets from various regions and medical specialties cover 2,919 diseases, with DeepRare achieving 100% accuracy for 1,013 of them. In HPO-based evaluations, DeepRare outperforms 15 other methods, including traditional bioinformatics tools, large language models, and other agentic systems, with an average Recall@1 score of 57.18%, surpassing the second-best method by 23.79 percentage points.
For multi-modal input scenarios, DeepRare achieves a Recall@1 of 70.60% compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts yields 95.40% agreement, supporting the validity and traceability of the system's intermediate steps and its potential as a decision support tool in rare disease diagnostics. DeepRare has also been implemented as a web application (http://raredx.cn/doctor) for healthcare professionals. Overall, the reported results position DeepRare as a notable advance in both the accuracy and the transparency of rare disease diagnosis.

Summary

- Rare diseases are illnesses that each affect only a small number of people around the world and can be difficult to diagnose because they are not common.
- DeepRare is a computer system that helps doctors figure out what rare disease a person might have by using a large language model.
- The system gives doctors a list of possible diseases in order of likelihood, along with explanations for why each disease is suggested.
- DeepRare has a main computer that remembers things for a long time and other smaller computers that use many tools and medical information sources to help make diagnoses.
- Tests have shown that DeepRare is very good at diagnosing diseases, getting all the answers right for over 1,000 diseases and performing better than older tools.

Definitions

- Rare diseases: Illnesses that only affect a small number of people.
- Diagnosis: Figuring out what illness or disease someone has based on their symptoms and test results.
- Large language model (LLM): A type of computer program that understands and processes human language.
- Hypotheses: Ideas or guesses about something based on available information.
- Recall@1 score: A measure of how often the correct answer is given as the first choice in a list of possibilities.

Introduction

Rare diseases, also known as orphan diseases, are a group of disorders that affect a small percentage of the population. Despite their low prevalence rates, rare diseases collectively impact over 300 million individuals globally. These conditions present a significant challenge in terms of timely and accurate diagnosis due to their clinical heterogeneity and limited familiarity among clinicians. However, with the rapid advancement of technology and artificial intelligence (AI), there is hope for improving rare disease diagnostics. In this blog article, we will discuss an innovative system called DeepRare that has been introduced to address the challenges in diagnosing rare diseases. This system utilizes a large language model (LLM) and specialized agent servers to generate ranked diagnostic hypotheses for rare diseases, each supported by transparent reasoning chains. We will explore the components and evaluations of DeepRare and its potential impact on rare disease diagnostics.

The Need for Improved Rare Disease Diagnostics

Diagnosing rare diseases is often a complex process that involves multiple steps such as gathering patient history, performing physical exams, conducting laboratory tests, and consulting with specialists. Due to their low prevalence rates and diverse symptoms, it can take years or even decades for patients to receive an accurate diagnosis. This delay not only causes frustration but also leads to unnecessary treatments or incorrect diagnoses. Moreover, healthcare professionals may not have enough knowledge or experience with these conditions since they are so uncommon. This lack of familiarity can further hinder the diagnostic process and result in misdiagnosis or delayed treatment.

The Introduction of DeepRare

To address these challenges in diagnosing rare diseases, researchers have developed DeepRare, which they describe as the first rare disease diagnosis agentic system powered by a large language model (LLM). The system is designed to process diverse clinical inputs, including free-text descriptions, structured Human Phenotype Ontology (HPO) terms, and genetic testing results in variant call format (VCF). DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, which integrate over 40 specialized tools; and web-scale medical knowledge sources that provide access to current clinical information.
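To make the host-and-agents idea more concrete, here is a minimal sketch in Python. It is not DeepRare's actual code: the names `CentralHost`, `phenotype_agent`, and `genotype_agent` are hypothetical, the "long-term memory" is reduced to a plain list, and the variant string is a made-up placeholder. It only illustrates how a central host might dispatch one case to several specialized agents and remember what they return.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: an agent server takes a case and returns textual findings.
AgentServer = Callable[[dict], List[str]]


@dataclass
class CentralHost:
    """Dispatches a case to registered agents and stores their findings."""
    agents: Dict[str, AgentServer] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)  # stand-in for long-term memory

    def register(self, name: str, agent: AgentServer) -> None:
        self.agents[name] = agent

    def analyze(self, case: dict) -> List[str]:
        findings: List[str] = []
        # Agents run in registration order; each finding is tagged with its source.
        for name, agent in self.agents.items():
            for finding in agent(case):
                findings.append(f"[{name}] {finding}")
        self.memory.extend(findings)
        return findings


def phenotype_agent(case: dict) -> List[str]:
    """Hypothetical agent that reports the HPO terms present in the case."""
    return [f"phenotype term {term}" for term in case.get("hpo_terms", [])]


def genotype_agent(case: dict) -> List[str]:
    """Hypothetical agent that reports the variants present in the case."""
    return [f"variant {variant}" for variant in case.get("variants", [])]


host = CentralHost()
host.register("phenotype", phenotype_agent)
host.register("genotype", genotype_agent)

# Illustrative case: two HPO identifiers and one made-up placeholder variant.
case = {"hpo_terms": ["HP:0001250", "HP:0001263"], "variants": ["chr1:12345 A>G"]}
findings = host.analyze(case)
```

In this sketch, new agents can be added with `register` without touching the host's logic, which is one simple way to read the "modular" claim. How the real system routes work between its agent servers is not described in this summary.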

Emphasis on Interpretability

One of the key features of DeepRare is its emphasis on interpretability. The system provides transparent reasoning chains that connect intermediate analytic steps to verifiable medical evidence. This is crucial for collaboration between healthcare professionals and AI systems in diagnostic workflows, because it lets clinicians see how the system arrived at each diagnostic hypothesis. The modular and scalable design of DeepRare also enables complex diagnostic reasoning while maintaining traceability and adaptability. In principle, this means that new information or tools can be integrated into the system as they become available.
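One way to picture a "ranked hypothesis with a traceable reasoning chain" is as a small data structure. The Python sketch below is an illustration under our own assumptions, not the paper's schema: the class names, the `score` field, and the rule that untraceable hypotheses are dropped are all hypothetical, and the diseases and evidence strings are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReasoningStep:
    claim: str     # an intermediate analytic step
    evidence: str  # pointer to the source that backs the claim


@dataclass
class Hypothesis:
    disease: str
    score: float
    chain: List[ReasoningStep]

    def is_traceable(self) -> bool:
        """True when the chain is non-empty and every step cites evidence."""
        return bool(self.chain) and all(step.evidence.strip() for step in self.chain)


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Keep only traceable hypotheses, ordered from highest to lowest score."""
    traceable = (h for h in hypotheses if h.is_traceable())
    return sorted(traceable, key=lambda h: h.score, reverse=True)


# Placeholder candidates, invented for illustration only.
candidates = [
    Hypothesis("Disease A", 0.62,
               [ReasoningStep("phenotype overlap", "knowledge-base entry (placeholder)")]),
    Hypothesis("Disease B", 0.81,
               [ReasoningStep("variant in associated gene", "case report (placeholder)")]),
    Hypothesis("Disease C", 0.90,
               [ReasoningStep("unsupported guess", "")]),
]
ranked = rank(candidates)
```

Here "Disease C" has the highest score but is filtered out because its only step cites no evidence. That filtering rule is a design choice made for this sketch, to show why attaching evidence to every step makes a ranked list easier for a clinician to audit.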

Evaluations of DeepRare

DeepRare has been evaluated on eight datasets from various regions and medical specialties. Across the 2,919 diseases covered, it achieves 100% accuracy for 1,013 diseases. In HPO-based evaluations, DeepRare outperforms 15 other methods, including traditional bioinformatics tools, large language models, and other agentic systems, with an average Recall@1 score of 57.18%. This surpasses the second-best method by 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves a Recall@1 of 70.60% compared to Exomiser's 53.20% in 109 cases. Furthermore, manual verification of reasoning chains by clinical experts yields 95.40% agreement, supporting the validity and traceability of the system's intermediate steps. This validation underscores DeepRare's potential as a reliable decision support tool in rare disease diagnostics.
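Since Recall@1 is the headline metric here, a short sketch of how such a score is typically computed may help. The function below is a generic Recall@k implementation written for this article, not the authors' evaluation code, and the toy cases are invented for illustration (they are not from the paper's datasets). It assumes each case has exactly one true diagnosis.

```python
from typing import Sequence


def recall_at_k(ranked_predictions: Sequence[Sequence[str]],
                true_diagnoses: Sequence[str],
                k: int = 1) -> float:
    """Fraction of cases whose true diagnosis appears in the top-k ranked list."""
    if len(ranked_predictions) != len(true_diagnoses):
        raise ValueError("each case needs one ranked list and one true diagnosis")
    if not true_diagnoses:
        raise ValueError("at least one case is required")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, true_diagnoses)
    )
    return hits / len(true_diagnoses)


# Toy data: one ranked list of candidate diagnoses per case.
predictions = [
    ["Marfan syndrome", "Loeys-Dietz syndrome"],
    ["Fabry disease", "Gaucher disease"],
    ["Pompe disease", "Wilson disease"],
    ["Rett syndrome", "Angelman syndrome"],
]
truths = ["Marfan syndrome", "Gaucher disease", "Wilson disease", "Rett syndrome"]

recall_1 = recall_at_k(predictions, truths, k=1)  # 2 of 4 cases hit at rank 1 -> 0.5
recall_2 = recall_at_k(predictions, truths, k=2)  # 4 of 4 cases hit within top 2 -> 1.0
```

Read this way, an average Recall@1 of 57.18% means the correct disease was the top-ranked suggestion in a little over half of the cases, and the 23.79-point margin quoted in the abstract implies the second-best method averaged about 33.39%.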

User-Friendly Web Application Implementation

DeepRare has also been implemented as a user-friendly web application, available at http://raredx.cn/doctor, making it accessible to healthcare professionals seeking advanced diagnostic capabilities. This implementation allows clinicians to input patient data and receive ranked diagnostic hypotheses.

Conclusion

In conclusion, DeepRare represents a significant advancement in rare disease diagnosis technology. Its use of a large language model and specialized agent servers enables accurate and transparent diagnostic reasoning, while its modular design ensures adaptability to new information and tools. Extensive evaluations have demonstrated the exceptional performance of DeepRare, making it a valuable tool for healthcare professionals in diagnosing rare diseases. With its user-friendly web application implementation, DeepRare has the potential to improve the lives of millions affected by rare diseases worldwide.

Created on 18 Jul. 2025
