An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

AI-generated keywords: Rare diseases

AI-generated Key Points

  • Rare diseases impact over 300 million individuals globally, posing challenges in diagnosis due to clinical heterogeneity and low prevalence rates.
  • DeepRare is the first agentic rare disease diagnosis system powered by a large language model (LLM), designed to process diverse clinical inputs.
  • The system generates ranked diagnostic hypotheses supported by transparent reasoning chains for collaboration between healthcare professionals and AI systems.
  • DeepRare comprises a central host with long-term memory and specialized agent servers integrating over 40 tools and medical knowledge sources.
  • Evaluations on eight datasets covering 2,919 diseases show strong diagnostic performance: 100% accuracy for 1,013 diseases and, in HPO-based evaluations, an average Recall@1 of 57.18%, ahead of traditional bioinformatics tools, LLMs, and other agentic systems.
  • Manual verification of reasoning chains by clinical experts reaches 95.40% agreement, supporting the system's traceability and its potential as a decision support tool in rare disease diagnostics.
  • The user-friendly web application implementation enhances accessibility for healthcare professionals seeking advanced diagnostic capabilities.

Authors: Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie

License: CC BY-NC-SA 4.0

Abstract: Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, integrating over 40 specialized tools; and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, including traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application at http://raredx.cn/doctor.
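
The abstract reports results as Recall@1 without defining it on this page. A minimal sketch of the metric follows, assuming the usual definition (the fraction of cases whose true diagnosis appears among the top k ranked hypotheses) and exact string matching of disease names. The function name, the matching rule, and the toy cases are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def recall_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of cases whose true diagnosis is in the top-k hypotheses.

    Assumption: diseases are compared by exact string equality; a real
    evaluation may instead match on ontology identifiers.
    """
    if len(ranked_predictions) != len(truths):
        raise ValueError("need exactly one ranking per ground-truth label")
    if not truths:
        raise ValueError("need at least one case")
    hits = sum(
        1
        for ranking, truth in zip(ranked_predictions, truths)
        if truth in ranking[:k]
    )
    return hits / len(truths)


# Toy data (made up for illustration): four cases, each with a ranked list.
toy_rankings = [
    ["Marfan syndrome", "Loeys-Dietz syndrome"],  # correct at rank 1
    ["Gaucher disease", "Fabry disease"],         # correct at rank 2
    ["Pompe disease", "Danon disease"],           # true diagnosis missing
    ["Wilson disease"],                           # correct at rank 1
]
toy_truths = [
    "Marfan syndrome",
    "Fabry disease",
    "Fabry disease",
    "Wilson disease",
]

print(recall_at_k(toy_rankings, toy_truths, k=1))  # 0.5
print(recall_at_k(toy_rankings, toy_truths, k=2))  # 0.75
```

Read this way, the reported 23.79 percentage point margin implies that the second-best method averaged 57.18 - 23.79 = 33.39% at Recall@1.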

Submitted to arXiv on 25 Jun. 2025


Results of the summarizing process for the arXiv paper: 2506.20430v1

Rare diseases collectively affect over 300 million individuals globally, and timely, accurate diagnosis remains difficult because of their clinical heterogeneity, low individual prevalence, and limited familiarity among clinicians. To address this, the authors introduce DeepRare, which they describe as the first rare disease diagnosis agentic system powered by a large language model (LLM). The system is designed to process diverse clinical inputs, including free-text descriptions, structured Human Phenotype Ontology (HPO) terms, and genetic testing results in variant call format (VCF). DeepRare generates ranked diagnostic hypotheses for rare diseases, each supported by a transparent chain of reasoning that connects intermediate analytic steps to verifiable medical evidence. This emphasis on interpretability is intended to support collaboration between healthcare professionals and AI systems in diagnostic workflows. The system comprises three key components: a central host with a long-term memory module, specialized agent servers responsible for domain-specific analytical tasks, and web-scale medical knowledge sources. The agent servers integrate over 40 specialized tools and draw on those knowledge sources to access current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. Evaluations on eight datasets from various regions and medical specialties cover 2,919 diseases, with DeepRare achieving 100% accuracy for 1,013 of them. In HPO-based evaluations, DeepRare outperforms 15 other methods, including traditional bioinformatics tools, large language models, and other agentic systems, with an average Recall@1 score of 57.18%, surpassing the second-best method (Reasoning LLM) by 23.79 percentage points.
For multi-modal input scenarios, DeepRare achieves a Recall@1 of 70.60%, compared to Exomiser's 53.20%, in 109 cases. Manual verification of reasoning chains by clinical experts reaches 95.40% agreement, supporting the validity and traceability of the system's intermediate steps and its potential as a decision support tool in rare disease diagnostics. The system is also available as a web application (http://raredx.cn/doctor), which makes it accessible to healthcare professionals. Overall, DeepRare combines strong diagnostic accuracy with transparent, checkable reasoning in rare disease diagnosis.
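
The host-and-agent design described in this summary can be pictured with a small sketch: a host passes a patient case to several agents, records each step in a memory log, and merges their outputs into a ranked list in which every hypothesis carries its evidence. Everything here is hypothetical and for illustration only: the class and function names, the toy agents, the hard-coded scores, and the rule of summing scores are assumptions, not DeepRare's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PatientCase:
    """Heterogeneous inputs named in the summary: free text, HPO terms, VCF."""
    free_text: str = ""
    hpo_terms: List[str] = field(default_factory=list)
    vcf_path: Optional[str] = None  # path to a variant call format file, if any


@dataclass
class Hypothesis:
    disease: str
    score: float
    evidence: List[str] = field(default_factory=list)  # traceable reasoning


Agent = Callable[[PatientCase], List[Hypothesis]]


class Host:
    """Toy central host: dispatches to agents and keeps a memory log."""

    def __init__(self, agents: Dict[str, Agent]) -> None:
        self.agents = agents
        self.memory: List[str] = []  # stand-in for a long-term memory module

    def diagnose(self, case: PatientCase) -> List[Hypothesis]:
        merged: Dict[str, Hypothesis] = {}
        for name, agent in self.agents.items():
            for hyp in agent(case):
                self.memory.append(f"{name} proposed {hyp.disease}")
                slot = merged.setdefault(hyp.disease, Hypothesis(hyp.disease, 0.0))
                slot.score += hyp.score  # assumption: scores are simply summed
                slot.evidence.extend(f"[{name}] {e}" for e in hyp.evidence)
        return sorted(merged.values(), key=lambda h: h.score, reverse=True)


def phenotype_agent(case: PatientCase) -> List[Hypothesis]:
    # Hypothetical lookup keyed on one example HPO identifier.
    if "HP:0001166" in case.hpo_terms:
        return [
            Hypothesis("Marfan syndrome", 0.6, ["matched HPO term HP:0001166"]),
            Hypothesis("Loeys-Dietz syndrome", 0.3, ["partial phenotype overlap"]),
        ]
    return []


def literature_agent(case: PatientCase) -> List[Hypothesis]:
    # Hypothetical keyword match standing in for a knowledge-source search.
    if "tall" in case.free_text.lower():
        return [Hypothesis("Marfan syndrome", 0.3, ["hypothetical case-report match"])]
    return []


host = Host({"phenotype": phenotype_agent, "literature": literature_agent})
case = PatientCase(free_text="Tall stature, long fingers", hpo_terms=["HP:0001166"])
ranked = host.diagnose(case)
for hyp in ranked:
    print(hyp.disease, round(hyp.score, 2), hyp.evidence)
```

The point of the sketch is the shape of the output: each ranked hypothesis keeps the list of agent-tagged evidence that produced it, which is what lets a clinician check the reasoning step by step.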
Created on 18 Jul. 2025

