Natural language processing to identify lupus nephritis phenotype in electronic health records

AI-generated keywords: Systemic Lupus Erythematosus

AI-generated Key Points

  • Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by unpredictable flares and remission with diverse manifestations.
  • Lupus nephritis, a major manifestation of SLE, can cause organ damage and mortality.
  • Accurate identification of lupus nephritis is crucial for large cohort observational studies and clinical trials.
  • Procedure codes and structured data in electronic health records (EHRs), such as laboratory tests, can help recognize lupus nephritis, but critical information such as histologic reports from kidney biopsies and medical history narratives requires sophisticated text processing.
  • Researchers developed algorithms to identify lupus nephritis using EHR data, with and without natural language processing (NLP).
  • Four algorithms were created: a rule-based algorithm using only structured data as the baseline, and three algorithms utilizing different NLP models.
  • The best-performing NLP model improved the F measure over the baseline algorithm in both evaluation datasets: from 0.41 to 0.79 on NMEDW and from 0.62 to 0.96 on VUMC.
  • NLP-based algorithms have the potential to accurately identify lupus nephritis in EHRs.
  • This has important implications for recruitment, study design, and analysis in large cohort observational studies and clinical trials focused on SLE.
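The comparison above is reported as an F measure. As a reminder of what that number combines, here is a minimal sketch assuming the standard F1 (the harmonic mean of precision and recall, computed from true positive, false positive, and false negative counts); the function name and the example counts are hypothetical and not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts; beta=1.0 gives the usual F1.

    tp: patients correctly flagged as having lupus nephritis
    fp: patients flagged who do not have it
    fn: patients with lupus nephritis that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
print(f_measure(8, 2, 2))
```

Because F1 is a harmonic mean, a rule-based algorithm with high precision but low recall (or the reverse) still scores low, which is one way a jump such as 0.41 to 0.79 can arise.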

Authors: Yu Deng, Jennifer A. Pacheco, Anh Chung, Chengsheng Mao, Joshua C. Smith, Juan Zhao, Wei-Qi Wei, April Barnado, Chunhua Weng, Cong Liu, Adam Cordon, Jingzhi Yu, Yacob Tedla, Abel Kho, Rosalind Ramsey-Goldman, Theresa Walunas, Yuan Luo

License: CC BY 4.0

Abstract: Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data. We developed four algorithms: a rule-based algorithm using only structured data (baseline algorithm) and three algorithms using different NLP models. The three NLP models are based on regularized logistic regression and use different feature sets: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components, respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). Compared with the baseline lupus nephritis algorithm, our best-performing NLP model, incorporating features from structured data, regular expression concepts, and mapped CUIs, improved the F measure in both the Northwestern Medicine Enterprise Data Warehouse (NMEDW) dataset (baseline 0.41 vs. NLP 0.79) and the VUMC dataset (baseline 0.62 vs. NLP 0.96).
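The abstract distinguishes three feature sets: positive (non-negated) CUI mentions, CUI appearance counts, and a mixture that adds structured data and regular expression concepts. The sketch below illustrates that distinction under several assumptions: CUI mentions are assumed to arrive already extracted as `(cui, is_negated)` pairs from an upstream concept mapper, the CUI strings, the regular expression (`LN_CLASS`, matching an ISN/RPS-style "class III/IV/V" phrase), and the structured field name are hypothetical placeholders, and using counts rather than binary flags inside the mixture is our choice, not something the abstract specifies.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# (cui, is_negated) pairs, assumed to come from an upstream concept mapper
# with negation detection. The CUI strings used below are hypothetical.
Mention = Tuple[str, bool]


def cui_presence(mentions: Iterable[Mention]) -> Dict[str, int]:
    """Binary features: 1 if a CUI has at least one positive (non-negated) mention."""
    return {cui: 1 for cui, negated in mentions if not negated}


def cui_counts(mentions: Iterable[Mention]) -> Dict[str, int]:
    """Count features: number of positive mentions of each CUI."""
    return dict(Counter(cui for cui, negated in mentions if not negated))


# Hypothetical regular-expression concept, e.g. a biopsy class in a pathology report.
LN_CLASS = re.compile(r"\bclass\s+(?:III|IV|V)\b", re.IGNORECASE)


def mixed_features(
    mentions: Iterable[Mention], note_text: str, structured: Dict[str, float]
) -> Dict[str, float]:
    """Mixture of CUI counts, a regex concept flag, and structured-data fields."""
    feats: Dict[str, float] = {f"cui:{k}": v for k, v in cui_counts(mentions).items()}
    feats["regex:ln_class"] = 1 if LN_CLASS.search(note_text) else 0
    feats.update({f"struct:{k}": v for k, v in structured.items()})
    return feats


mentions = [("CUI_A", False), ("CUI_A", False), ("CUI_B", True)]
print(cui_presence(mentions))
print(cui_counts(mentions))
print(mixed_features(mentions, "Renal biopsy: class IV lupus nephritis", {"proteinuria_flag": 1}))
```

Negated mentions (for example "no evidence of lupus nephritis") are dropped before counting, which is the practical difference between a "positive mention" feature and a raw keyword hit.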

Submitted to arXiv on 20 Dec. 2021


Results of the summarizing process for the arXiv paper: 2112.10821v1

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by unpredictable flares and remission with diverse manifestations. Lupus nephritis, a major manifestation of SLE, can cause organ damage and mortality, making its accurate identification crucial for large cohort observational studies and clinical trials. While procedure codes and structured data such as laboratory tests can help recognize lupus nephritis in electronic health records (EHRs), critical information such as histologic reports from kidney biopsies and medical history narratives requires sophisticated text processing. In this study, the researchers developed algorithms to identify lupus nephritis using EHR data, with and without natural language processing (NLP). They created four algorithms: a rule-based algorithm using only structured data as the baseline, and three algorithms using different NLP models. The NLP models were based on regularized logistic regression and used different feature sets: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). The best-performing NLP model, which incorporated features from structured data, regular expression concepts, and mapped CUIs, improved the F measure over the baseline algorithm in both the NMEDW dataset (baseline 0.41 vs. NLP 0.79) and the VUMC dataset (baseline 0.62 vs. NLP 0.96). The findings highlight the potential of NLP-based algorithms to accurately identify lupus nephritis in EHRs, with important implications for recruitment, study design, and analysis in large cohort observational studies and clinical trials focused on SLE.
By leveraging sophisticated text processing techniques to mine information from pathology reports and medical history narratives, researchers can enhance their understanding of lupus nephritis phenotypes for improved patient characterization.
Created on 30 Sep. 2023
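All three NLP models are described as regularized logistic regression over sparse features. The sketch below shows what that classifier does, assuming an L2 penalty (the summary does not say whether L1 or L2 regularization was used) and training by plain batch gradient descent on dictionaries of feature values. The function names, the learning rate, the penalty strength `lam`, and the toy data are hypothetical; in practice a library implementation such as scikit-learn's `LogisticRegression` would typically be used instead.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Features, bias: float, x: Features) -> float:
    """Probability of the positive class (e.g. lupus nephritis) for one patient."""
    return sigmoid(bias + sum(weights.get(k, 0.0) * v for k, v in x.items()))


def train_l2_logreg(
    X: List[Features], y: List[int], lam: float = 0.1, lr: float = 0.5, epochs: int = 500
) -> Tuple[Features, float]:
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias term is not penalized.
    """
    weights: Features = {}
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        # Gradient of the L2 penalty term.
        grad_w: Features = {k: lam * w for k, w in weights.items()}
        grad_b = 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label  # d(log-loss)/d(logit)
            for k, v in x.items():
                grad_w[k] = grad_w.get(k, 0.0) + err * v / n
            grad_b += err / n
        for k, g in grad_w.items():
            weights[k] = weights.get(k, 0.0) - lr * g
        bias -= lr * grad_b
    return weights, bias


# Toy data: feature "a" co-occurs with positive labels, "b" with negative ones.
X = [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}, {"b": 1.0}]
y = [1, 1, 0, 0]
w, b = train_l2_logreg(X, y)
print(predict_proba(w, b, {"a": 1.0}), predict_proba(w, b, {"b": 1.0}))
```

The penalty keeps individual CUI weights small, which matters when many sparse CUI features each appear in only a few patients' notes.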
