ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond

AI-generated keywords: Health literacy, Large language models, Children, Interventions, Communication

AI-generated Key Points

  • Enhanced health literacy linked to improved health outcomes
  • Study explores using large language models (LLMs) to enhance health literacy in children and other populations
  • Testing involved 288 conditions with 26 prompts using ChatGPT-3.5, Bing, and Google Bard; subset of 150 conditions further tested with ChatGPT-4 due to rate limits
  • Primary outcome measurements focused on reading grade level (RGL) and word counts of LLM-generated output
  • Basic prompts like "Explain" and "What is (are)" produced outputs at or above a 10th-grade RGL across all models
  • LLMs varied in tailoring responses based on RGL when prompted to explain conditions from the 1st to 12th grade levels
  • ChatGPT-3.5 and ChatGPT-4 performed better at achieving lower-grade level outputs compared to Bing and Bard
  • Bard exhibited hesitancy in providing certain outputs, showing caution in health information dissemination
  • Future research should verify accuracy and effectiveness of LLMs in improving health literacy
  • Despite challenges in crafting outputs below a sixth-grade reading level, LLMs show potential in modifying higher-level outputs for enhanced communication among pediatric populations and beyond

Authors: Kanhai S. Amin, Linda Mayes, Pavan Khosla, Rushabh Doshi

15 pages, 1 Table, 3 Figures, and 3 Supplemental Figures
License: CC BY 4.0

Abstract:

Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations.

Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output.

Results: Across all models, output for basic prompts such as "Explain" and "What is (are)" was at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st- to 12th-grade RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to the college-freshman RGL, while ChatGPT-4 produced responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th- to 11th-grade RGL, while Google Bard provided responses from the 7th- to 10th-grade RGL.

Discussion: ChatGPT-3.5 and ChatGPT-4 did better in achieving lower-grade-level outputs. Meanwhile, Bard and Bing tended to consistently produce an RGL at the high school level regardless of prompt. Additionally, Bard's hesitancy in providing certain outputs indicates a cautious approach towards health information. LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context.

Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.

Submitted to arXiv on 16 Nov. 2023

Results of the summarizing process for the arXiv paper: 2311.10075v1

Enhanced health literacy has been associated with improved health outcomes, yet few interventions have been studied. This study explores whether large language models (LLMs) can enhance health literacy in children and other populations.

The authors compiled a list of 288 childhood disorders from sources such as Johns Hopkins Children's Center and Seattle Children's Hospital. These conditions were tested with 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard; because of rate limits, only a subset of 150 conditions was tested through ChatGPT-4. The prompts included simple queries as well as context-based prompts tailored to different grade levels, to assess how well the LLMs could communicate complex medical information. To standardize the comparison, formatting was removed from outputs, ancillary information was excluded, and readability was assessed using the Gunning Fog and Flesch readability formulas. The primary outcome measurements were the reading grade level (RGL) and word count of the LLM output.

Basic prompts like "Explain" and "What is (are)" produced outputs at or above a 10th-grade RGL across all models. When prompts asked for explanations at the 1st- through 12th-grade RGL, the LLMs varied in their ability to tailor responses accordingly. ChatGPT-3.5 and ChatGPT-4 were better at achieving lower-grade-level outputs than Bing and Bard, which tended to produce outputs at a high school level regardless of the prompt. Bard also exhibited hesitancy in providing certain outputs, indicating a cautious approach towards disseminating health information.

LLMs struggled to craft outputs below a sixth-grade reading level, but their ability to modify outputs above this threshold offers a potential way to enhance health literacy and communication among pediatric populations and beyond. Future research should verify the accuracy and effectiveness of LLMs in improving health literacy and determine how best to use them in healthcare communication settings.
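The readability scoring mentioned in the summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the summary names only "Gunning Fog and Flesch readability formulas", so the use of the Flesch-Kincaid Grade Level variant is an assumption, and the helper names (`count_syllables`, `text_stats`, `flesch_kincaid_grade`, `gunning_fog`) are hypothetical. The syllable counter is a rough vowel-group heuristic, and the Gunning Fog implementation simplifies the standard definition by treating every word of three or more syllables as complex.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int, int]:
    """Return (sentences, words, syllables, complex_words) for plain text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllable_counts = [count_syllables(w) for w in words]
    # Simplification: any word with 3+ syllables counts as "complex".
    complex_words = sum(1 for n in syllable_counts if n >= 3)
    return len(sentences), len(words), sum(syllable_counts), complex_words


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(W/S) + 11.8*(Syl/W) - 15.59."""
    s, w, syl, _ = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * ((W/S) + 100 * (complex/W))."""
    s, w, _, cx = text_stats(text)
    return 0.4 * ((w / s) + 100 * (cx / w))
```

Calling `flesch_kincaid_grade(output)` and `gunning_fog(output)` on a model's plain-text response gives two approximate grade-level estimates, and the second element of `text_stats(output)` gives its word count. Scores from a heuristic syllable counter like this one can differ from those of established readability tools.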
Created on 16 Mar. 2024
