ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond

AI-generated keywords: Health literacy, Large language models, Children, Interventions, Communication

AI-generated Key Points

Enhanced health literacy linked to improved health outcomes
Study explores using large language models (LLMs) to enhance health literacy in children and other populations
Testing involved 288 conditions with 26 prompts using ChatGPT-3.5, Bing, and Google Bard; subset of 150 conditions further tested with ChatGPT-4 due to rate limits
Primary outcome measurements focused on reading grade level (RGL) and word counts of LLM-generated output
Basic prompts like "Explain" and "What is (are)" produced outputs at or above a 10th-grade RGL across all models
LLMs varied in tailoring responses based on RGL when prompted to explain conditions from the 1st to 12th grade levels
ChatGPT-3.5 and ChatGPT-4 performed better at achieving lower-grade level outputs compared to Bing and Bard
Bard exhibited hesitancy in providing certain outputs, showing caution in health information dissemination
Future research should verify accuracy and effectiveness of LLMs in improving health literacy
Despite challenges in crafting outputs below a sixth-grade reading level, LLMs show potential in modifying higher-level outputs for enhanced communication among pediatric populations and beyond

Authors: Kanhai S. Amin, Linda Mayes, Pavan Khosla, Rushabh Doshi

arXiv: 2311.10075v1 (cs.CL)

15 pages, 1 Table, 3 Figures, and 3 Supplemental Figures

License: CC BY 4.0

Abstract: Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations. Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output. Results: Across all models, output for basic prompts such as "Explain" and "What is (are)" were at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st to 12th RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to college freshmen RGL while ChatGPT-4 outputted responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th to 11th RGL while Google Bard provided responses from the 7th to 10th RGL. Discussion: ChatGPT-3.5 and ChatGPT-4 did better in achieving lower-grade level outputs. Meanwhile Bard and Bing tended to consistently produce an RGL that is at the high school level regardless of prompt. Additionally, Bard's hesitancy in providing certain outputs indicates a cautious approach towards health information. LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context. Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.

Submitted to arXiv on 16 Nov. 2023

Results of the summarizing process for the arXiv paper: 2311.10075v1

Comprehensive Summary

Enhanced health literacy has been associated with improved health outcomes, yet few interventions have been studied. This study explores whether large language models (LLMs) can enhance health literacy in children and other populations. A total of 288 conditions were tested using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Because of rate limits, only a subset of 150 conditions was tested through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word count of the output generated by each LLM.

Basic prompts such as "Explain" and "What is (are)" produced outputs at or above a 10th-grade RGL across all models. When prompts asked for explanations at the 1st- through 12th-grade levels, the LLMs varied in how well they tailored their responses to the requested RGL. ChatGPT-3.5 and ChatGPT-4 were better at producing lower-grade-level outputs than Bing and Bard, which tended to produce high-school-level outputs regardless of the prompt. Bard also hesitated to provide certain outputs, indicating a cautious approach to disseminating health information.

While LLMs show promise for enhancing health communication, future research should verify their accuracy and their effectiveness in improving health literacy. LLMs struggled to craft outputs below a sixth-grade reading level, but their ability to modify outputs above this threshold could help improve health literacy and communication among pediatric populations and beyond.

The study compiled a list of 288 childhood disorders from reputable sources such as Johns Hopkins Children's Center and Seattle Children's Hospital. Prompts included simple queries as well as context-based prompts tailored to different grade levels, to assess how well the LLMs communicate complex medical information. To standardize the comparison, formatting was removed from outputs, ancillary information was excluded, and readability was assessed using the Gunning Fog and Flesch readability formulas. Overall, the study highlights the potential of LLMs to improve health literacy but underscores the need for further research to optimize their use in healthcare communication settings.
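The readability assessment mentioned above can be illustrated with a minimal sketch. It assumes that "Flesch" refers to the Flesch-Kincaid Grade Level (the variant that returns a school grade), which this summary does not state explicitly. The syllable counter is a rough vowel-group heuristic, and the Gunning Fog function skips the formal exclusions for proper nouns and common suffixes, so scores will differ somewhat from dedicated readability tools. All function names are illustrative and are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def split_sentences(text: str) -> list[str]:
    """Split on runs of sentence-ending punctuation and drop empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text: str) -> list[str]:
    """Extract alphabetic tokens, keeping simple contractions together."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def word_count(text: str) -> int:
    return len(split_words(text))


def _counts(text: str) -> tuple[int, int]:
    """Return (number of sentences, number of words), rejecting empty text."""
    sentences, words = split_sentences(text), split_words(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return len(sentences), len(words)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(W/S) + 11.8*(syllables/W) - 15.59."""
    n_sentences, n_words = _counts(text)
    syllables = sum(count_syllables(w) for w in split_words(text))
    return 0.39 * n_words / n_sentences + 11.8 * syllables / n_words - 15.59


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (W/S + 100 * complex_words/W).

    A word counts as complex here if the heuristic gives it 3+ syllables.
    """
    n_sentences, n_words = _counts(text)
    n_complex = sum(1 for w in split_words(text) if count_syllables(w) >= 3)
    return 0.4 * (n_words / n_sentences + 100 * n_complex / n_words)


if __name__ == "__main__":
    sample = "Asthma makes it hard to breathe. A doctor can help you feel better."
    print(word_count(sample), flesch_kincaid_grade(sample), gunning_fog(sample))
```

Both indices rise with longer sentences and with longer words, which is why a prompt that only says "Explain" tends to score at a high grade level when the response uses clinical vocabulary.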

Layman's Summary

- Knowing more about staying healthy helps people stay well.
- Scientists are studying how to use big talking computers to help kids and others learn about being healthy.
- They tested this by asking the computers questions about 288 different health topics using special programs like ChatGPT-3.5, Bing, and Google Bard.
- The main things they looked at were how easy it was to read the computers' answers and how many words they used.
- When asked basic questions, the computers gave answers written at about a 10th-grade level or higher. Some of them used simpler words when asked to, but they had trouble writing below a 6th-grade level.

Definitions

1. Health literacy: Understanding information about staying healthy.
2. Large language models (LLMs): Big talking computers that can help explain things using lots of words.
3. Prompts: Questions or requests for information given to the computers.
4. Reading grade level (RGL): How hard or easy it is to understand written text based on school grade levels.
5. Outputs: Answers or information provided by the computers in response to questions or prompts.

Blog article

Introduction: Health literacy is a crucial aspect of overall health and well-being. It refers to an individual's ability to obtain, understand, and use health information to make informed decisions about their health (1). Research has shown that individuals with higher levels of health literacy have better health outcomes, including improved disease management and prevention (2). However, few interventions have been studied in this area. This research paper explores the potential of large language models (LLMs) to enhance health literacy in children and other populations.

Methodology: The study used three LLMs (ChatGPT-3.5, Microsoft Bing, and Google Bard) to generate responses for 288 conditions; because of rate limits, a subset of 150 conditions was also run through ChatGPT-4. The conditions were compiled from lists of childhood disorders published by reputable sources such as Johns Hopkins Children's Center and Seattle Children's Hospital. The 26 prompts included simple queries as well as context-based prompts tailored to different grade levels.

Results: The primary outcome measurements were the reading grade level (RGL) and word count of the output generated by the LLMs. Basic prompts such as "Explain" and "What is (are)" produced outputs at or above a 10th-grade RGL across all models. When prompts asked for explanations at the 1st- through 12th-grade levels, the LLMs varied in how well they tailored their responses to the requested RGL. ChatGPT-3.5 and ChatGPT-4 were better at producing lower-grade-level outputs than Bing and Bard, which tended to produce high-school-level outputs regardless of the prompt. Bard also hesitated to provide certain outputs, indicating a cautious approach to disseminating health information.

Discussion: The results suggest that while LLMs show promise in enhancing health communication, there are limitations that need further investigation. One such limitation is crafting outputs below a sixth-grade reading level; however, LLMs have shown potential in modifying outputs above this threshold to enhance health literacy and communication among pediatric populations and beyond. Future research should focus on verifying the accuracy and effectiveness of LLMs in improving health literacy. This could involve comparing the responses generated by LLMs with those provided by healthcare professionals, or conducting studies to assess the impact of using LLMs on individuals' health literacy levels.

Conclusion: This study highlights the potential of LLMs to enhance health literacy but underscores the need for further research to optimize their use in healthcare communication settings. Its findings show how different prompts influence the output generated by LLMs and highlight areas for improvement. With continued advances in technology, LLMs could substantially improve healthcare communication and health outcomes for individuals across all age groups.

Created on 16 Mar. 2024

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.