Healthsheet: Development of a Transparency Artifact for Health Datasets

AI-generated keywords: Machine learning

AI-generated Key Points

  • Machine learning (ML) has shown great potential in healthcare applications
  • Ethical concerns stem from structural inequalities in data collection, usage, and handling
  • Developing guidelines for creating, using, and maintaining ML healthcare datasets is essential
  • Healthsheet introduced as a contextualized adaptation of the datasheet questionnaire for health-related applications
  • Contextualizing datasheets for healthcare is necessary
  • Inconsistency in broader use of accountability practices like datasheets within the ML for health community
  • Lack of incentives for creating datasheets but long-term benefits recognized
  • Suggestions to change incentives to prioritize data sharing over publications and standardize reporting regulations
  • Healthsheets facilitate efficient dataset usage and long-term follow-up through detailed documentation

Authors: Negar Rostamzadeh, Diana Mincu, Subhrajit Roy, Andrew Smart, Lauren Wilcox, Mahima Pushkarna, Jessica Schrouff, Razvan Amironesei, Nyalleng Moorosi, Katherine Heller

License: CC BY 4.0

Abstract: Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical issues surrounding the use of ML in healthcare stem from structural inequalities underlying the way we collect, use, and handle data. Developing guidelines to improve documentation practices regarding the creation, use, and maintenance of ML healthcare datasets is therefore of critical importance. In this work, we introduce Healthsheet, a contextualized adaptation of the original datasheet questionnaire (Gebru et al., 2018) for health-specific applications. Through a series of semi-structured interviews, we adapt the datasheets for healthcare data documentation. As part of the Healthsheet development process and to understand the obstacles researchers face in creating datasheets, we worked with three publicly available healthcare datasets as our case studies, each with different types of structured data: Electronic Health Records (EHR), clinical trial study data, and smartphone-based performance outcome measures. Our findings from the interviewee study and case studies show 1) that datasheets should be contextualized for healthcare, 2) that despite incentives to adopt accountability practices such as datasheets, there is a lack of consistency in the broader use of these practices, 3) how the ML for health community views datasheets, and particularly Healthsheets, as a diagnostic tool to surface the limitations and strengths of datasets, and 4) the relative importance of different fields in the datasheet to healthcare concerns.

Submitted to arXiv on 26 Feb. 2022

Results of the summarizing process for the arXiv paper: 2202.13028v1

Machine learning (ML) has shown great potential in various healthcare applications, with data playing a crucial role in the development of ML-based systems. However, ethical concerns surrounding its use often stem from structural inequalities in data collection, usage, and handling. To address this issue, developing guidelines for creating, using, and maintaining ML healthcare datasets is essential. In this study, Healthsheet was introduced as a contextualized adaptation of the datasheet questionnaire specifically designed for health-related applications. Through semi-structured interviews and case studies on publicly available healthcare datasets such as Electronic Health Records (EHR) and clinical trial study data, it was found that contextualizing datasheets for healthcare is necessary. Despite incentives to adopt accountability practices like datasheets, there is inconsistency in their broader use within the ML for health community. Participants acknowledged the lack of incentives for creating datasheets but recognized the long-term benefits of reduced overhead from investing in documentation upfront. Suggestions were made to change incentives within the community to prioritize data sharing over publications and standardize reporting regulations. Overall, Healthsheets serve as a valuable transparency artifact for health datasets by facilitating efficient dataset usage and long-term follow-up through detailed documentation that helps researchers understand the data's origins and potential uses and store it securely.
Created on 25 Mar. 2024
