Generative Medical Event Models Improve with Scale

AI-generated keywords: Personalized Medicine, Longitudinal Patient Journeys, Pretrained Foundation Models, Epic Cosmos, Clinical Decision-Making

AI-generated Key Points

Personalized medicine at scale requires distilling insights from longitudinal patient journeys, which are sequences of medical events.
Foundation models pretrained on large-scale medical event data, such as the Cosmos Medical Event Transformer (CoMET) models, offer a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks.
CoMET, pretrained on 118 million patients representing 115 billion discrete medical events, autoregressively generates the next medical event from a patient's history. Across 78 real-world tasks it generally outperformed or matched task-specific supervised models, without task-specific fine-tuning or few-shot examples.
The study established a methodology for pretraining transformer models and revealed power-law scaling relationships for compute, tokens, and model size in medical event data.
CoMET effectively captures complex clinical dynamics as a generative medical event foundation model, supporting clinical decision-making, streamlining healthcare operations, and enhancing patient outcomes.

Authors: Shane Waxler, Paul Blazek, Davis White, Daniel Sneider, Kevin Chung, Mani Nagarathnam, Patrick Williams, Hank Voeller, Karen Wong, Matthew Swanhorst, Sheng Zhang, Naoto Usuyama, Cliff Wong, Tristan Naumann, Hoifung Poon, Andrew Loza, Daniella Meeker, Seth Hain, Rahul Shah

arXiv: 2508.12104v1 (cs.LG)

License: CC BY 4.0

Abstract: Realizing personalized medicine at scale calls for methods that distill insights from longitudinal patient journeys, which can be viewed as a sequence of medical events. Foundation models pretrained on large-scale medical event data represent a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks. Using Epic Cosmos, a dataset with medical events from de-identified longitudinal health records for 16.3 billion encounters over 300 million unique patient records from 310 health systems, we introduce the Cosmos Medical Event Transformer (CoMET) models, a family of decoder-only transformer models pretrained on 118 million patients representing 115 billion discrete medical events (151 billion tokens). We present the largest scaling-law study for medical event data, establishing a methodology for pretraining and revealing power-law scaling relationships for compute, tokens, and model size. Based on this, we pretrained a series of compute-optimal models with up to 1 billion parameters. Conditioned on a patient's real-world history, CoMET autoregressively generates the next medical event, simulating patient health timelines. We studied 78 real-world tasks, including diagnosis prediction, disease prognosis, and healthcare operations. Remarkably for a foundation model with generic pretraining and simulation-based inference, CoMET generally outperformed or matched task-specific supervised models on these tasks, without requiring task-specific fine-tuning or few-shot examples. CoMET's predictive power consistently improves as the model and pretraining scale. Our results show that CoMET, a generative medical event foundation model, can effectively capture complex clinical dynamics, providing an extensible and generalizable framework to support clinical decision-making, streamline healthcare operations, and improve patient outcomes.

Submitted to arXiv on 16 Aug. 2025

Comprehensive Summary

In the pursuit of safe and effective medical care, personalized medicine at scale requires methods that can distill insights from longitudinal patient journeys, which are essentially sequences of medical events. Foundation models pretrained on large-scale medical event data offer a promising direction for scaling real-world evidence (RWE) generation and generalizing to diverse downstream tasks. The Cosmos Medical Event Transformer (CoMET) models were built on Epic Cosmos, a dataset of de-identified longitudinal health records covering over 300 million unique patient records and 16.3 billion encounters from 310 health systems. These decoder-only transformer models were pretrained on 118 million patients representing 115 billion discrete medical events (151 billion tokens). The authors conducted a scaling-law study for medical event data, which they describe as the largest to date, establishing a pretraining methodology and revealing power-law scaling relationships for compute, tokens, and model size. Based on these relationships, they pretrained a series of compute-optimal models with up to 1 billion parameters. Conditioned on a patient's real-world history, CoMET autoregressively generates the next medical event, simulating patient health timelines. The study covered 78 real-world tasks, including diagnosis prediction, disease prognosis, and healthcare operations. CoMET generally outperformed or matched task-specific supervised models on these tasks without task-specific fine-tuning or few-shot examples, and its predictive power consistently improved as model and pretraining scale increased. The authors conclude that CoMET, as a generative medical event foundation model, can effectively capture complex clinical dynamics, providing an extensible and generalizable framework to support clinical decision-making, streamline healthcare operations, and improve patient outcomes.
Furthermore, the introduction of Epic Cosmos has addressed challenges in leveraging real-world data for personalized medicine at scale by aggregating de-identified longitudinal health records across multiple health systems. The platform unifies various clinical data types to support patient care and accelerate scientific discovery, and it delivers actionable insights to clinicians at the point of care through features like Cosmos Median Length of Stay and Best Care Choices for My Patient™. Cosmos data can inform healthcare decisions and research priorities, such as understanding trends in healthcare utilization or investigating rare diseases. Even so, answering a specific clinical question still requires manual effort to craft custom cohort definitions and feature-engineering pipelines. Enabling routine clinical decision-making with personalized medicine at scale using RWE therefore demands tools that can learn from integrated patient records and efficiently answer complex medical questions across diverse contexts.

Summary

- Personalized medicine means using information from a person's medical history to help them get better.
- Scientists have created a special computer program called CoMET that can predict what might happen next in someone's medical journey.
- CoMET was trained on data from millions of patients and billions of medical events, and it can make predictions without needing extra training.
- The study showed how to train these computer models and found patterns in how they work with medical data.
- CoMET helps doctors make better decisions, run hospitals more smoothly, and improve patient health.

Definitions

- Personalized medicine: Tailoring medical treatment to individual characteristics of each patient.
- Longitudinal: Relating to data collected over a long period of time.
- Pretrained: A model that has been trained on a large dataset before being used for specific tasks.
- Autoregressively: Making predictions based on previous events in a sequence.
- Generative: Capable of producing new content or information.

Introduction

In healthcare, personalized medicine has emerged as a promising approach to providing safe and effective medical care. It involves tailoring treatments and interventions to individual patients based on characteristics such as genetics, lifestyle, and medical history. Implementing personalized medicine at scale, however, requires distilling insights from longitudinal patient journeys, which are essentially sequences of medical events. To address this, researchers have turned to foundation models pretrained on large-scale medical event data, which offer a promising direction for scaling real-world evidence (RWE) generation and generalizing to diverse downstream tasks. This article looks at a recent research paper that introduces the Cosmos Medical Event Transformer (CoMET) models, a family of decoder-only transformer models pretrained on 118 million patients representing 115 billion discrete medical events.
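To make the idea of a patient journey as a sequence of medical events concrete, here is a minimal sketch in Python. The `MedicalEvent` class, the `kind:code` token format, and the example codes are all hypothetical and chosen for illustration only; the summary above does not describe how CoMET actually encodes events.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MedicalEvent:
    day: date   # when the event was recorded
    kind: str   # hypothetical category, e.g. "diagnosis" or "medication"
    code: str   # hypothetical label, not a real coding-system lookup


def to_tokens(events):
    """Flatten a patient's events into one chronologically ordered token list."""
    # sorted() is stable, so events on the same day keep their input order.
    ordered = sorted(events, key=lambda ev: ev.day)
    return [f"{ev.kind}:{ev.code}" for ev in ordered]


# A made-up three-event journey, deliberately given out of order.
journey = [
    MedicalEvent(date(2024, 3, 2), "medication", "metformin"),
    MedicalEvent(date(2024, 1, 15), "encounter", "office_visit"),
    MedicalEvent(date(2024, 1, 15), "diagnosis", "type_2_diabetes"),
]
tokens = to_tokens(journey)
print(tokens)
```

A decoder-only model would consume such a token list left to right and predict the token that comes next.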

The Dataset: Epic Cosmos

The CoMET models were trained on data from Epic Cosmos, a dataset of de-identified longitudinal health records covering over 300 million unique patient records and 16.3 billion encounters from 310 health systems. By aggregating these records across health systems, the platform addresses challenges in leveraging real-world data for personalized medicine at scale. Epic Cosmos unifies various clinical data types to support patient care and accelerate scientific discovery, and it delivers actionable insights to clinicians at the point of care through features like Cosmos Median Length of Stay and Best Care Choices for My Patient™. Cosmos data can inform healthcare decisions and research priorities, such as understanding trends in healthcare utilization or investigating rare diseases. Even so, answering a specific clinical question still requires manual effort to craft custom cohort definitions and feature-engineering pipelines.

The Study: Pretraining CoMET Models

The researchers pretrained the CoMET models on 118 million patients representing 115 billion discrete medical events (151 billion tokens). They also conducted a scaling-law study for medical event data, establishing a pretraining methodology and revealing power-law scaling relationships for compute, tokens, and model size. Based on these relationships, they pretrained a series of compute-optimal models with up to 1 billion parameters. Conditioned on a patient's real-world history, CoMET autoregressively generates the next medical event, simulating patient health timelines.
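A power-law scaling relationship has the form L = a * C^(-b), which becomes a straight line when both axes are logarithmic. The sketch below fits such a line by ordinary least squares in log-log space. The data points are synthetic, generated from an assumed exponent, and are not values from the paper; the function name `fit_power_law` is likewise an illustration rather than the authors' procedure.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by least squares on (log x, log y); return (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = log a - b * log x, so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


# Synthetic (compute, loss) pairs from a known power law; NOT paper values.
true_a, true_b = 5.0, 0.1
compute = [1e15, 1e16, 1e17, 1e18, 1e19]
loss = [true_a * c ** (-true_b) for c in compute]

fit_a, fit_b = fit_power_law(compute, loss)
print(f"a = {fit_a:.4f}, b = {fit_b:.4f}")
```

Once fitted on small runs, such a curve can be extrapolated to estimate the loss at a larger compute budget, which is the usual motivation for a scaling-law study before training the largest models.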

Results

The study covered 78 real-world tasks, including diagnosis prediction, disease prognosis, and healthcare operations. CoMET generally outperformed or matched task-specific supervised models on these tasks, without task-specific fine-tuning or few-shot examples, which is notable for a foundation model with generic pretraining and simulation-based inference. Its predictive power consistently improved as model and pretraining scale increased. The authors conclude that CoMET, as a generative medical event foundation model, can effectively capture complex clinical dynamics, providing an extensible and generalizable framework to support clinical decision-making, streamline healthcare operations, and improve patient outcomes.
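The abstract describes CoMET's inference as simulation-based: the model generates possible future timelines, and task predictions are read off those simulations. The sketch below illustrates that idea with a toy stand-in. The transition table `TOY_NEXT_EVENT`, its probabilities, and the event names are invented for illustration, and the toy conditions only on the last event, whereas a transformer would condition on the full history.

```python
import random

# Hypothetical stand-in for a pretrained model: P(next event | last event).
TOY_NEXT_EVENT = {
    "office_visit": {"lab_test": 0.6, "office_visit": 0.3, "admission": 0.1},
    "lab_test": {"office_visit": 0.7, "admission": 0.3},
    "admission": {"office_visit": 1.0},
}


def sample_next(history, rng):
    """Sample one next event given the history (toy: uses the last event only)."""
    dist = TOY_NEXT_EVENT[history[-1]]
    events = list(dist)
    weights = [dist[e] for e in events]
    return rng.choices(events, weights=weights, k=1)[0]


def simulate(history, steps, rng):
    """Autoregressively extend the history and return only the generated events."""
    timeline = list(history)
    for _ in range(steps):
        timeline.append(sample_next(timeline, rng))
    return timeline[len(history):]


def estimate_risk(history, target, steps, n_sims, seed=0):
    """Monte Carlo estimate of P(target occurs within `steps` generated events)."""
    rng = random.Random(seed)
    hits = sum(target in simulate(history, steps, rng) for _ in range(n_sims))
    return hits / n_sims


risk = estimate_risk(["office_visit"], "admission", steps=3, n_sims=2000)
print(f"estimated P(admission within 3 events) = {risk:.3f}")
```

For this toy table the exact probability works out to 0.415, so the estimate should land close to that value. The same set of simulated timelines can answer many different questions, which is one way a single generative model can serve several tasks without fine-tuning.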

Conclusion

In conclusion, the introduction of Epic Cosmos has addressed challenges in leveraging real-world data for personalized medicine at scale by aggregating de-identified longitudinal health records across multiple health systems. Pretrained foundation models like CoMET offer a promising direction for scaling RWE generation and generalizing to diverse downstream tasks. The paper highlights the potential of large-scale medical event data to improve healthcare outcomes through personalized medicine at scale. With further advances in technology and access to comprehensive datasets like Epic Cosmos, progress in this field could ultimately lead to better patient care.

Created on 22 Aug. 2025
