Generative Medical Event Models Improve with Scale

AI-generated keywords: Personalized Medicine, Longitudinal Patient Journeys, Pretrained Foundation Models, Epic Cosmos, Clinical Decision-Making

AI-generated Key Points

  • Personalized medicine at scale requires distilling insights from longitudinal patient journeys, which are sequences of medical events.
  • Foundation models pretrained on large-scale medical event data, such as the Cosmos Medical Event Transformer (CoMET) models, offer a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks.
  • CoMET, pretrained on a dataset representing 118 million patients and 115 billion discrete medical events, autoregressively generates the next medical event from a patient's history. Across 78 real-world tasks it generally outperformed or matched task-specific supervised models without requiring fine-tuning.
  • The study established a methodology for pretraining transformer models and revealed power-law scaling relationships for compute, tokens, and model size in medical event data.
  • CoMET effectively captures complex clinical dynamics as a generative medical event foundation model, supporting clinical decision-making, streamlining healthcare operations, and enhancing patient outcomes.
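
The power-law scaling relationship mentioned in the key points can be illustrated with a minimal sketch. It assumes a relationship of the form loss = a * compute^(-b) and fits a and b by linear regression in log-log space. The function name `fit_power_law`, the functional form, and the data points are all illustrative assumptions; the points are synthetic and are not results from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b) via linear regression on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = log a - b * log x, so a = exp(intercept) and b = -slope
    return math.exp(intercept), -slope

# Synthetic points generated from an assumed law (a = 5.0, b = 0.05).
# These are NOT measurements from the paper.
compute = [1e15, 1e16, 1e17, 1e18]
loss = [5.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On a log-log plot such a relationship appears as a straight line, which is why the fit reduces to ordinary linear regression.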

Authors: Shane Waxler, Paul Blazek, Davis White, Daniel Sneider, Kevin Chung, Mani Nagarathnam, Patrick Williams, Hank Voeller, Karen Wong, Matthew Swanhorst, Sheng Zhang, Naoto Usuyama, Cliff Wong, Tristan Naumann, Hoifung Poon, Andrew Loza, Daniella Meeker, Seth Hain, Rahul Shah

License: CC BY 4.0

Abstract: Realizing personalized medicine at scale calls for methods that distill insights from longitudinal patient journeys, which can be viewed as a sequence of medical events. Foundation models pretrained on large-scale medical event data represent a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks. Using Epic Cosmos, a dataset with medical events from de-identified longitudinal health records for 16.3 billion encounters over 300 million unique patient records from 310 health systems, we introduce the Cosmos Medical Event Transformer (CoMET) models, a family of decoder-only transformer models pretrained on 118 million patients representing 115 billion discrete medical events (151 billion tokens). We present the largest scaling-law study for medical event data, establishing a methodology for pretraining and revealing power-law scaling relationships for compute, tokens, and model size. Based on this, we pretrained a series of compute-optimal models with up to 1 billion parameters. Conditioned on a patient's real-world history, CoMET autoregressively generates the next medical event, simulating patient health timelines. We studied 78 real-world tasks, including diagnosis prediction, disease prognosis, and healthcare operations. Remarkably for a foundation model with generic pretraining and simulation-based inference, CoMET generally outperformed or matched task-specific supervised models on these tasks, without requiring task-specific fine-tuning or few-shot examples. CoMET's predictive power consistently improves as the model and pretraining scale. Our results show that CoMET, a generative medical event foundation model, can effectively capture complex clinical dynamics, providing an extensible and generalizable framework to support clinical decision-making, streamline healthcare operations, and improve patient outcomes.
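
One plausible reading of "simulation-based inference" is a Monte Carlo estimate: sample many future timelines from the autoregressive model and count how often a target event appears. The sketch below illustrates that idea only. The event vocabulary, the `toy_next_event_probs` stand-in for a trained model, and all probabilities are hypothetical and do not come from the paper or from CoMET.

```python
import random

# Hypothetical toy vocabulary; real medical event tokens would be far richer.
EVENTS = ["visit", "lab", "diagnosis_X", "end"]

def toy_next_event_probs(history):
    """Hypothetical stand-in for a trained model's next-event distribution."""
    if history and history[-1] == "lab":
        return [0.3, 0.1, 0.4, 0.2]  # diagnosis is more likely right after a lab
    return [0.4, 0.3, 0.1, 0.2]

def simulate_timeline(history, max_events, rng):
    """Autoregressively sample future events until 'end' or max_events."""
    timeline = list(history)
    for _ in range(max_events):
        weights = toy_next_event_probs(timeline)
        next_event = rng.choices(EVENTS, weights=weights)[0]
        if next_event == "end":
            break
        timeline.append(next_event)
    return timeline[len(history):]  # only the newly generated events

def estimate_event_probability(history, target, n_samples=2000, max_events=10, seed=0):
    """Fraction of simulated futures that contain the target event."""
    rng = random.Random(seed)
    hits = sum(
        target in simulate_timeline(history, max_events, rng)
        for _ in range(n_samples)
    )
    return hits / n_samples

p_after_lab = estimate_event_probability(["visit", "lab"], "diagnosis_X")
p_after_visit = estimate_event_probability(["visit"], "diagnosis_X")
print(p_after_lab, p_after_visit)
```

Because the estimate comes from sampled timelines rather than a task-specific output head, the same pretrained model could in principle answer many different questions by changing only the target event and horizon.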

Submitted to arXiv on 16 Aug. 2025

Results of the summarizing process for the arXiv paper: 2508.12104v1

In the pursuit of safe and effective medical care, personalized medicine at scale requires methods that can distill insights from longitudinal patient journeys. These journeys are essentially sequences of medical events. Foundation models pretrained on large-scale medical event data offer a promising direction for scaling real-world evidence generation and generalizing to diverse downstream tasks. Leveraging Epic Cosmos, a dataset comprising de-identified longitudinal health records for over 300 million unique patients and 16.3 billion encounters from 310 health systems, the Cosmos Medical Event Transformer (CoMET) models were introduced. These decoder-only transformer models were pretrained on a dataset representing 118 million patients and 115 billion discrete medical events. A scaling-law study was conducted for medical event data, establishing a methodology for pretraining and revealing power-law scaling relationships for compute, tokens, and model size. Subsequently, a series of compute-optimal models with up to 1 billion parameters was pretrained. Conditioned on a patient's real-world history, CoMET can autoregressively generate the next medical event, simulating patient health timelines. The study encompassed 78 real-world tasks including diagnosis prediction, disease prognosis, and healthcare operations. Remarkably, CoMET generally outperformed or matched task-specific supervised models on these tasks without requiring task-specific fine-tuning or few-shot examples. The predictive power of CoMET consistently improved as the model and pretraining scale increased. The results indicate that CoMET, as a generative medical event foundation model, effectively captures complex clinical dynamics. This framework provides an extensible and generalizable approach to support clinical decision-making, streamline healthcare operations, and enhance patient outcomes.
Furthermore, the introduction of Epic Cosmos has addressed challenges in leveraging real-world data for personalized medicine at scale by aggregating de-identified longitudinal health records across multiple health systems. The platform unifies various clinical data types to support patient care and accelerate scientific discovery, while delivering actionable insights to clinicians at the point of care through features like Cosmos Median Length of Stay and Best Care Choices for My Patient™. Cosmos data has vast potential to inform healthcare decisions and research priorities, such as understanding trends in healthcare utilization or investigating rare diseases. Even so, answering specific clinical questions still requires manual effort in crafting custom cohort definitions and feature-engineering pipelines. Enabling routine clinical decision-making with personalized medicine at scale using real-world evidence (RWE) demands tools that can learn from integrated patient records and efficiently answer complex medical inquiries across diverse contexts.
Created on 22 Aug. 2025

