The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

AI-generated keywords: Computational Science

AI-generated Key Points

  • Simulation plays a crucial role in predicting, optimizing, and inferring parameters for physical systems
  • Numerical methods are widely used due to the impracticality of finding analytical solutions for complex partial differential equations
  • Surrogate models have emerged as effective alternatives to numerical methods by capturing essential system features
  • Deep learning enhances surrogate modeling by providing faster and more accurate results across various fields
  • The Well dataset collection offers high-quality simulation data across diverse domains to drive innovation in developing efficient and accurate surrogate models
  • The Well serves as a valuable resource for machine learning researchers seeking challenging benchmarks to develop advanced data-driven surrogates
  • The Well provides 15 TB of data covering various physical systems; each dataset includes temporally coarsened snapshots from simulations run across different initial conditions or parameters
  • The Well offers a unified PyTorch interface for training and evaluating models, making the data easy to access and use within machine learning workflows

Authors: Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina J. Agocs, Miguel Beneitez, Marsha Berger, Blakesley Burkhart, Stuart B. Dalziel, Drummond B. Fielding, Daniel Fortunato, Jared A. Goldberg, Keiya Hirashima, Yan-Fei Jiang, Rich R. Kerswell, Suryanarayana Maddu, Jonah Miller, Payel Mukhopadhyay, Stefan S. Nixon, Jeff Shen, Romain Watteaux, Bruno Régaldo-Saint Blancard, François Rozet, Liam H. Parker, Miles Cranmer, Shirley Ho

38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks
License: CC BY 4.0

Abstract: Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.

Submitted to arXiv on 30 Nov. 2024

Results of the summarizing process for the arXiv paper: 2412.00568v1

In computational science and engineering, simulation plays a crucial role in predicting, optimizing, and inferring parameters for physical systems. However, finding analytical solutions for the complex partial differential equations that govern many phenomena is often impractical, leading to the widespread use of numerical methods. While these methods offer accuracy, they can be computationally expensive. To address this challenge, surrogate models have emerged as simplified yet effective alternatives that capture essential features of a system. Deep learning has shown promise in enhancing surrogate modeling by providing faster and more accurate results across various fields. Despite these advancements, the adoption of deep learning-based surrogates is held back by a mismatch between the complexity of real-world problems and that of the available datasets.

To bridge this gap, the Well is introduced as a collection of 16 datasets totaling 15 TB, built in collaboration with domain experts and numerical software developers. These datasets cover a wide range of physical systems such as biological processes, fluid dynamics, acoustic scattering, and astrophysical phenomena like supernova explosions. Each dataset includes temporally coarsened snapshots from simulations across different initial conditions or parameters to explore stability.

In addition to offering diverse datasets, the Well also provides a unified PyTorch interface for training and evaluating models. This interface enables researchers to easily access and utilize the data within their machine learning workflows.

Example baselines built on this interface showcase the new challenges posed by the complex dynamics of the Well's datasets. Overall, the Well represents a significant step towards advancing machine learning-based surrogate modeling by offering researchers access to high-quality simulation data across various domains. The availability of this extensive dataset collection is expected to drive innovation in developing more efficient and accurate surrogate models for complex physical systems.

The Well serves as a valuable resource for machine learning researchers seeking challenging benchmarks to develop advanced data-driven surrogates. By providing complex tasks at a manageable scale for modern machine learning techniques, the Well aims to facilitate the development of next-generation surrogate models that balance efficiency with accuracy.
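
As a rough illustration of how temporally coarsened snapshots from several simulation runs might be served to a model, the sketch below subsamples each trajectory in time and then cuts it into (history, target) windows. This is not the Well's actual interface: the names `coarsen` and `WindowedTrajectories`, the parameters `stride`, `n_in` and `n_out`, and the list-of-floats snapshot layout are all assumptions made for illustration, and only the Python standard library is used.

```python
from typing import List, Sequence, Tuple

Snapshot = List[float]        # one flattened field at one time step (assumed layout)
Trajectory = List[Snapshot]   # snapshots of a single simulation run, ordered in time


def coarsen(trajectory: Sequence[Snapshot], stride: int) -> Trajectory:
    """Temporal coarsening by subsampling: keep every `stride`-th snapshot."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [list(snapshot) for snapshot in trajectory[::stride]]


class WindowedTrajectories:
    """Hypothetical map-style dataset of (history, target) pairs.

    Each item holds `n_in` consecutive snapshots as input and the next
    `n_out` snapshots as the prediction target. Windows never cross the
    boundary between two simulation runs.
    """

    def __init__(self, trajectories: Sequence[Trajectory], n_in: int = 2, n_out: int = 1):
        self.trajectories = [list(t) for t in trajectories]
        self.n_in = n_in
        self.n_out = n_out
        window = n_in + n_out
        # (trajectory index, start step) for every window that fits in a run
        self.index: List[Tuple[int, int]] = [
            (i, start)
            for i, traj in enumerate(self.trajectories)
            for start in range(len(traj) - window + 1)
        ]

    def __len__(self) -> int:
        return len(self.index)

    def __getitem__(self, k: int) -> Tuple[Trajectory, Trajectory]:
        i, start = self.index[k]
        traj = self.trajectories[i]
        mid = start + self.n_in
        return traj[start:mid], traj[mid:mid + self.n_out]


# Two toy runs (different "initial conditions"), 9 raw time steps each.
raw_runs = [[[float(run * 100 + t)] for t in range(9)] for run in range(2)]
coarse_runs = [coarsen(run, stride=2) for run in raw_runs]  # 5 snapshots per run
dataset = WindowedTrajectories(coarse_runs, n_in=2, n_out=1)
history, target = dataset[0]
```

Because the class implements `__len__` and `__getitem__`, it follows the map-style dataset protocol that PyTorch's `DataLoader` accepts; in a real workflow the snapshots would be tensors rather than nested lists.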
Created on 18 May. 2025

