Masked Autoencoders are Scalable Learners of Cellular Morphology

AI-generated keywords: Biological research

AI-generated Key Points

Deep vision models are more effective at capturing biological signals compared to hand-crafted features
Convolutional Neural Network (CNN) and Vision Transformer (ViT) based masked autoencoders outperform weakly supervised baselines
ViT-L/8 model trained on over 3.5 billion unique crops achieved relative improvements of up to 28% in inferring known biological relationships
Researchers adapted U-Nets for masked autoencoding (MU-Nets) with promising outcomes
Study includes detailed figures showing StringDB recall as a function of training FLOPs, as well as recall across different cosine similarity percentiles for each database

Authors: Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw

arXiv: 2309.16064v2 (cs.CV)

Spotlight at NeurIPS 2023 Generative AI and Biology (GenBio) Workshop

License: CC BY-NC-SA 4.0

Abstract: Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how self-supervised deep learning approaches scale when training larger models on larger microscopy datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised baselines. At the high-end of our scale, a ViT-L/8 trained on over 3.5-billion unique crops sampled from 93-million microscopy images achieves relative improvements as high as 28% over our best weakly supervised baseline at inferring known biological relationships curated from public databases. Relevant code and select models released with this work can be found at: https://github.com/recursionpharma/maes_microscopy.

Submitted to arXiv on 27 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.16064v2

Comprehensive Summary

In biological research, inferring biological relationships from cellular phenotypes in high-content microscopy screens presents both significant opportunities and challenges. Previous studies have shown that deep vision models capture biological signal better than hand-crafted features. This study examines how self-supervised deep learning approaches scale when larger models are trained on larger microscopy datasets.

The results show that both Convolutional Neural Network (CNN) and Vision Transformer (ViT) based masked autoencoders significantly outperform weakly supervised baselines. At the largest scale examined, a ViT-L/8 model trained on over 3.5 billion unique crops sampled from 93 million microscopy images achieved relative improvements of up to 28% over the best weakly supervised baseline at inferring known biological relationships curated from public databases.

The researchers also adapted U-Nets for masked autoencoding (MU-Nets) by training them to reconstruct masked sections of input images; the results for MU-Net-M are reported as promising and in line with previous studies. The paper includes figures showing StringDB recall as a function of training FLOPs and recall across different cosine similarity percentiles for each database. The work, by Oren Kraus and colleagues, was a spotlight at the NeurIPS 2023 Generative AI and Biology (GenBio) Workshop.

Relevant code and select models released with this work are available at https://github.com/recursionpharma/maes_microscopy.

Summary

1. Deep vision models pick up biological signal in cell images better than manually designed features.
2. Masked autoencoders built on CNNs and ViTs beat baselines that were trained with weak supervision.
3. A ViT-L/8 model, trained on over 3.5 billion crops of microscopy images, was up to 28% better than the best weakly supervised baseline at recovering known biological relationships.
4. The researchers adapted U-Nets into masked autoencoders (MU-Nets), with promising results.
5. The paper has figures showing how recall of known relationships changes with the amount of training compute and with the similarity cutoff used.

Definitions

- Deep vision models: Deep neural networks that learn useful features directly from images.
- Convolutional Neural Network (CNN): A type of deep learning model commonly used for image recognition tasks.
- Vision Transformer (ViT): A deep learning model that applies the transformer architecture to patches of an image.
- Autoencoders: Neural networks that learn to reproduce their input at their output, used for tasks like image compression or feature extraction. A masked autoencoder sees only part of the image and must reconstruct the hidden part.
- Biological relationships: Known associations between genes or proteins, taken here from public databases.
- U-Nets: A specific architecture of neural networks commonly used in image segmentation tasks.
- StringDB recall: The fraction of known relationships recorded in the StringDB database that the model recovers.
- FLOPs: The total number of floating-point operations used during training, a measure of training compute (not operations per second).
- Cosine similarity percentiles: Cosine similarity measures how similar two vectors are based on the angle between them in a multi-dimensional space; a percentile is a cutoff on the ranked similarity values.

Introduction

High-content microscopy screens can image thousands to millions of cells in a single experiment. That volume of data makes it challenging to infer biological relationships from cellular phenotypes accurately. Hand-crafted features are limited in the biological signal they capture, and prior work has shown that deep vision models capture it better. In this study, Oren Kraus and colleagues examine how self-supervised deep learning approaches scale when larger models are trained on larger microscopy datasets.

The Study

The study, a spotlight at the NeurIPS 2023 Generative AI and Biology (GenBio) Workshop, applies self-supervised deep learning to cellular morphology in high-content microscopy screens. The core technique is masked autoencoding: part of each input image is hidden and the model is trained to reconstruct it, so the model learns image representations without labels. The team compared two model families: Convolutional Neural Networks (CNNs), in the form of U-Nets adapted for masked autoencoding (MU-Nets), and Vision Transformers (ViTs). At the largest scale, a ViT-L/8 was trained on over 3.5 billion unique crops sampled from 93 million microscopy images.
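To make the masked reconstruction objective concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The 8x8 patch size follows the "/8" in ViT-L/8, but the 75% mask ratio, the choice to score only the masked patches, and the helper names `patchify`, `random_mask`, and `masked_mse` are all assumptions made for illustration. The "decoder" is a stand-in that predicts the mean visible pixel value.

```python
import random


def patchify(image, patch):
    """Split a 2D list (H x W) into flattened, non-overlapping patch x patch blocks."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def masked_mse(target_patches, predicted_patches, masked_idx):
    """Mean squared error over masked patches only (an assumed loss choice)."""
    total, count = 0.0, 0
    for k in masked_idx:
        for t, p in zip(target_patches[k], predicted_patches[k]):
            total += (t - p) ** 2
            count += 1
    return total / count


rng = random.Random(0)
# Toy 16x16 single-channel "crop"; real crops are multi-channel microscopy images.
image = [[rng.random() for _ in range(16)] for _ in range(16)]
patches = patchify(image, patch=8)  # 4 patches of 64 pixels each
masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # assumed ratio

# Stand-in decoder: predict the mean visible pixel value everywhere.
visible = [v for k, p in enumerate(patches) if k not in masked_idx for v in p]
mean_visible = sum(visible) / len(visible)
predicted = [[mean_visible] * len(p) for p in patches]

loss = masked_mse(patches, predicted, masked_idx)
print(f"masked patches: {masked_idx}, reconstruction loss: {loss:.4f}")
```

In a real model the stand-in decoder would be replaced by a trained network, and the loss would be minimized by gradient descent over many crops.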

Results

The results showed that both CNN-based and ViT-based masked autoencoders outperformed weakly supervised baselines. At the highest scale examined, a ViT-L/8 model achieved relative improvements of up to 28% over the best weakly supervised baseline at inferring known biological relationships curated from public databases. The paper's figures report StringDB recall as a function of training FLOPs, and recall across different cosine similarity percentiles for each database.
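The recall and relative-improvement figures can be read with the help of a small sketch. It assumes one plausible reading of "recall across cosine similarity percentiles": pairs of perturbation embeddings whose cosine similarity falls in the extreme tails of the distribution are treated as predicted relationships, and recall is the fraction of known relationships among them. The 5th/95th percentile cutoffs, the gene names, the toy embeddings, and the example recall values are all invented for illustration and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def recall_at_percentiles(embeddings, known_pairs, low_q=5, high_q=95):
    """Fraction of known pairs whose similarity lies in the extreme tails."""
    sims = {
        frozenset(pair): cosine(embeddings[pair[0]], embeddings[pair[1]])
        for pair in combinations(sorted(embeddings), 2)
    }
    ordered = sorted(sims.values())
    low_t, high_t = percentile(ordered, low_q), percentile(ordered, high_q)
    predicted = {pair for pair, s in sims.items() if s <= low_t or s >= high_t}
    known = {frozenset(pair) for pair in known_pairs}
    return len(predicted & known) / len(known)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.28 means 28%."""
    return (new - baseline) / baseline


# Hypothetical 2-D embeddings for four made-up perturbations.
embeddings = {
    "geneA": [1.0, 0.0],
    "geneB": [0.9, 0.1],
    "geneC": [0.0, 1.0],
    "geneD": [-1.0, 0.1],
}
known_pairs = [("geneA", "geneB"), ("geneC", "geneD")]  # invented "database"

recall = recall_at_percentiles(embeddings, known_pairs)
print(f"recall: {recall:.2f}")

# Invented recall values: a rise from 0.50 to 0.64 is a 28% relative gain.
print(f"relative improvement: {relative_improvement(0.64, 0.50):.0%}")
```

Note that a relative improvement of 28% is measured against the baseline's own score, so it is not the same as a 28 percentage point increase in recall.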

Conclusion

This study shows that self-supervised masked autoencoders scale well for analyzing cellular morphology in high-content microscopy screens: larger models trained on larger datasets recover known biological relationships better than weakly supervised baselines. The release of relevant code and select models allows other researchers to explore and apply these techniques, which may in turn advance the understanding of biological relationships from cellular phenotypes.

Created on 19 Mar. 2024
