Inferring biological relationships from cellular phenotypes in high-content microscopy screens presents both significant opportunities and challenges. Previous studies have shown that deep vision models capture biological signals more effectively than hand-crafted features. This study examines how self-supervised deep learning approaches scale when larger models are trained on larger microscopy datasets. Both Convolutional Neural Network (CNN) and Vision Transformer (ViT) based masked autoencoders outperform weakly supervised baselines. At the largest scale examined, a ViT-L/8 model trained on over 3.5 billion unique crops sampled from 93 million microscopy images achieved relative improvements of up to 28% over the best weakly supervised baseline in inferring known biological relationships sourced from public databases. The study, by Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, and Berton Earnshaw, was spotlighted at the NeurIPS 2023 Generative AI and Biology (GenBio) Workshop.
The researchers also adapted U-Nets for masked autoencoding (MU-Nets) by training them to reconstruct masked sections of input images; the results reported for MU-Net-M are promising and in line with previous studies. The paper includes figures showing StringDB recall as a function of training FLOps and recall across different cosine similarity percentiles for each database. Code and select models associated with this research are available at https://github.com/recursionpharma/maes_microscopy.
- Deep vision models are more effective at capturing biological signals than hand-crafted features
- Convolutional Neural Network (CNN) and Vision Transformer (ViT) based masked autoencoders outperform weakly supervised baselines
- A ViT-L/8 model trained on over 3.5 billion unique crops achieved relative improvements of up to 28% in inferring known biological relationships
- Researchers adapted U-Nets for masked autoencoding (MU-Nets) with promising outcomes
- The study includes detailed figures showing StringDB recall as a function of training FLOps, as well as recall across different cosine similarity percentiles for each database
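Summary
1. Deep vision models capture biological signals in microscopy images better than hand-crafted features.
2. CNN- and ViT-based masked autoencoders, which are self-supervised, outperform weakly supervised baselines.
3. A ViT-L/8 model trained on over 3.5 billion crops from 93 million microscopy images achieved relative improvements of up to 28% over the best weakly supervised baseline in inferring known biological relationships.
4. The researchers adapted U-Nets into masked autoencoders (MU-Nets), with promising results.
5. The paper's figures show how recall of known relationships varies with training compute and with the cosine similarity percentile used.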
Definitions
- Deep vision models: Deep neural networks that learn features directly from image pixels instead of relying on manually designed measurements.
- Convolutional Neural Network (CNN): A type of deep learning model commonly used for image recognition tasks.
- Vision Transformer (ViT): A deep learning model that splits an image into patches and processes them with a transformer architecture.
- Autoencoders: A type of neural network that learns to reconstruct its input at its output, used for tasks like image compression or feature extraction. A masked autoencoder sees only part of the input and is trained to reconstruct the hidden part.
- Biological relationships: Known associations between biological entities, here taken from public databases.
- U-Nets: A specific architecture of neural networks commonly used in image segmentation tasks.
- StringDB recall: The fraction of known relationships listed in StringDB, a public database of protein-protein interactions, that a model's predicted relationships recover.
- FLOps: Floating-point operations. As a measure of training, this counts the total computation spent on a model; it is distinct from operations per second, which measures hardware speed.
- Cosine similarity percentiles: Cosine similarity measures how similar two vectors are based on the angle between them in a multi-dimensional space. A percentile is a cutoff on the ranked similarities of all pairs, and recall is reported at different such cutoffs.
Introduction
The field of biological research has been revolutionized by the use of high-content microscopy screens, which allow for the analysis of thousands to millions of cells in a single experiment. However, with this increase in data comes the challenge of accurately inferring biological relationships from cellular phenotypes. Traditional methods using hand-crafted features have shown limitations in capturing complex biological signals. This is where deep learning approaches come into play.
In recent years, deep vision models have proven to be more effective at capturing and analyzing complex patterns compared to traditional methods. In this study, researchers Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, and Berton Earnshaw delved into the scalability of self-supervised deep learning approaches when training larger models on extensive microscopy datasets.
The Study
The study was spotlighted at the NeurIPS 2023 Generative AI and Biology (GenBio) Workshop and focused on leveraging self-supervised deep learning approaches for analyzing cellular morphology in high-content microscopy screens. The researchers adapted U-Nets for masked autoencoding (MU-Nets) by training them to reconstruct masked sections of input images. The aim of this reconstruction objective is for the model to learn useful image representations without labels.
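The masked-reconstruction objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`patchify`, `random_mask`, `masked_mse`), the 6-channel 64x64 input, and the 0.75 mask ratio are all hypothetical choices, the patch size of 8 simply echoes the "/8" in ViT-L/8, and the "model" is a placeholder that predicts the mean of the visible patches.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, mask_ratio, rng):
    """Boolean mask; True marks patches hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared error computed on the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
image = rng.random((64, 64, 6)).astype(np.float32)  # assumed 6-channel input
patches = patchify(image, patch=8)                   # 64 patches of 8*8*6 values
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # assumed ratio
visible = patches[~mask]                             # what an encoder would see

# Placeholder "model": predict every patch as the mean visible patch.
pred = np.tile(visible.mean(axis=0), (len(patches), 1))
loss = masked_mse(pred, patches, mask)
print(patches.shape, int(mask.sum()), loss)
```

In a real masked autoencoder, the placeholder prediction would be replaced by an encoder-decoder network, and the loss on masked patches would drive training.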
To test how the approach behaves at different scales, the team trained two types of models: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). At the largest scale examined, a ViT-L/8 model was trained on over 3.5 billion unique crops sampled from 93 million microscopy images.
Results
The results showed that both CNN-based and ViT-based masked autoencoders outperformed weakly supervised baselines. At the highest scale examined in this study, a ViT-L/8 model achieved relative improvements of up to 28% over the best weakly supervised baseline in inferring known biological relationships sourced from public databases.
The researchers also report StringDB recall as a function of training FLOps, and recall across different cosine similarity percentiles for each database. These detailed figures can be found in the research paper.
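One way to read "recall across cosine similarity percentiles" is sketched below. It assumes, as an illustration only, that each perturbation has one embedding vector, that a pair counts as a predicted relationship when its cosine similarity falls in either tail of the distribution over all pairs, and that recall is the fraction of database-annotated pairs landing in those tails. The function names and the toy data are hypothetical; the paper's exact procedure may differ.

```python
import numpy as np

def pairwise_cosine(emb):
    """Cosine similarity between every pair of rows in emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def recall_at_percentile(emb, known_pairs, pct=5.0):
    """Fraction of known pairs whose similarity lies in the bottom or top pct%."""
    sims = pairwise_cosine(emb)
    upper = np.triu_indices(len(emb), k=1)   # each unordered pair once
    lo, hi = np.percentile(sims[upper], [pct, 100.0 - pct])
    known = np.array([sims[i, j] for i, j in known_pairs])
    hits = (known <= lo) | (known >= hi)
    return float(hits.mean())

# Toy data: 50 random embeddings with two planted relationships.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))
emb[1] = emb[0]     # planted strongly similar pair (0, 1)
emb[2] = -emb[0]    # planted strongly opposite pair (0, 2)
known_pairs = [(0, 1), (0, 2)]  # stand-in for database annotations

recall = recall_at_percentile(emb, known_pairs, pct=5.0)
print(recall)  # both planted pairs fall in the tails
```

Sweeping `pct` over a range of values would give a recall curve of the kind the paper's figures are described as showing.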
Conclusion
This study provides valuable insights into leveraging self-supervised deep learning approaches for analyzing cellular morphology in high-content microscopy screens. The results indicate that these approaches scale with model and dataset size and recover known biological relationships better than the weakly supervised baselines they were compared against.
Furthermore, the release of relevant code and select models associated with this research allows other researchers in the field to explore and apply these techniques. This may contribute to advances in understanding biological relationships from cellular phenotypes.
Overall, the paper highlights the potential of self-supervised deep learning approaches for analyzing high-content microscopy data and opens up new avenues for future studies in this field.