Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation

AI-generated keywords: Masked Autoencoder, Vision Transformers, Medical Image Analysis, Self Pre-training, Context Aggregation

AI-generated Key Points

  • Masked Autoencoder (MAE) is an effective pre-training method for Vision Transformers (ViT) in natural image analysis
  • MAE trains the ViT encoder to aggregate contextual information and infer masked image regions, an ability that is crucial in the medical image domain, where anatomical structures are interconnected
  • Because no ImageNet-scale medical image dataset exists, MAE is used for self pre-training: the ViT is pre-trained on the training set of the target task itself
  • MAE self pre-training markedly improves diverse medical image tasks, including chest X-ray disease classification, abdominal CT multi-organ segmentation, and MRI brain tumor segmentation
  • ViT with MAE self pre-training outperforms state-of-the-art CNN-based models with ImageNet pre-training, as well as other self-supervised pre-training methods such as MoCo and LSAE
  • For abdominal multi-organ segmentation, MAE self pre-training yields substantial improvements over the UNETR baseline
  • Average Dice Similarity Coefficient (DSC) scores show this advantage across increasing training data sizes, highlighting the effectiveness of the proposed approach for segmentation accuracy
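The masking step behind MAE can be sketched in a few lines. This is a minimal illustration assuming uniform random patch masking as in the original MAE recipe for natural images; the function name `random_masking` and the 75% mask ratio are illustrative assumptions, not necessarily the settings used in SelfMedMAE.

```python
import numpy as np


def random_masking(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Split patch indices into visible and masked sets (illustrative, MAE-style).

    Returns (keep_idx, mask_idx, mask), where mask[i] is True if patch i is
    hidden from the encoder and must be reconstructed by the decoder.
    """
    len_keep = int(round(num_patches * (1.0 - mask_ratio)))
    noise = rng.random(num_patches)        # one random score per patch
    order = np.argsort(noise)              # random permutation of patch indices
    keep_idx = np.sort(order[:len_keep])   # the encoder sees only these patches
    mask_idx = np.sort(order[len_keep:])   # these patches are reconstructed
    mask = np.zeros(num_patches, dtype=bool)
    mask[mask_idx] = True
    return keep_idx, mask_idx, mask


if __name__ == "__main__":
    # A 224x224 image split into 16x16 patches gives 14 * 14 = 196 patches.
    rng = np.random.default_rng(0)
    keep_idx, mask_idx, mask = random_masking(196, mask_ratio=0.75, rng=rng)
    print(len(keep_idx), len(mask_idx))  # 49 visible, 147 masked
```

Because the ViT encoder processes only the visible patches, a high mask ratio also reduces the encoder's compute during pre-training.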

Authors: Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna

ISBI 2023 camera-ready version (no substantial difference from v1); code is available at https://github.com/cvlab-stonybrook/SelfMedMAE
License: CC BY 4.0

Abstract: Masked Autoencoder (MAE) has recently been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis. By reconstructing full images from partially masked inputs, a ViT encoder aggregates contextual information to infer masked image regions. We believe that this context aggregation ability is particularly essential to the medical image domain where each anatomical structure is functionally and mechanically connected to other structures and regions. Because there is no ImageNet-scale medical image dataset for pre-training, we investigate a self pre-training paradigm with MAE for medical image analysis tasks. Our method pre-trains a ViT on the training set of the target data instead of another dataset. Thus, self pre-training can benefit more scenarios where pre-training data is hard to acquire. Our experimental results show that MAE self pre-training markedly improves diverse medical image tasks including chest X-ray disease classification, abdominal CT multi-organ segmentation, and MRI brain tumor segmentation. Code is available at https://github.com/cvlab-stonybrook/SelfMedMAE

Submitted to arXiv on 10 Mar. 2022
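The reconstruction objective described in the abstract can be illustrated with a minimal NumPy sketch: the image is split into non-overlapping patches, and the loss is the mean squared error computed only over masked patches, as in the original MAE formulation. The helpers `patchify` and `masked_mse` are hypothetical names; the example is 2D for brevity (CT and MRI volumes would use 3D patches), and MAE's optional per-patch target normalization is omitted. Under self pre-training, this loss would be minimized on images from the target task's own training set.

```python
import numpy as np


def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """Turn an (H, W, C) image into (N, p*p*C) flattened non-overlapping patches."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    x = img.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)          # (H/p, W/p, p, p, C): group pixels by patch
    return x.reshape((h // p) * (w // p), p * p * c)


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error averaged over masked patches only.

    pred, target: (N, D) patch vectors; mask: (N,) bool, True = masked.
    Visible patches do not contribute to the loss.
    """
    per_patch = ((pred - target) ** 2).mean(axis=1)   # (N,) error per patch
    return float(per_patch[mask].mean())


if __name__ == "__main__":
    img = np.random.default_rng(0).random((32, 32, 1))  # toy single-channel image
    target = patchify(img, 8)                            # 16 patches of 64 values
    pred = np.zeros_like(target)                         # stand-in decoder output
    mask = np.zeros(len(target), dtype=bool)
    mask[::2] = True                                     # pretend half the patches were masked
    print(target.shape, masked_mse(pred, target, mask))
```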

Summary of arXiv paper 2203.05573v2

Masked Autoencoder (MAE) has proven to be an effective pre-training method for Vision Transformers (ViT) in natural image analysis. By reconstructing full images from partially masked inputs, MAE trains the ViT encoder to aggregate contextual information and infer the masked image regions. This ability is particularly important in the medical image domain, where anatomical structures are functionally and mechanically interconnected. Because no ImageNet-scale medical image dataset exists for pre-training, the authors investigate a self pre-training approach with MAE: the ViT is pre-trained on the training set of the target data rather than on a separate dataset, which makes the approach useful wherever additional pre-training data are hard to acquire.

Experimental results by Lei Zhou et al. show that MAE self pre-training markedly improves diverse medical image tasks, including chest X-ray disease classification, abdominal CT multi-organ segmentation, and MRI brain tumor segmentation. In the reported comparisons, ViT with MAE self pre-training outperforms state-of-the-art CNN-based models using ImageNet pre-training, as well as other self-supervised pre-training methods such as MoCo and LSAE. For abdominal multi-organ segmentation, the results in Table 1 show substantial improvements from MAE self pre-training over the UNETR baseline, and the average Dice Similarity Coefficient (DSC) scores show this advantage across increasing training data sizes.

In conclusion, this research indicates that MAE self pre-training of Vision Transformers holds strong potential for medical image analysis, because it exploits the context aggregation needed to understand complex anatomical structures. The findings underscore the value of tailored pre-training strategies in domains where large-scale datasets are limited, with marked performance gains across diverse medical imaging applications.
Created on 04 Mar. 2024
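The segmentation results above are reported as average DSC, where for a predicted region P and ground-truth region G, DSC = 2|P ∩ G| / (|P| + |G|), and the multi-organ score averages DSC over organ classes. The sketch below is a minimal NumPy illustration; the function names are hypothetical, and treating a class absent from both prediction and ground truth as NaN (excluded from the mean) is one common convention assumed here, not necessarily the paper's evaluation protocol.

```python
import numpy as np


def dice_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """DSC for each foreground class 1..num_classes-1.

    pred, gt: integer label maps of the same shape (2D or 3D); 0 = background.
    Assumption: a class absent from both maps gets NaN and is skipped in the mean.
    """
    scores = []
    for c in range(1, num_classes):
        p = pred == c
        g = gt == c
        denom = p.sum() + g.sum()
        if denom == 0:
            scores.append(np.nan)
            continue
        scores.append(2.0 * np.logical_and(p, g).sum() / denom)
    return np.array(scores, dtype=float)


def mean_dice(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Average DSC over foreground classes, ignoring NaN entries."""
    return float(np.nanmean(dice_per_class(pred, gt, num_classes)))


if __name__ == "__main__":
    gt = np.array([[0, 1, 1], [0, 2, 2], [0, 0, 0]])
    pred = np.array([[0, 1, 0], [0, 2, 2], [0, 0, 2]])
    print(dice_per_class(pred, gt, 3), mean_dice(pred, gt, 3))
```

In the toy example, class 1 overlaps in 1 of 3 labelled pixels (DSC = 2/3) and class 2 in 2 of 5 (DSC = 0.8).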

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.