MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

AI-generated keywords: Remote Sensing

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Study titled "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining" by Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, and Liangpei Zhang
  • Focus on enhancing image interpretation tasks in Remote Sensing (RS) using foundation models
  • Introduction of Multi-Task Pretraining (MTP) paradigm to address task discrepancy during model transfer to downstream tasks
  • Utilization of shared encoder and task-specific decoder architecture for multi-task supervised pretraining on the SAMRS dataset
  • Support for convolutional neural networks and vision transformer foundation models with over 300 million parameters
  • Fine-tuning of pretrained models on various RS downstream tasks, where they outperform existing models of similar size and perform competitively with larger state-of-the-art models

Authors: Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang

The code and pretrained models will be released at https://github.com/ViTAE-Transformer/MTP

Abstract: Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size and their competitive performance compared to larger state-of-the-art models, thus validating the effectiveness of MTP.
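
The shared encoder and task-specific decoder idea from the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the scalar "decoders", the squared-error loss, the equal weighting of the three task losses, and all numbers are assumptions chosen only to show how gradients from several tasks accumulate in one shared encoder. The class and variable names are hypothetical.

```python
# Toy sketch of multi-task pretraining with a shared encoder and per-task decoders.
# Everything here (scalar heads, squared-error loss, equal loss weights, the
# numbers) is an illustrative assumption, not the MTP implementation.

TASKS = ["semantic_segmentation", "instance_segmentation", "rotated_detection"]


class ToyMultiTaskModel:
    def __init__(self, in_dim, tasks):
        self.encoder = [0.5] * in_dim          # shared encoder weights
        self.heads = {t: 1.0 for t in tasks}   # one scalar "decoder" per task

    def forward(self, x):
        # The shared feature is computed once and reused by every task head.
        feat = sum(w * xi for w, xi in zip(self.encoder, x))
        preds = {t: h * feat for t, h in self.heads.items()}
        return feat, preds

    def train_step(self, x, targets, lr=0.05):
        feat, preds = self.forward(x)
        loss = 0.0
        grad_feat = 0.0   # gradient w.r.t. the shared feature, summed over tasks
        new_heads = {}
        for task, target in targets.items():
            err = preds[task] - target
            loss += err ** 2                              # equal task weights (assumption)
            grad_feat += 2.0 * err * self.heads[task]     # each task's pull on the encoder
            new_heads[task] = self.heads[task] - lr * 2.0 * err * feat
        self.heads.update(new_heads)
        # The encoder update uses the gradient accumulated from all tasks.
        self.encoder = [w - lr * grad_feat * xi for w, xi in zip(self.encoder, x)]
        return loss


x = [1.0, 0.5]                                            # made-up input
targets = {TASKS[0]: 1.0, TASKS[1]: 0.5, TASKS[2]: 1.5}   # made-up labels
model = ToyMultiTaskModel(in_dim=len(x), tasks=TASKS)
losses = [model.train_step(x, targets) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In this sketch only the heads are task-specific, so after pretraining the encoder weights could be kept and paired with a new head for a downstream task, which is the transfer pattern the abstract describes.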

Submitted to arXiv on 20 Mar. 2024

Results of the summarizing process for the arXiv paper: 2403.13430v1

In their study titled "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining," authors Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, and Liangpei Zhang examine how foundation models have improved image interpretation tasks in Remote Sensing (RS). Their research focuses on pretraining, which uses supervised and self-supervised learning to initialize model weights effectively. Because pretraining is usually formulated as an image classification or object discrimination task, transferring the pretrained models to downstream tasks can suffer from task discrepancy. To address this issue, the authors propose the Multi-Task Pretraining (MTP) paradigm for RS foundation models. This approach uses a shared encoder and task-specific decoder architecture to conduct multi-task supervised pretraining on the SAMRS dataset, covering semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are then fine-tuned on various RS downstream tasks, including scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. In extensive experiments across 14 datasets, the authors show that their models outperform existing ones of similar size and perform competitively with larger state-of-the-art models, validating the effectiveness of MTP.
Overall, this research highlights the role of Multi-Task Pretraining in advancing Remote Sensing foundation models by addressing the task discrepancy encountered when pretrained models are transferred to downstream tasks.
Created on 29 May. 2024
