MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation
AI-generated Key Points
- MoDA (Motion-guided Domain Adaptive) framework introduced for unsupervised domain adaptation in semantic segmentation tasks
- Utilizes self-supervised learning techniques to extract object motion cues from unlabeled video frames with geometric constraints
- Aims to facilitate cross-domain alignment by leveraging motion priors for semantic segmentation
- Key components include an object discovery module for localizing and segmenting moving objects, and a semantic mining module for refining pseudo labels based on object masks
- Refined pseudo labels are used in a self-training loop to bridge the cross-domain gap, improving pseudo-label quality in the target domain
- In domain adaptive video and image segmentation experiments, object motion proves more effective guidance for domain alignment than optical flow information
- Complements existing state-of-the-art UDA approaches and can be combined with them
- Presented at CVPR 2024 Workshop on Learning with Limited Labelled Data for Image and Video Understanding, received Best Paper Award
- Code implementation available at https://github.com/feipanir/MoDA for further exploration and application
Authors: Fei Pan, Xu Yin, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon
Abstract: Unsupervised domain adaptation (UDA) has been a potent technique to handle the lack of annotations in the target domain, particularly in semantic segmentation. This study introduces a different UDA scenario in which the target domain contains unlabeled video frames. Drawing upon recent advances in self-supervised learning of object motion from unlabeled videos with geometric constraints, we design a Motion-guided Domain Adaptive semantic segmentation framework (MoDA). MoDA harnesses self-supervised object motion cues to facilitate cross-domain alignment for the segmentation task. First, we present an object discovery module to localize and segment target moving objects using object motion information. Then, we propose a semantic mining module that takes the object masks to refine the pseudo labels in the target domain. Subsequently, these high-quality pseudo labels are used in the self-training loop to bridge the cross-domain gap. In domain adaptive video and image segmentation experiments, MoDA demonstrates the effectiveness of using object motion, rather than optical flow information, as guidance for domain alignment. Moreover, MoDA exhibits versatility, as it can complement existing state-of-the-art UDA approaches. Code at https://github.com/feipanir/MoDA.
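The abstract describes refining target-domain pseudo labels with moving-object masks but does not spell out the rule. The sketch below illustrates one plausible reading and is not the authors' implementation: inside each object mask, every pixel takes the majority class among that object's confident pseudo labels, and outside the masks only confident pixels are kept. The function name `refine_pseudo_labels`, the 0.9 confidence threshold, the majority-vote rule, and the ignore label 255 are all assumptions made for illustration.

```python
from collections import Counter

IGNORE_INDEX = 255  # assumed "ignore" label; not taken from the paper


def refine_pseudo_labels(pseudo, confidence, object_mask, threshold=0.9):
    """Hypothetical mask-guided refinement of per-pixel pseudo labels.

    pseudo, confidence and object_mask are 2D lists of equal shape.
    object_mask holds 0 for background and a positive id per moving object.
    """
    height, width = len(pseudo), len(pseudo[0])

    # Collect votes from confident pixels, grouped by object id.
    votes = {}
    for y in range(height):
        for x in range(width):
            obj = object_mask[y][x]
            if obj > 0 and confidence[y][x] >= threshold:
                votes.setdefault(obj, Counter())[pseudo[y][x]] += 1

    # Start from "ignore" everywhere, then fill in trusted labels.
    refined = [[IGNORE_INDEX] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            obj = object_mask[y][x]
            if obj > 0 and obj in votes:
                # Whole object takes its majority class (assumed rule).
                refined[y][x] = votes[obj].most_common(1)[0][0]
            elif confidence[y][x] >= threshold:
                # Elsewhere, keep only confident pseudo labels.
                refined[y][x] = pseudo[y][x]
    return refined


# Toy 2x3 frame: the top row is one moving object (id 1).
pseudo = [[1, 1, 2], [1, 2, 2]]
confidence = [[0.95, 0.95, 0.5], [0.95, 0.6, 0.95]]
object_mask = [[1, 1, 1], [0, 0, 0]]
print(refine_pseudo_labels(pseudo, confidence, object_mask))
# -> [[1, 1, 1], [1, 255, 2]]
```

In this toy frame the low-confidence pixel inside the object is relabeled to the object's majority class, while the low-confidence background pixel is marked as ignored, so a self-training loss would skip it.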