Anomaly Detection by Adapting a pre-trained Vision Language Model
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Large pre-trained vision and language models have recently proven successful when adapted to many downstream tasks.
- The CLIP-ADA framework is introduced for Anomaly Detection by Adapting a pre-trained CLIP model, incorporating two key enhancements:
  - A learnable prompt, associated with abnormal patterns through self-supervised learning, for unified anomaly detection across industrial images of multiple categories.
  - An anomaly region refinement strategy that improves localization quality and fully exploits CLIP's representation power.
- During testing, anomalies are pinpointed by assessing similarity between the representation of the learnable prompt and the image.
- Comprehensive experiments show that CLIP-ADA achieves state-of-the-art results for anomaly detection and localization: 97.5/55.6 on MVTec-AD and 89.3/33.1 on VisA.
- The method shows promising performance even with limited training data, showcasing robustness in challenging scenarios.
Authors: Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai
Abstract: Recently, large vision and language models have shown their success when adapting them to many downstream tasks. In this paper, we present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model. To this end, we make two important improvements: 1) To acquire unified anomaly detection across industrial images of multiple categories, we introduce the learnable prompt and propose to associate it with abnormal patterns through self-supervised learning. 2) To fully exploit the representation power of CLIP, we introduce an anomaly region refinement strategy to refine the localization quality. During testing, the anomalies are localized by directly calculating the similarity between the representation of the learnable prompt and the image. Comprehensive experiments demonstrate the superiority of our framework, e.g., we achieve the state-of-the-art 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization. In addition, the proposed method also achieves encouraging performance with marginal training data, which is more challenging.
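The test-time step described in the abstract, localizing anomalies by the similarity between the learnable prompt's representation and the image, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `anomaly_map` and `image_score`, the use of cosine similarity, the max-pooled image-level score, and the random vectors standing in for CLIP patch and prompt embeddings are all assumptions made for illustration.

```python
import numpy as np

def anomaly_map(patch_feats, prompt_feat, grid_hw):
    """Score each image patch by cosine similarity to one prompt embedding.

    patch_feats: (num_patches, dim) array, a stand-in for CLIP patch features.
    prompt_feat: (dim,) array, a stand-in for the learned "abnormal" prompt.
    grid_hw:     (height, width) of the patch grid.
    """
    # L2-normalize both sides so the dot product equals cosine similarity.
    patches = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    prompt = prompt_feat / np.linalg.norm(prompt_feat)
    sims = patches @ prompt  # shape: (num_patches,)
    return sims.reshape(grid_hw)

def image_score(amap):
    # Assumption: the image-level score is the highest patch-level score.
    return float(amap.max())

# Toy data only: random vectors instead of real CLIP outputs.
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
prompt_feat = rng.normal(size=d)
patch_feats = rng.normal(size=(h * w, d))
patch_feats[5] = 3.0 * prompt_feat  # plant one patch aligned with the prompt

amap = anomaly_map(patch_feats, prompt_feat, (h, w))
row, col = np.unravel_index(amap.argmax(), amap.shape)
print("most anomalous patch:", (int(row), int(col)))
print("image-level score:", image_score(amap))
```

In this toy setup the planted patch (flat index 5, grid position row 1, column 1) receives the highest score because it is perfectly aligned with the prompt vector. With a real model, the patch and prompt embeddings would come from the adapted CLIP encoders, and the resulting map would typically be upsampled to the input resolution.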