AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- PowerShell is a widely used command-line shell and scripting language for configuration management and task automation in organizations.
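- Cybercriminals increasingly use PowerShell to launch attacks against organizations, mainly because it is pre-installed on Windows machines and exposes powerful functionality.
- Microsoft's Antimalware Scan Interface (AMSI) allows defending systems to scan all code passed to scripting engines such as PowerShell before it is executed.
- The authors present what they describe as the first study of malicious PowerShell code detection using the information made available by AMSI.
- They developed several deep-learning based detectors that use pretrained contextual embeddings of words from the PowerShell "language".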
- Learning a pretrained contextual embedding based on unlabeled data helps mitigate the scarcity of labeled data in cybersecurity.
- Real-world data collected through AMSI from a large antimalware vendor was used to train and evaluate the models.
- Using unlabeled data to learn the embedding significantly improved the detectors' performance.
- The best-performing model processes textual signals at both the character and token levels, achieving a true positive rate of nearly 90% while keeping the false-positive rate below 0.1%.
- Combining AMSI with deep-learning techniques is therefore a promising way to detect malicious PowerShell code, particularly because labeled data is scarce in cybersecurity.
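The key points mention processing text at both the character and token levels. The sketch below illustrates what those two views of a single PowerShell command could look like before they are fed to a model. The tokenization rule (the hypothetical `TOKEN_PATTERN`), the lowercasing step, and the function names are assumptions made for illustration; the summary does not specify the authors' actual preprocessing.

```python
import re

# Hypothetical tokenizer (an assumption, not the paper's scheme): a token is
# either a run of word-like characters (letters, digits, _ . $ -) or a single
# non-whitespace punctuation character.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_.$-]+|[^\sA-Za-z0-9_.$-]")


def char_view(code: str) -> list[str]:
    """Character-level view: every character of the lowercased code."""
    return list(code.lower())


def token_view(code: str) -> list[str]:
    """Token-level view: word-like runs and punctuation marks, lowercased."""
    return TOKEN_PATTERN.findall(code.lower())


# A benign command used purely as sample input.
sample = "Get-ChildItem -Path $env:TEMP | Select-Object Name"

print(char_view(sample)[:3])  # ['g', 'e', 't']
print(token_view(sample))
# ['get-childitem', '-path', '$env', ':', 'temp', '|', 'select-object', 'name']
```

A model that consumes both views can, in principle, pick up on character-level patterns that survive obfuscation as well as on token-level semantics, which is one plausible motivation for combining them.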
Authors: Amir Rubin, Shay Kels, Danny Hendler
Abstract: PowerShell is a command-line shell, supporting a scripting language. It is widely used in organizations for configuration management and task automation but is also increasingly used by cybercriminals for launching cyberattacks against organizations, mainly because it is pre-installed on Windows machines and exposes strong functionality that may be leveraged by attackers. This makes the problem of detecting malicious PowerShell code both urgent and challenging. Microsoft's Antimalware Scan Interface (AMSI) allows defending systems to scan all the code passed to scripting engines such as PowerShell prior to its execution. In this work, we conduct the first study of malicious PowerShell code detection using the information made available by AMSI. We present several novel deep-learning based detectors of malicious PowerShell code that employ pretrained contextual embeddings of words from the PowerShell "language". A known problem in the cybersecurity domain is that labeled data is relatively scarce in comparison with unlabeled data, making it difficult to devise effective supervised detection of malicious activity of many types. This is also the case with PowerShell code. Our work shows that this problem can be mitigated by learning a pretrained contextual embedding based on unlabeled data. We trained and evaluated our models using real-world data, collected using AMSI from a large antimalware vendor. Our performance analysis establishes that the use of unlabeled data for the embedding significantly improved the performance of our detectors. Our best-performing model uses an architecture that enables the processing of textual signals from both the character and token levels and obtains a true positive rate of nearly 90% while maintaining a low false-positive rate of less than 0.1%.
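The abstract reports detection quality as a true positive rate at a bounded false-positive rate. As a minimal sketch of how such a figure can be computed from detector scores, the function below searches score thresholds and returns the highest true positive rate whose false-positive rate stays within a limit. The function name `tpr_at_fpr` and the toy scores and labels are made up for illustration and are not data or results from the paper.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return the best TPR over thresholds whose FPR is at most max_fpr.

    scores: detector outputs, where higher means more suspicious.
    labels: 1 for malicious, 0 for benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    best_tpr = 0.0
    # Try each distinct score t as the rule "flag the sample if score >= t".
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Toy data (illustrative only): four malicious and four benign samples.
scores = [0.95, 0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(tpr_at_fpr(scores, labels, max_fpr=0.0))   # 0.5
print(tpr_at_fpr(scores, labels, max_fpr=0.25))  # 1.0
```

Fixing the false-positive budget first and then reporting the true positive rate reflects how antimalware detectors are typically judged: false alarms on benign scripts are costly, so the threshold is set by the tolerable false-positive rate.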