AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings

AI-generated keywords: PowerShell

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

- PowerShell is a command-line shell and scripting language widely used in organizations for configuration management and task automation.
- Cybercriminals increasingly use PowerShell to launch attacks, largely because it is pre-installed on Windows machines and exposes powerful functionality.
- Microsoft's Antimalware Scan Interface (AMSI) allows defending systems to scan all code passed to scripting engines such as PowerShell before it executes.
- The authors conducted the first study of malicious PowerShell code detection using the information made available by AMSI.
- They developed deep-learning-based detectors that use pretrained contextual embeddings of words from the PowerShell "language".
- Pretraining the contextual embedding on unlabeled data helps mitigate the scarcity of labeled data, a known problem in cybersecurity.
- The models were trained and evaluated on real-world data collected through AMSI from a large antimalware vendor.
- Using unlabeled data for the embedding significantly improved detection performance.
- The best-performing model processes textual signals at both the character and token levels, achieving a true positive rate of nearly 90% at a false-positive rate below 0.1%.
- The results suggest that combining AMSI data with deep learning is an effective way to detect malicious PowerShell code, even when labeled data is limited.
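The headline result pairs a true positive rate with a cap on the false-positive rate. As a rough illustration of how such an operating point is usually read off a detector's scores, here is a minimal sketch, assuming higher scores mean "more likely malicious". The function name `tpr_at_fpr` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float = 0.001) -> Tuple[float, float, float]:
    """Pick the lowest score threshold whose false-positive rate stays <= max_fpr.

    labels: 1 = malicious, 0 = benign; a sample is flagged when score >= threshold.
    Returns (threshold, tpr, fpr). If no threshold satisfies the cap, the
    threshold is +inf (flag nothing) and tpr = fpr = 0.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both malicious and benign samples")

    best = (float("inf"), 0.0, 0.0)
    # Walk candidate thresholds from strictest to loosest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fpr = fp / negatives
        if fpr > max_fpr:
            break  # a lower threshold can only add false positives
        best = (t, tp / positives, fpr)
    return best


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    scores = [0.95, 0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(tpr_at_fpr(scores, labels, max_fpr=0.0))   # (0.9, 0.5, 0.0)
    print(tpr_at_fpr(scores, labels, max_fpr=0.25))  # (0.7, 0.75, 0.25)
```

With a cap as strict as 0.1%, the threshold is pushed high. This is why a true positive rate "near 90%" at that cap is a demanding target.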


Authors: Amir Rubin, Shay Kels, Danny Hendler

arXiv: 1905.09538v2 (cs.CR)

17 pages, 8 figures, 4 tables

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: PowerShell is a command-line shell, supporting a scripting language. It is widely used in organizations for configuration management and task automation but is also increasingly used by cybercriminals for launching cyberattacks against organizations, mainly because it is pre-installed on Windows machines and exposes strong functionality that may be leveraged by attackers. This makes the problem of detecting malicious PowerShell code both urgent and challenging. Microsoft's Antimalware Scan Interface (AMSI) allows defending systems to scan all the code passed to scripting engines such as PowerShell prior to its execution. In this work, we conduct the first study of malicious PowerShell code detection using the information made available by AMSI. We present several novel deep-learning based detectors of malicious PowerShell code that employ pretrained contextual embeddings of words from the PowerShell "language". A known problem in the cybersecurity domain is that labeled data is relatively scarce in comparison with unlabeled data, making it difficult to devise effective supervised detection of malicious activity of many types. This is also the case with PowerShell code. Our work shows that this problem can be mitigated by learning a pretrained contextual embedding based on unlabeled data. We trained and evaluated our models using real-world data, collected using AMSI from a large antimalware vendor. Our performance analysis establishes that the use of unlabeled data for the embedding significantly improved the performance of our detectors. Our best-performing model uses an architecture that enables the processing of textual signals from both the character and token levels and obtains a true positive rate of nearly 90% while maintaining a low false-positive rate of less than 0.1%.
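The abstract's central idea is that unlabeled code can teach a model which PowerShell words behave alike. The paper pretrains contextual embeddings for this. The sketch below swaps in a much simpler technique, a count-based co-occurrence embedding, to show the principle: no labels are needed to learn that `IEX` (a built-in alias of `Invoke-Expression`) appears in the same contexts as `Invoke-Expression`. The tokenizer regex, the window size, and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical tokenizer: words/cmdlets (optionally with a leading '-'), $variables,
# or any other single non-space symbol. PowerShell is case-insensitive, so lower-case first.
TOKEN_RE = re.compile(r"-?[a-z_][\w-]*|\$\w+|\S")


def tokenize(script: str) -> List[str]:
    return TOKEN_RE.findall(script.lower())


def cooccurrence_vectors(corpus: List[str], window: int = 2) -> Dict[str, Counter]:
    """Count, for every token, which tokens appear within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for script in corpus:
        tokens = tokenize(script)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[k] for k, count in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Toy "unlabeled" scripts; no malicious/benign labels are used.
    corpus = [
        "Invoke-Expression $payload",
        "IEX $payload",
        "Get-ChildItem $path",
        "Get-Item $path",
    ]
    vecs = cooccurrence_vectors(corpus)
    print(cosine(vecs["iex"], vecs["invoke-expression"]))  # 1.0
    print(cosine(vecs["iex"], vecs["get-item"]))           # 0.0
```

Unlike this static sketch, a contextual embedding gives each token a vector that also depends on the surrounding code. The shared point is that both are learned from unlabeled scripts before any labeled training.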

Submitted to arXiv on 23 May 2019


Results of the summarizing process for the arXiv paper: 1905.09538v2


Comprehensive Summary

PowerShell is a command-line shell and scripting language widely used in organizations for configuration management and task automation. However, its powerful functionality, together with the fact that it is pre-installed on Windows machines, has also made it attractive to cybercriminals launching attacks. Detecting malicious PowerShell code is challenging, but Microsoft's Antimalware Scan Interface (AMSI) helps by allowing defending systems to scan code before it executes. In this study, the authors investigate using the information exposed by AMSI to detect malicious PowerShell code, and they develop deep-learning-based detectors that use pretrained contextual embeddings of words from the PowerShell "language". Because labeled data is scarce in cybersecurity, supervised detection is difficult; the authors show that pretraining a contextual embedding on unlabeled data helps mitigate this problem. The models were trained and evaluated on real-world data collected through AMSI from a large antimalware vendor, and incorporating unlabeled data into the embedding significantly improved detection performance. The best-performing model uses an architecture that processes textual signals at both the character and token levels, achieving a true positive rate of nearly 90% while keeping the false-positive rate below 0.1%. The work shows that AMSI data combined with deep learning can detect malicious PowerShell code effectively, even when labeled datasets are limited.


Layman's Summary

PowerShell is a computer program that helps people do things on their computers. Sometimes bad people use PowerShell to do bad things. Microsoft made a tool called AMSI that lets security software check whether PowerShell code is bad before it runs. Some researchers did a study to see whether they could use AMSI to find bad PowerShell code. They built special computer programs that learn from the words in the code, and they first let these programs read lots of code that nobody had sorted into good or bad. Then they used real information from a big company that protects computers to teach their programs. The best program they made found almost 9 out of 10 pieces of bad code and almost never mistook good code for bad. Using AMSI and learning programs can help find bad PowerShell code even when there are not many labeled examples to learn from.

Blog article

PowerShell is a powerful command-line shell and scripting language that organizations widely use for configuration management and task automation. Its popularity, combined with the fact that it ships pre-installed on Windows machines, has also made it a favorite tool of cybercriminals. Detecting malicious PowerShell code is hard, but Microsoft's Antimalware Scan Interface (AMSI) gives defenders a foothold by letting them scan code before it executes. In their paper "AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings," Amir Rubin, Shay Kels, and Danny Hendler study how the information AMSI exposes can be used to detect malicious PowerShell code. They propose deep-learning detectors that use pretrained contextual embeddings of words from the PowerShell "language".

One of the main obstacles is the scarcity of labeled data in cybersecurity. Supervised detection needs many labeled examples, and these are far harder to obtain than unlabeled ones. To work around this, the authors pretrain the contextual embeddings on unlabeled data. They trained and evaluated their models on real-world data collected through AMSI from a large antimalware vendor. Incorporating unlabeled data into the embedding significantly improved performance, and the best-performing model reached a true positive rate of nearly 90% while keeping the false-positive rate below 0.1%.

To see where such a detector fits, consider what AMSI does. It is an interface between applications or script hosts, such as PowerShell, and the antimalware product installed on the system. Before the PowerShell engine runs a block of code, it submits that content through AMSI to the registered antimalware provider, which scans it and returns a verdict; if the provider reports the content as malicious, the host blocks it. How the provider reaches its verdict (signatures, heuristics, or a learned model) is up to the vendor.
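In the real Windows API, a host calls functions such as `AmsiScanBuffer` or `AmsiScanString`, and the provider's answer comes back as an `AMSI_RESULT` value. The Python sketch below only simulates that decision logic so the control flow is visible. The enum values are meant to mirror `amsi.h` (check them against the Windows SDK before relying on them). `make_provider`, `run_script`, and the toy scorer are hypothetical names, not part of AMSI.

```python
from enum import IntEnum
from typing import Callable


class AmsiResult(IntEnum):
    """Mirrors the AMSI_RESULT values declared in the Windows SDK header amsi.h."""
    CLEAN = 0
    NOT_DETECTED = 1
    BLOCKED_BY_ADMIN_START = 0x4000
    BLOCKED_BY_ADMIN_END = 0x4FFF
    DETECTED = 32768


def result_is_malware(result: int) -> bool:
    # Same test as the AmsiResultIsMalware macro: any value >= AMSI_RESULT_DETECTED.
    return result >= AmsiResult.DETECTED


Provider = Callable[[str], int]


def make_provider(score: Callable[[str], float], threshold: float) -> Provider:
    """Hypothetical: wrap any scoring function (e.g. a trained detector) as a provider."""
    def provider(content: str) -> int:
        if score(content) >= threshold:
            return AmsiResult.DETECTED
        return AmsiResult.NOT_DETECTED
    return provider


def run_script(content: str, provider: Provider) -> str:
    """Hypothetical script host: request a verdict before 'executing' the content."""
    if result_is_malware(provider(content)):
        return "blocked"
    return "executed"


if __name__ == "__main__":
    # Toy scorer for illustration; a real provider would run the vendor's model.
    def toy_score(content: str) -> float:
        return 1.0 if "invoke-expression" in content.lower() else 0.0

    provider = make_provider(toy_score, threshold=0.5)
    print(run_script("Invoke-Expression $payload", provider))  # blocked
    print(run_script("Get-ChildItem C:\\", provider))          # executed
```

The threshold passed to `make_provider` is where an operating point, such as one chosen for a false-positive rate below 0.1%, would plug in.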
Traditional signature-based methods, however, are not always effective against new or unknown threats, because they rely on previously identified patterns. This is where deep learning comes in: using pretrained contextual embeddings, the models can learn patterns in the code that may indicate malicious intent. The authors' best-performing architecture combines character-level and token-level processing, which lets it capture both structural and semantic features of the code. The embedding is first learned from unlabeled code, and the detector is then trained with the scarcer labeled data. According to the authors, using unlabeled data for the embedding significantly improved the performance of their detectors.

One key takeaway is the value of AMSI information for defenders. Because AMSI lets security products scan code before it executes, organizations can detect and block malicious PowerShell code proactively. The study also shows that deep learning can be effective in cybersecurity domains such as PowerShell code analysis, where labeled datasets are limited: pretraining embeddings on unlabeled data helps overcome the shortage of labels and improves detection.
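A model that reads a script "at both the character and token levels" needs two parallel inputs. The sketch below shows one plausible way to prepare them, as two padded id sequences that a two-branch network could consume. The tokenizer regex, the printable-ASCII alphabet, the reserved ids (0 = padding, 1 = unknown), and the default lengths are all assumptions, not the paper's preprocessing.

```python
import re
import string
from typing import Dict, List, Tuple

# Hypothetical vocabularies. Id 0 is padding and id 1 is "unknown" in both views.
ALPHABET: Dict[str, int] = {c: i + 2 for i, c in enumerate(string.printable)}
TOKEN_RE = re.compile(r"-?[a-z_][\w-]*|\$\w+|\S")


def char_view(script: str, max_len: int) -> List[int]:
    # Keep original casing: unusual casing (e.g. "iNvOkE") may itself be informative.
    ids = [ALPHABET.get(c, 1) for c in script[:max_len]]
    return ids + [0] * (max_len - len(ids))


def token_view(script: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    # PowerShell is case-insensitive, so tokens are lower-cased before lookup.
    ids = [vocab.get(t, 1) for t in TOKEN_RE.findall(script.lower())[:max_len]]
    return ids + [0] * (max_len - len(ids))


def two_views(script: str, vocab: Dict[str, int],
              char_len: int = 1024, token_len: int = 256) -> Tuple[List[int], List[int]]:
    """Return the parallel character-level and token-level id sequences for one script."""
    return char_view(script, char_len), token_view(script, vocab, token_len)


if __name__ == "__main__":
    vocab = {"invoke-expression": 2, "$payload": 3}  # toy vocabulary
    chars, tokens = two_views("IEX $payload", vocab, char_len=16, token_len=4)
    print(len(chars), tokens)  # 16 [1, 3, 0, 0]
```

The split reflects a design trade-off. The character view sees obfuscation tricks and rare identifiers that a fixed vocabulary maps to "unknown", while the token view can reuse embeddings pretrained on unlabeled code.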
In conclusion, the paper by Rubin, Kels, and Hendler offers valuable insight into using AMSI information and deep learning to detect malicious PowerShell code. Their findings show that these approaches can detect malicious code accurately even with limited labeled data, while also pointing to the need for continued research to keep pace with evolving attack techniques. Organizations may want to consider AMSI-based detection of this kind as part of their overall strategy for defending against attacks that use malicious PowerShell code.

Created on 03 Feb. 2024


Similar papers summarized with our AI tools

68.2%

Predictive Embeddings for Hate Speech Detection on Twitter

cs.CL

67.8%

BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source …

cs.SE

67.3%

AMP: Authentication of Media via Provenance

cs.MM

67.3%

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Edu…

cs.SE

67.0%

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

cs.CV

66.9%

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language P…

cs.CL

66.7%

Learning to Rank Context for Named Entity Recognition Using a Synthetic Datas…

cs.CL


Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.