Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

AI-generated keywords: Multimodal Large Language Models Image Inputs Vulnerabilities Jailbreak Technique HADES

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, and Ji-Rong Wen study the harmlessness alignment of multimodal large language models (MLLMs)
  • A systematic empirical analysis of representative MLLMs shows that image inputs are a weak point in their alignment
  • They introduce a jailbreak method called HADES, which uses crafted images to hide and amplify the malicious intent of the text input
  • Experiments report an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision
  • Code and data are available at https://github.com/RUCAIBox/HADES
  • The work documents alignment vulnerabilities tied to image inputs in MLLMs; it proposes an attack, not a defense

Authors: Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

ECCV 2024 Oral

Abstract: In this paper, we study the harmlessness alignment problem of multimodal large language models (MLLMs). We conduct a systematic empirical analysis of the harmlessness performance of representative MLLMs and reveal that the image input poses the alignment vulnerability of MLLMs. Inspired by this, we propose a novel jailbreak method named HADES, which hides and amplifies the harmfulness of the malicious intent within the text input, using meticulously crafted images. Experimental results show that HADES can effectively jailbreak existing MLLMs, which achieves an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision. Our code and data are available at https://github.com/RUCAIBox/HADES.

Submitted to arXiv on 14 Mar. 2024

Results of the summarizing process for the arXiv paper: 2403.09792v3

In their paper "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models", Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, and Ji-Rong Wen study the harmlessness alignment of multimodal large language models (MLLMs). Through a systematic empirical analysis of representative MLLMs, they find that the image input is a source of alignment vulnerability. Building on this finding, they propose a jailbreak method named HADES, which uses crafted images to hide and amplify the harmfulness of the malicious intent in the text input. In their experiments, HADES jailbreaks existing MLLMs with an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision. The code and data are available at https://github.com/RUCAIBox/HADES. The study's contribution is to expose vulnerabilities associated with image inputs in MLLMs; it presents an attack rather than a mitigation, and its results indicate that the alignment of these models against visual inputs needs further work.
Created on 22 Jan. 2026
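
The summary above reports results as an average Attack Success Rate (ASR). As a rough illustration of how such a figure can be aggregated, the Python sketch below assumes that ASR is the percentage of attempts judged successful and that the average is an unweighted mean over groups of attempts. Neither the exact definition nor the grouping is stated in the paper metadata, and the function names and toy data are hypothetical.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Percentage of attempts marked successful (assumed ASR definition)."""
    if not judgements:
        raise ValueError("no judgements supplied")
    return 100.0 * sum(1 for j in judgements if j) / len(judgements)


def average_asr(per_group):
    """Unweighted mean of per-group ASR values (assumed aggregation)."""
    return mean(attack_success_rate(j) for j in per_group.values())


# Toy boolean judgements, invented for illustration; not data from the paper.
toy = {
    "group_a": [True, True, False, True],
    "group_b": [True, False, False, False],
}

if __name__ == "__main__":
    for name, judgements in toy.items():
        print(name, attack_success_rate(judgements))
    print("average", average_asr(toy))
```

An unweighted mean treats every group equally regardless of its size; weighting by the number of attempts per group would give a different figure when group sizes differ.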
