FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

AI-generated keywords: FigStep

AI-generated Key Points

  • FigStep: A novel black-box jailbreak algorithm for LVLM vulnerabilities
  • LVLMs: Advancements in AI through multimodal incorporation of images and text
  • Safety Alignment: Addressing risks of overreliance on LLM safety assurances
  • SafeBench: Comprehensive benchmark for LVLM safety assessment and improvement

Authors: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang

AAAI 2025 (Oral)
License: CC BY 4.0

Abstract: Large Vision-Language Models (LVLMs) signify a groundbreaking paradigm shift within the Artificial Intelligence (AI) community, extending beyond the capabilities of Large Language Models (LLMs) by assimilating additional modalities (e.g., images). Despite this advancement, the safety of LVLMs remains underexplored, with a potential overreliance on the safety assurances purported by their underlying LLMs. In this paper, we propose FigStep, a straightforward yet effective black-box jailbreak algorithm against LVLMs. Instead of feeding textual harmful instructions directly, FigStep converts the prohibited content into images through typography to bypass the safety alignment. The experimental results indicate that FigStep can achieve an average attack success rate of 82.50% on six promising open-source LVLMs. Beyond demonstrating the efficacy of FigStep, we conduct comprehensive ablation studies and analyze the distribution of the semantic embeddings to uncover that the reason behind the success of FigStep is the deficiency of safety alignment for visual embeddings. Moreover, we compare FigStep with five text-only jailbreaks and four image-based jailbreaks to demonstrate the superiority of FigStep, i.e., negligible attack costs and better attack performance. Above all, our work reveals that current LVLMs are vulnerable to jailbreak attacks, which highlights the necessity of novel cross-modality safety alignment techniques. Our code and datasets are available at https://github.com/ThuCCSLab/FigStep.

Submitted to arXiv on 09 Nov. 2023

Results of the summarizing process for the arXiv paper: 2311.05608v3

In this paper, the authors introduce FigStep, a black-box jailbreak algorithm designed to exploit vulnerabilities in Large Vision-Language Models (LVLMs). These models extend Large Language Models (LLMs) with additional modalities such as images. However, the safety of LVLMs has not been adequately explored, which creates a risk of overreliance on the safety assurances of the underlying LLMs.

FigStep converts prohibited textual content into images using typography, which bypasses the models' safety alignment. In the experiments, FigStep achieves an average attack success rate of 82.50% across six open-source LVLMs. The models analyzed include LLaVA-v1.5 and MiniGPT4. Through ablation studies and an analysis of the distribution of semantic embeddings, the authors attribute FigStep's success to the lack of safety alignment for visual embeddings in current LVLMs. The study also compares FigStep against five text-only and four image-based jailbreak techniques and reports negligible attack costs and better attack performance.

The findings underscore the vulnerability of LVLMs to jailbreak attacks and the need for new cross-modality safety alignment methods. To facilitate further research, the authors introduce SafeBench, a safety benchmark comprising 500 harmful questions generated through LLM-based dataset generation techniques. The code and datasets associated with FigStep are available on GitHub at https://github.com/ThuCCSLab/FigStep.
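For readers unfamiliar with the headline metric, the following is a minimal sketch of how an average attack success rate (ASR) could be computed from per-query judgements. The summary does not state how the paper decides whether a single attack succeeded or how it aggregates across models, so the unweighted per-model mean used here is an assumption. The function names and the toy data are hypothetical and are not results from the paper.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Return the percentage of queries judged to be successful jailbreaks.

    `judgements` is a sequence of booleans, one per query (assumed format).
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    return 100.0 * sum(1 for j in judgements if j) / len(judgements)


def average_asr(per_model_judgements):
    """Assumed aggregation: compute ASR per model, then take the unweighted mean."""
    return mean(attack_success_rate(j) for j in per_model_judgements.values())


# Placeholder data invented for illustration; not taken from the paper.
toy = {
    "model_a": [True, True, False, True],     # 3 of 4 succeed -> 75.0
    "model_b": [True, False, False, False],   # 1 of 4 succeed -> 25.0
}

print(average_asr(toy))  # 50.0
```

Averaging per-model rates gives each model equal weight regardless of how many queries it received. Pooling all queries before dividing would give a different number whenever the per-model query counts differ.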
Created on 07 Sep. 2025
