Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective

AI-generated keywords: LLM unlearning, targeted unlearning, causal intervention framework, evaluation metrics, code repository

AI-generated Key Points

  • Authors investigate targeted unlearning within LLMs
  • Study conducted in two main steps
  • Introduce novel task of targeted unlearning
  • Goal is to unlearn only the information about a specific target, not everything in the unlearning documents
  • Criteria for successful unlearning established
  • Proposed framework for achieving targeted unlearning
  • Simple algorithm derived from the framework
  • Comprehensive evaluations designed to assess efficacy of targeted unlearning
  • Experiments on existing and new datasets demonstrate effectiveness without explicit optimization for predefined criteria
  • Research contributes to advancing understanding and application of targeted unlearning within LLMs from a causal intervention perspective

Authors: Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang

License: CC BY 4.0

Abstract: This paper investigates Who's Harry Potter (WHP), a pioneering yet insufficiently understood method for LLM unlearning. We explore it in two steps. First, we introduce a new task of LLM targeted unlearning, where given an unlearning target (e.g., a person) and some unlearning documents, we aim to unlearn only the information about the target, rather than everything in the unlearning documents. We further argue that a successful unlearning should satisfy criteria such as not outputting gibberish, not fabricating facts about the unlearning target, and not releasing factual information under jailbreak attacks. Second, we construct a causal intervention framework for targeted unlearning, where the knowledge of the unlearning target is modeled as a confounder between LLM input and output, and the unlearning process as a deconfounding process. This framework justifies and extends WHP, deriving a simple unlearning algorithm that includes WHP as a special case. Experiments on existing and new datasets show that our approach, without explicitly optimizing for the aforementioned criteria, achieves competitive performance in all of them. Our code is available at https://github.com/UCSB-NLP-Chang/causal_unlearn.git.

Submitted to arXiv on 24 Jul. 2024

Results of the summarizing process for the arXiv paper: 2407.16997v1

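In this paper, the authors investigate LLM unlearning, specifically focusing on targeted unlearning, by revisiting the Who's Harry Potter (WHP) method. The study is conducted in two main steps. First, they introduce a novel task of targeted unlearning: given an unlearning target (e.g., a person) and a set of unlearning documents, the goal is to unlearn only the information about the target rather than everything in the documents. They establish criteria for successful unlearning: the model should not output gibberish, should not fabricate facts about the target, and should not release factual information under jailbreak attacks. Second, they propose a causal intervention framework for achieving it, in which knowledge of the target is modeled as a confounder between LLM input and output and unlearning is treated as deconfounding. This framework both justifies and extends WHP, yielding a simple algorithm that includes WHP as a special case. Comprehensive evaluation metrics are designed to assess the efficacy of targeted unlearning. Experiments on both existing and new datasets show that the approach achieves competitive performance on all of the criteria without explicitly optimizing for them. The authors provide their code repository for further exploration. Overall, this research contributes to advancing the understanding and application of targeted unlearning within LLMs from a causal intervention perspective.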
Created on 06 Jan. 2025
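
The abstract describes unlearning as a deconfounding process, with knowledge of the target acting as a confounder between input and output. As a generic illustration of what deconfounding means (this is not the paper's algorithm, which is not reproduced on this page), the toy sketch below applies the standard backdoor adjustment to a hand-made binary model. The variable names and all probabilities are made-up assumptions chosen only to make the arithmetic easy to follow.

```python
# Toy illustration of deconfounding via backdoor adjustment.
# ASSUMPTION: C, X, Y are binary stand-ins (C = "confounder", X = "input",
# Y = "output"); every number below is invented for illustration.

p_c = {0: 0.5, 1: 0.5}                      # P(C = c)
p_x1_given_c = {0: 0.2, 1: 0.8}             # P(X = 1 | C = c)
p_y1_given_xc = {                           # P(Y = 1 | X = x, C = c)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.9,
}


def p_x_given_c(x, c):
    """P(X = x | C = c) for binary x."""
    return p_x1_given_c[c] if x == 1 else 1.0 - p_x1_given_c[c]


def observational(x):
    """P(Y = 1 | X = x): the confounder is weighted by P(c | x)."""
    num = sum(p_y1_given_xc[(x, c)] * p_x_given_c(x, c) * p_c[c] for c in p_c)
    den = sum(p_x_given_c(x, c) * p_c[c] for c in p_c)
    return num / den


def interventional(x):
    """P(Y = 1 | do(X = x)): the confounder is weighted by its prior P(c)."""
    return sum(p_y1_given_xc[(x, c)] * p_c[c] for c in p_c)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(observational(x), 4), round(interventional(x), 4))
```

The two quantities differ because conditioning on X also shifts the distribution of C, whereas intervening on X leaves C at its prior; removing that dependence is what "deconfounding" refers to in general.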
