A Closer Look at Claim Decomposition

AI-generated keywords: Claim Decomposition, External Knowledge Sources, Evaluation Metrics, LLM-based Approaches, DecompScore

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Importance of evaluating how well generated text is supported by external knowledge sources
  • Impact of claim decomposition methods on evaluation metrics such as FActScore
  • Sensitivity of results to the choice of decomposition method
  • Proposal of a new metric, DecompScore, for measuring decomposition quality
  • Introduction of an LLM-based decomposition approach inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics

Authors: Miriam Wanner, Seth Ebner, Zhengping Jiang, Mark Dredze, Benjamin Van Durme

Abstract: As generated text becomes more commonplace, it is increasingly important to evaluate how well-supported such text is by external knowledge sources. Many approaches for evaluating textual support rely on some method for decomposing text into its individual subclaims which are scored against a trusted reference. We investigate how various methods of claim decomposition -- especially LLM-based methods -- affect the result of an evaluation approach such as the recently proposed FActScore, finding that it is sensitive to the decomposition method used. This sensitivity arises because such metrics attribute overall textual support to the model that generated the text even though error can also come from the metric's decomposition step. To measure decomposition quality, we introduce an adaptation of FActScore, which we call DecompScore. We then propose an LLM-based approach to generating decompositions inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics and demonstrate its improved decomposition quality over previous methods.
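The sensitivity described in the abstract can be illustrated with a toy example. The sketch below computes a FActScore-style precision (the fraction of a text's subclaims judged supported) for one sentence under three different decompositions. It is a minimal sketch, not the authors' code, and it does not implement DecompScore: the sentence, the knowledge set `KNOWLEDGE`, the helper names `is_supported` and `support_precision`, and all three decompositions are hypothetical, and exact string lookup stands in for the retrieval and model-based verification a real evaluator would use.

```python
from typing import Callable, Sequence

# HYPOTHETICAL knowledge source: claims treated as verified facts.
# A real evaluator would retrieve evidence and ask a model to judge support;
# exact string lookup is only a stand-in for that step.
KNOWLEDGE = {
    "The bridge opened in 1932.",
    "The bridge opened.",
    "The bridge spans the river.",
}


def is_supported(claim: str) -> bool:
    """Hypothetical verifier: a claim counts as supported iff it is in KNOWLEDGE."""
    return claim in KNOWLEDGE


def support_precision(subclaims: Sequence[str],
                      verify: Callable[[str], bool]) -> float:
    """FActScore-style score for one text: fraction of subclaims judged supported."""
    if not subclaims:
        raise ValueError("decomposition produced no subclaims")
    return sum(verify(c) for c in subclaims) / len(subclaims)


# One hypothetical generated sentence, decomposed three different ways:
# "The bridge opened in 1932, spans the river, and is painted red."
DECOMPOSITIONS = {
    # one subclaim per fact
    "fine": [
        "The bridge opened in 1932.",
        "The bridge spans the river.",
        "The bridge is painted red.",
    ],
    # bundles an unsupported detail with a supported one
    "coarse": [
        "The bridge opened in 1932.",
        "The bridge spans the river and is painted red.",
    ],
    # adds a redundant, trivially supported subclaim
    "redundant": [
        "The bridge opened in 1932.",
        "The bridge opened.",
        "The bridge spans the river.",
        "The bridge is painted red.",
    ],
}

scores = {name: support_precision(claims, is_supported)
          for name, claims in DECOMPOSITIONS.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Running it prints `fine: 0.667`, `coarse: 0.500`, and `redundant: 0.750`. The generated text and the knowledge source are identical in all three cases, so the differences come entirely from the decomposition step, which a support metric of this kind would nonetheless attribute to the model that generated the text.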

Submitted to arXiv on 18 Mar. 2024


Summary of the arXiv paper 2403.11903v1


The paper "A Closer Look at Claim Decomposition" by Miriam Wanner, Seth Ebner, Zhengping Jiang, Mark Dredze, and Benjamin Van Durme examines how to evaluate whether generated text is supported by external knowledge sources. As generated text becomes more common, assessing its reliability has become crucial. Many evaluation approaches first decompose a text into individual subclaims and then score each subclaim against a trusted reference. The authors study how different claim decomposition methods, especially LLM-based ones, affect the results of such metrics, using the recently proposed FActScore as a case study, and find that the results are sensitive to the decomposition method chosen. This sensitivity arises because these metrics attribute the overall textual support to the model that generated the text, even though errors can also be introduced by the metric's own decomposition step. To measure decomposition quality directly, the authors introduce DecompScore, an adaptation of FActScore. They then propose an LLM-based decomposition approach inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics, and show that it yields better decompositions than previous methods. In summary, the work shows how strongly the choice of claim decomposition method can shape support-based evaluation metrics, and it offers DecompScore and the new decomposition approach as tools for making such evaluations more reliable.
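As a general illustration of the neo-Davidsonian idea mentioned above (a standard textbook-style example, not taken from the paper, whose decomposition prompts are not described here): a verb introduces an event variable, and each participant or modifier becomes its own conjunct about that event.

```latex
\text{``Brutus stabbed Caesar violently''} \;\mapsto\;
\exists e\,\big[\mathrm{stab}(e) \land \mathrm{Agent}(e,\mathrm{Brutus})
  \land \mathrm{Theme}(e,\mathrm{Caesar}) \land \mathrm{violent}(e)\big]
```

Because the whole formula is a single existential over a conjunction, it entails every formula obtained by dropping conjuncts, such as the one keeping only stab and Agent ("Brutus stabbed someone") or only stab and violent ("a violent stabbing occurred"). Each such piece is a natural candidate for an atomic subclaim that can be checked against a knowledge source on its own.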
Created on 05 Apr. 2024

