Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation

AI-generated keywords: Legal NLP

AI-generated Key Points

  • Legal NLP faces challenges in summarizing legal case judgement documents
  • Little analysis exists on how different families of summarization models (extractive vs. abstractive) perform on legal case documents
  • Many transformer-based abstractive summarization models restrict the number of input tokens, which is a problem for lengthy legal documents
  • Various extractive and abstractive summarization methods were tested across three legal datasets
  • General domain-agnostic methods often outperformed domain-specific approaches in legal document summarization tasks

Authors: Abhay Shukla, Paheli Bhattacharya, Soham Poddar, Rajdeep Mukherjee, Kripabandhu Ghosh, Pawan Goyal, Saptarshi Ghosh

Accepted at The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP), 2022
License: CC BY 4.0

Abstract: Summarization of legal case judgement documents is a challenging problem in Legal NLP. However, not much analyses exist on how different families of summarization models (e.g., extractive vs. abstractive) perform when applied to legal case documents. This question is particularly important since many recent transformer-based abstractive summarization models have restrictions on the number of input tokens, and legal documents are known to be very long. Also, it is an open question on how best to evaluate legal case document summarization systems. In this paper, we carry out extensive experiments with several extractive and abstractive summarization methods (both supervised and unsupervised) over three legal summarization datasets that we have developed. Our analyses, that includes evaluation by law practitioners, lead to several interesting insights on legal summarization in specific and long document summarization in general.

Submitted to arXiv on 14 Oct. 2022

Results of the summarizing process for the arXiv paper: 2210.07544v1

In the field of Legal NLP, summarizing legal case judgement documents poses a significant challenge. Little analysis exists on how different families of summarization models, such as extractive and abstractive, perform when applied to legal case documents. This matters because many recent transformer-based abstractive summarization models restrict the number of input tokens, while legal documents are typically very long. How best to evaluate legal case document summarization systems is also an open question. To address these challenges, extensive experiments were conducted with extractive and abstractive summarization methods, both supervised and unsupervised, across three legal summarization datasets. The study included evaluation by law practitioners and yielded insights into legal summarization in particular and long document summarization in general.

The research explored a range of summarization methods, including unsupervised extractive approaches like LexRank, DSDR, and PacSum, supervised extractive methods such as SummaRunner and BERT-SUMM, and supervised abstractive models like BART and Longformer. Surprisingly, general domain-agnostic methods often outperformed domain-specific approaches in legal document summarization tasks. The study also highlighted the benefits of domain-specific training and of fine-tuning pre-trained models like Legal-Pegasus. Several strategies for generating legal training data for supervised models were compared.

One key challenge addressed was how to handle long legal documents with existing abstractive summarizers that have limited input capacity. Three approaches were tested: using long document summarizers like Longformer, which are designed for lengthy texts; using short document summarizers like BART together with chunking; and combining extractive and abstractive methods. The chunking-based approach showed promising results for legal documents, and fine-tuning proved beneficial.

The evaluation assessed not only full-document summaries but also how well summaries represented the different logical segments of a legal case document (e.g., Facts, Final Judgment). Document-wide and segment-wise automatic evaluations were complemented by evaluations by law practitioners. Overall, the study sheds light on effective strategies for legal document summarization and provides insights applicable to long document summarization in other domains.
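
The chunking approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the word count is only a stand-in for a model's real token limit, and `first_sentence` is a placeholder where a call to a short document summarizer such as BART would go.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(sentences: List[str], max_words: int) -> List[str]:
    # Greedily pack whole sentences into chunks of at most max_words words.
    # Assumption: words approximate tokens; a real system would count tokens
    # with the model's own tokenizer.
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # A single sentence longer than max_words still becomes its own chunk.
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text: str,
                   summarize_chunk: Callable[[str], str],
                   max_words: int = 512) -> str:
    # Summarize each chunk independently and concatenate the partial summaries.
    chunks = chunk_sentences(split_sentences(text), max_words)
    return " ".join(summarize_chunk(chunk) for chunk in chunks)


def first_sentence(chunk: str) -> str:
    # Placeholder summarizer (hypothetical): keeps the first sentence only.
    return split_sentences(chunk)[0]
```

Passing a different callable as `summarize_chunk` swaps in another summarizer without changing the chunking logic, which is also where a fine-tuned model would plug in.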
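
To make the idea of segment-wise evaluation concrete, here is a small sketch under stated assumptions. It scores one generated summary against reference text for each logical segment using plain unigram recall, a simplified stand-in for ROUGE-1 recall (no stemming or stopword handling). The function names and the toy texts are hypothetical, and the paper's exact protocol may differ.

```python
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_recall(reference: str, candidate: str) -> float:
    # Fraction of reference unigrams (with multiplicity) found in the candidate.
    ref = Counter(tokens(reference))
    cand = Counter(tokens(candidate))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / total


def segment_wise_recall(reference_segments: Dict[str, str],
                        summary: str) -> Dict[str, float]:
    # One score per logical segment, so poorly covered segments stand out.
    return {name: unigram_recall(text, summary)
            for name, text in reference_segments.items()}


# Toy example (invented text, for illustration only).
reference_segments = {
    "Facts": "the appellant sold the land",
    "Final Judgment": "the appeal is dismissed",
}
summary = "The appellant sold the land. The court gave no relief."
scores = segment_wise_recall(reference_segments, summary)
```

In this toy case the summary covers the Facts segment fully but barely touches the Final Judgment, which a single document-wide score could hide.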
Created on 16 Jun. 2024

