CORE-KG: An LLM-Driven Knowledge Graph Construction Framework for Human Smuggling Networks

AI-generated keywords: Human smuggling networks, Legal case documents, Automated knowledge graph construction, CORE-KG framework, Criminal network analysis

AI-generated Key Points

  • Human smuggling networks are complex and constantly evolving
  • Legal case documents contain valuable insights but are often unstructured, dense with legal jargon, and filled with ambiguous references
  • Existing knowledge graph methods often rely on static templates and lack coreference resolution, while recent LLM-based approaches often produce noisy or fragmented graphs with duplicate nodes
  • A modular framework called CORE-KG has been proposed to address these issues
  • CORE-KG uses a two-step pipeline: (1) type-aware coreference resolution via sequential, structured LLM prompts, and (2) entity and relationship extraction guided by domain-specific instructions, built on an adapted GraphRAG framework
  • Targeted preprocessing keeps only the "Opinion" section of each case file, reducing noise and redundancy in the constructed knowledge graphs
  • In preliminary experiments using a sample of 20 cases, the system demonstrated its capacity to extract key entities and identify meaningful relationships
  • LLaMA 3.3 70B is used for both coreference resolution and KG construction, with a setup intended to ensure reproducibility and minimize stochastic variation during inference
  • The system operates within an open-source environment without relying on commercial APIs, enhancing accessibility and transparency
  • Compared with a GraphRAG-based baseline, CORE-KG reduces node duplication by 33.28% and legal noise by 38.37%
  • By combining Opinion-section preprocessing, coreference resolution, and structured entity/relationship extraction, CORE-KG offers a comprehensive approach to building interpretable KGs from the complex narratives found in human smuggling cases
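The Opinion-section preprocessing listed among the key points can be sketched as a small text filter. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the section starts at a line containing only the word "Opinion" and ends at a line containing only "Dissent", "Concurrence", or "Footnotes". Those header names and the function name `extract_opinion_section` are hypothetical; real Nexis Uni exports may be laid out differently.

```python
import re

# Hypothetical header patterns; actual case-file layouts may differ.
OPINION_HEADER = re.compile(r"^[ \t]*opinion[ \t]*$", re.IGNORECASE | re.MULTILINE)
END_HEADERS = re.compile(
    r"^[ \t]*(dissent|concurrence|footnotes)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def extract_opinion_section(case_text: str) -> str:
    """Return the text between an 'Opinion' header line and the next assumed
    terminating header (or the end of the document).

    Returns an empty string when no 'Opinion' header line is found.
    """
    start = OPINION_HEADER.search(case_text)
    if start is None:
        return ""
    body = case_text[start.end():]
    end = END_HEADERS.search(body)
    if end is not None:
        body = body[: end.start()]
    return body.strip()
```

Matching whole header lines (rather than any occurrence of the word) keeps sentences such as "the court's opinion was..." from being mistaken for the section boundary.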

Authors: Dipak Meher, Carlotta Domeniconi, Guadalupe Correa-Cabrera

License: CC BY 4.0

Abstract: Human smuggling networks are increasingly adaptive and difficult to analyze. Legal case documents offer valuable insights but are unstructured, lexically dense, and filled with ambiguous or shifting references, posing challenges for automated knowledge graph (KG) construction. Existing KG methods often rely on static templates and lack coreference resolution, while recent LLM-based approaches frequently produce noisy, fragmented graphs due to hallucinations, as well as duplicate nodes caused by a lack of guided extraction. We propose CORE-KG, a modular framework for building interpretable KGs from legal texts. It uses a two-step pipeline: (1) type-aware coreference resolution via sequential, structured LLM prompts, and (2) entity and relationship extraction using domain-guided instructions, built on an adapted GraphRAG framework. CORE-KG reduces node duplication by 33.28% and legal noise by 38.37% compared to a GraphRAG-based baseline, resulting in cleaner and more coherent graph structures. These improvements make CORE-KG a strong foundation for analyzing complex criminal networks.
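Step (1) of the pipeline, type-aware coreference resolution via sequential prompts, can be illustrated as a loop that sends one structured prompt per entity type and feeds each resolved text into the next prompt. This is a hedged sketch, not the paper's prompts: the entity-type list, the prompt wording, and the `llm` callable (any function mapping a prompt string to a completion, for example a wrapper around a locally hosted LLaMA 3.3 70B) are all assumptions.

```python
from typing import Callable, Iterable

# Hypothetical entity types; the paper's actual type inventory may differ.
ENTITY_TYPES = ["Person", "Location", "Organization", "Vehicle"]

# Hypothetical structured prompt; one is issued per entity type.
PROMPT_TEMPLATE = (
    "You are resolving coreferences in a legal case text.\n"
    "Entity type: {entity_type}\n"
    "Replace every pronoun, alias, or shortened mention that refers to an entity "
    "of this type with that entity's full canonical name. "
    "Do not change any other text.\n\n"
    "Text:\n{text}"
)


def resolve_coreferences(
    text: str,
    llm: Callable[[str], str],
    entity_types: Iterable[str] = ENTITY_TYPES,
) -> str:
    """Resolve coreferences one entity type at a time.

    Each pass receives the output of the previous pass, so mentions resolved
    for earlier types are already canonical when later types are processed.
    """
    for entity_type in entity_types:
        prompt = PROMPT_TEMPLATE.format(entity_type=entity_type, text=text)
        text = llm(prompt)
    return text
```

Passing the model as a plain callable keeps the loop independent of any particular inference server, and a stub function can stand in for the model when testing the control flow.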

Submitted to arXiv on 20 Jun. 2025


Results of the summarizing process for the arXiv paper: 2506.21607v1

Human smuggling networks are complex and constantly evolving, which makes them difficult to analyze. Legal case documents contain valuable insights, but they are often unstructured, dense with legal jargon, and filled with ambiguous references, all of which hinder automated knowledge graph (KG) construction. Existing KG methods often rely on static templates and lack coreference resolution, while LLM-based approaches tend to produce noisy or fragmented graphs. To address these issues, the authors propose CORE-KG, a modular framework built on a two-step pipeline: type-aware coreference resolution through sequential, structured prompts, followed by entity and relationship extraction guided by domain-specific instructions.

To reduce noise and redundancy, targeted preprocessing extracts only the "Opinion" section of each case file. This section holds the main factual narrative (the individuals involved, the routes used, the items transported, and the sequence of events), which is the material needed to build meaningful KGs of criminal networks. In preliminary experiments on a sample of 20 cases retrieved from the Nexis Uni database, the system extracted key entities and identified meaningful relationships. LLaMA 3.3 70B is used for both coreference resolution and KG construction, with a setup intended to ensure reproducibility and minimize stochastic variation during inference. The system runs in an open-source environment without commercial APIs, which improves accessibility and transparency. Compared with a GraphRAG-based baseline, CORE-KG reduces node duplication by 33.28% and legal noise by 38.37%, producing cleaner and more coherent graph structures for analyzing criminal networks such as human smuggling.

By focusing on the Opinion section and combining coreference resolution with structured entity and relationship extraction, CORE-KG offers a comprehensive approach to building interpretable KGs from the complex narratives of human smuggling cases.
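The node-duplication result compares a property of two graphs. The summary does not define the metric, so the following is only a sketch under assumptions: it treats a node as a duplicate when its normalized label (lowercased, punctuation removed, whitespace collapsed) repeats an earlier node's label, and it reads "reduces node duplication by 33.28%" as a relative reduction. The helper names and the example labels are hypothetical.

```python
import re
from collections import Counter


def normalize(label: str) -> str:
    """Hypothetical normalization: lowercase, drop punctuation, collapse spaces."""
    label = re.sub(r"[^\w\s]", "", label.lower())
    return " ".join(label.split())


def duplication_rate(node_labels: list[str]) -> float:
    """Fraction of nodes whose normalized label repeats an earlier node's label."""
    if not node_labels:
        return 0.0
    counts = Counter(normalize(label) for label in node_labels)
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(node_labels)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical node labels, not data from the paper.
    baseline_nodes = ["John Doe", "john doe", "J. Doe", "Laredo", "LAREDO", "Border Patrol"]
    improved_nodes = ["John Doe", "Laredo", "laredo", "Border Patrol"]
    base = duplication_rate(baseline_nodes)
    ours = duplication_rate(improved_nodes)
    print(f"relative reduction: {relative_reduction(base, ours):.2f}%")
```

Note that exact-label matching misses aliases such as "J. Doe", which is the kind of duplication that coreference resolution before extraction is meant to prevent.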
Created on 08 Jul. 2025


