CORE-KG: An LLM-Driven Knowledge Graph Construction Framework for Human Smuggling Networks

AI-generated keywords: Human smuggling networks, Legal case documents, Automated knowledge graph construction, CORE-KG framework, Criminal network analysis

AI-generated Key Points

Human smuggling networks are complex and constantly evolving
Legal case documents contain valuable insights but are often unstructured, dense with legal jargon, and filled with ambiguous references
Existing methods for building knowledge graphs lack coreference resolution and often result in noisy or fragmented graphs
A modular framework called CORE-KG has been proposed to address these issues
CORE-KG utilizes a two-step pipeline: type-aware coreference resolution through structured prompts and entity/relationship extraction guided by domain-specific instructions
Targeted preprocessing is applied to extract only the "Opinion" section from each case file to reduce noise and redundancy in constructing knowledge graphs
In preliminary experiments using a sample of 20 cases, the system demonstrated its capacity to extract key entities and identify meaningful relationships
The LLaMA 3.3 70B model is used for both coreference resolution and KG construction, with the aim of supporting reproducibility and minimizing stochastic variation during inference
The system operates within an open-source environment without relying on commercial APIs, enhancing accessibility and transparency
Evaluation against a GraphRAG-based baseline shows that CORE-KG reduces node duplication by 33.28% and legal noise by 38.37%
By focusing on extracting relevant information from the Opinion section of legal documents through coreference resolution and structured entity/relationship extraction processes, CORE-KG offers a comprehensive approach to constructing interpretable KGs from complex narratives found in criminal cases involving human smuggling activities

Authors: Dipak Meher, Carlotta Domeniconi, Guadalupe Correa-Cabrera

arXiv: 2506.21607v1 (cs.CL)

License: CC BY 4.0

Abstract: Human smuggling networks are increasingly adaptive and difficult to analyze. Legal case documents offer valuable insights but are unstructured, lexically dense, and filled with ambiguous or shifting references, posing challenges for automated knowledge graph (KG) construction. Existing KG methods often rely on static templates and lack coreference resolution, while recent LLM-based approaches frequently produce noisy, fragmented graphs due to hallucinations, and duplicate nodes caused by a lack of guided extraction. We propose CORE-KG, a modular framework for building interpretable KGs from legal texts. It uses a two-step pipeline: (1) type-aware coreference resolution via sequential, structured LLM prompts, and (2) entity and relationship extraction using domain-guided instructions, built on an adapted GraphRAG framework. CORE-KG reduces node duplication by 33.28%, and legal noise by 38.37% compared to a GraphRAG-based baseline, resulting in cleaner and more coherent graph structures. These improvements make CORE-KG a strong foundation for analyzing complex criminal networks.

Submitted to arXiv on 20 Jun. 2025

Results of the summarizing process for the arXiv paper: 2506.21607v1

Comprehensive Summary

Human smuggling networks are complex and constantly evolving, making them challenging to analyze. Legal case documents contain valuable insights but are often unstructured, dense with legal jargon, and filled with ambiguous references, posing obstacles for automated knowledge graph (KG) construction. Existing methods for building KGs lack coreference resolution and often result in noisy or fragmented graphs. To address these issues, a modular framework called CORE-KG has been proposed. It uses a two-step pipeline: type-aware coreference resolution through structured prompts, followed by entity/relationship extraction guided by domain-specific instructions. To reduce noise and redundancy when constructing KGs that capture the structure of criminal networks from legal texts, targeted preprocessing extracts only the "Opinion" section from each case file. This section contains the main factual narrative, detailing the individuals involved, the routes used, the items transported, and the sequence of events relevant to building meaningful KGs. In preliminary experiments on a sample of 20 cases retrieved from the Nexis Uni database, the system demonstrated its capacity to extract key entities and identify meaningful relationships. The LLaMA 3.3 70B model is used for both coreference resolution and KG construction, with the aim of supporting reproducibility and minimizing stochastic variation during inference. The system operates within an open-source environment without relying on commercial APIs, enhancing accessibility and transparency. Evaluation against a GraphRAG-based baseline shows that CORE-KG reduces node duplication by 33.28% and legal noise by 38.37%, resulting in cleaner and more coherent graph structures suitable for analyzing criminal networks such as human smuggling networks.
By extracting relevant information from the Opinion section of legal documents through coreference resolution and structured entity/relationship extraction, CORE-KG offers a comprehensive approach to constructing interpretable KGs from the complex narratives found in criminal cases involving human smuggling.
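The targeted preprocessing step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each case file is plain text in which every section begins with a header on its own line; the header names and the `extract_opinion` helper are hypothetical and are not taken from the paper or from the Nexis Uni export format.

```python
# Assumed layout: each section starts with a header on its own line.
# These header names are illustrative, not taken from the paper.
OTHER_HEADERS = ("Core Terms", "Counsel", "Judges", "Dissent")


def extract_opinion(case_text: str) -> str:
    """Return the text between an 'Opinion' header line and the next known header."""
    lines = case_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if line.strip().lower() == "opinion":
            start = i + 1
            break
    if start is None:
        return ""  # no Opinion header found; the caller decides how to handle it
    kept = []
    for line in lines[start:]:
        if line.strip() in OTHER_HEADERS:
            break  # the next section begins, so the Opinion text has ended
        kept.append(line)
    return "\n".join(kept).strip()


sample = """Counsel
For the United States: A. Attorney
Opinion
The defendant drove a van carrying six passengers north on Route 77.
Agents stopped the van near the checkpoint.
Dissent
I respectfully disagree.
"""
print(extract_opinion(sample))
```

Only the two narrative sentences survive; the counsel listing and the dissent are dropped before any text reaches the coreference or extraction steps.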

Layman's Summary

Summary

- People who help others illegally enter a country work together in complicated ways that keep changing.
- Important papers in court cases have useful information but can be hard to understand because they use legal words and are not organized well.
- The usual methods for making knowledge graphs don't always connect all the right things and can end up messy.
- A new plan called CORE-KG has been made to fix these problems.
- CORE-KG uses a step-by-step process to figure out what things mean and how they are related.

Definitions

- Human smuggling networks: Groups of people who help others enter a country illegally.
- Legal case documents: Papers with important information about court cases.
- Knowledge graphs: Structured maps of how different things are connected or related.
- Coreference resolution: Figuring out when different words refer to the same thing.
- Entity/relationship extraction: Identifying important things and how they are connected in a document.

Blog article

Introduction

Human smuggling networks are a global phenomenon. They are complex and constantly evolving, which makes them challenging to analyze, yet understanding their structure and operations is crucial for law enforcement agencies working to combat human smuggling.

One valuable source of information for analyzing these networks is legal case documents. They contain detailed narratives about the individuals involved in a network, the routes used for transportation, the items transported, and the sequence of events, all of which are relevant to building meaningful knowledge graphs (KGs). However, these documents are often unstructured and dense with legal jargon, making it difficult for automated systems to extract relevant information.

To address this issue, researchers have proposed a modular framework called CORE-KG that uses a two-step pipeline: type-aware coreference resolution through structured prompts, followed by entity/relationship extraction guided by domain-specific instructions. This article gives an overview of the research paper, titled "CORE-KG: An LLM-Driven Knowledge Graph Construction Framework for Human Smuggling Networks."

The Need for CORE-KG

Existing methods for constructing KGs lack coreference resolution capabilities and often result in noisy or fragmented graphs. Coreference resolution is the process of identifying all expressions in a text that refer to the same real-world entity. In the analysis of human smuggling networks, this means identifying all references to each individual involved in the network throughout a legal case document. Without proper coreference resolution, automated systems may treat different references to one entity as separate entities, or merge unrelated references into a single entity.
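A toy illustration of why unresolved references inflate a graph: the mentions and the `canonical` lookup table below are invented for this sketch. In CORE-KG the resolution is performed by LLM prompts, not by a hand-written table.

```python
# Toy (mention, entity_type) pairs, as an extractor might emit them
# when no coreference resolution has been applied.
mentions = [
    ("Juan Perez", "PERSON"),
    ("Perez", "PERSON"),
    ("the defendant", "PERSON"),
    ("a white van", "VEHICLE"),
    ("the van", "VEHICLE"),
]

# Hypothetical resolution table mapping each alias to one canonical name.
canonical = {
    "Perez": "Juan Perez",
    "the defendant": "Juan Perez",
    "the van": "a white van",
}


def build_nodes(pairs, resolve=None):
    """Collect unique (name, type) nodes, optionally rewriting aliases first."""
    nodes = set()
    for text, etype in pairs:
        name = resolve.get(text, text) if resolve else text
        nodes.add((name, etype))
    return nodes


raw_nodes = build_nodes(mentions)
merged_nodes = build_nodes(mentions, canonical)
print(len(raw_nodes), len(merged_nodes))  # 5 2
```

Five surface forms collapse to two nodes once aliases are resolved, which is the kind of duplication the framework is designed to reduce.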
Both kinds of coreference error lead to noisy and inaccurate KGs that hinder effective analysis. Furthermore, traditional methods do not take into account the domain knowledge required to extract relevant information from legal case documents on human smuggling. As a result, incomplete or irrelevant data ends up in the KG, making it difficult to draw meaningful insights.

The CORE-KG Framework

To address these issues, the researchers proposed a modular framework called CORE-KG. It consists of two main components: type-aware coreference resolution and structured entity/relationship extraction.

Type-Aware Coreference Resolution

The first step in constructing an interpretable KG is to identify all references to the entities involved in the human smuggling network. To achieve this, CORE-KG uses a type-aware coreference resolution approach that takes into account the specific types of entities relevant to human smuggling networks. Sequential, structured prompts guide the LLM in identifying references to each entity. These prompts are designed around domain-specific knowledge and can be modified or extended for other types of legal case documents.

Structured Entity/Relationship Extraction

Once references have been resolved, the next step is to extract relevant information about these entities and their relationships from the legal case document. This process is guided by domain-specific instructions that specify which types of entities and relationships are relevant to building a meaningful KG for the analysis of human smuggling networks.

Targeted Preprocessing

To reduce noise and redundancy in constructing KGs from legal texts, targeted preprocessing is applied before information is extracted from each case file.
Specifically, only the "Opinion" section of each case file is extracted, because it contains the main factual narrative detailing the individuals involved, the routes used, the items transported, and the sequence of events relevant to building meaningful KGs.

Preliminary Experiments and Results

In preliminary experiments on a sample of 20 cases retrieved from the Nexis Uni database, CORE-KG demonstrated its capacity to extract key entities and identify meaningful relationships. The LLaMA 3.3 70B model is used for both coreference resolution and KG construction, with the aim of supporting reproducibility and minimizing stochastic variation during inference. Evaluation against a baseline built on GraphRAG (graph-based retrieval-augmented generation) shows that CORE-KG reduces node duplication by 33.28% and legal noise by 38.37%, resulting in cleaner and more coherent graph structures suitable for analyzing criminal networks such as human smuggling networks.

Conclusion

The CORE-KG framework offers a comprehensive approach to constructing interpretable KGs from the complex narratives found in criminal cases involving human smuggling. By extracting relevant information from the Opinion section of legal documents through coreference resolution and structured entity/relationship extraction, CORE-KG addresses key challenges faced by traditional methods and provides a valuable tool for law enforcement agencies in their efforts to combat human smuggling networks. Because it runs in an open-source environment without relying on commercial APIs, it is also accessible and transparent, making it a promising basis for future research in this field.
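The two-step pipeline described in this article can be sketched roughly as follows. Everything here is an assumption made for illustration: the entity type list, the prompt wording, the `head | relation | tail` output format, and the `fake_llm` stand-in are not the paper's actual prompts or code. In practice the `llm` callable would wrap a real model such as LLaMA 3.3 70B.

```python
from typing import Callable

# Assumed entity types; the paper's actual type inventory may differ.
ENTITY_TYPES = ["PERSON", "LOCATION", "ROUTE", "ORGANIZATION"]


def resolve_coreferences(text: str, llm: Callable[[str], str]) -> str:
    """Step 1: one prompt per entity type, each pass rewriting the previous output."""
    for etype in ENTITY_TYPES:
        prompt = (
            f"Rewrite the text so every mention of the same {etype} "
            f"uses one canonical name. Change nothing else.\n\nText:\n{text}"
        )
        text = llm(prompt)
    return text


def extract_triples(text: str, llm: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Step 2: ask for 'head | relation | tail' lines and keep the well-formed ones."""
    prompt = (
        "List relationships relevant to a human smuggling case as lines of the form "
        "head | relation | tail. Ignore procedural legal language.\n\nText:\n" + text
    )
    triples = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    if prompt.startswith("List relationships"):
        return "Juan Perez | drove | white van\nmalformed line"
    # Coreference passes: echo the text after the 'Text:' marker unchanged.
    return prompt.split("Text:\n", 1)[1]


resolved = resolve_coreferences("Perez drove the van.", fake_llm)
print(extract_triples(resolved, fake_llm))
```

Running the coreference passes one entity type at a time keeps each prompt narrow, and the parser discards any model output line that does not match the expected three-field format.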

Created on 08 Jul. 2025

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.