The Complexity of Why-Provenance for Datalog Queries

AI-generated keywords: Explainable AI Why-provenance Datalog Data complexity SAT solvers

AI-generated Key Points

  • Explainable AI aims to understand why a database query result is obtained
  • Why-provenance is a standard way to explain a query result by providing information about the witnesses to that result
  • Computational complexity of why-provenance for Datalog queries remains unexplored
  • Why-provenance for recursive queries, even with limited linear recursion, is an intractable problem
  • Non-recursive queries are highly tractable for why-provenance
  • Unambiguous proof trees are introduced to overcome arbitrary proof trees' conceptual limitations and do not affect the data complexity of the why-provenance problem
  • SAT solvers can help compute why-provenance efficiently and make it applicable in practice for (recursive) Datalog queries
  • Future research should provide a complete classification of the data complexity of the why-provenance problem as a dichotomy result and explore combined complexity when Datalog queries are part of input.
  • Thorough experimental evaluations are necessary to better understand our SAT-based machinery's practical application potential.
  • Understanding why a database query result is obtained through explainability techniques like why-provenance is crucial towards achieving Explainable AI goals in ontology-based applications using expressive database query languages like Datalog.
Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Marco Calautti, Ester Livshits, Andreas Pieris, Markus Schneider

License: CC BY 4.0

Abstract: Explaining why a database query result is obtained is an essential task towards the goal of Explainable AI, especially nowadays where expressive database query languages such as Datalog play a critical role in the development of ontology-based applications. A standard way of explaining a query result is the so-called why-provenance, which essentially provides information about the witnesses to a query result in the form of subsets of the input database that are sufficient to derive that result. To our surprise, despite the fact that the notion of why-provenance for Datalog queries has been around for decades and intensively studied, its computational complexity remains unexplored. The goal of this work is to fill this apparent gap in the why-provenance literature. Towards this end, we pinpoint the data complexity of why-provenance for Datalog queries and key subclasses thereof. The takeaway of our work is that why-provenance for recursive queries, even if the recursion is limited to be linear, is an intractable problem, whereas for non-recursive queries is highly tractable. Having said that, we experimentally confirm, by exploiting SAT solvers, that making why-provenance for (recursive) Datalog queries work in practice is not an unrealistic goal.

Submitted to arXiv on 22 Mar. 2023

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2303.12773v1

The goal of Explainable AI is to understand why a database query result is obtained, especially with the increasing use of expressive database query languages like Datalog in ontology-based applications. One standard way to explain a query result is through why-provenance, which provides information about the witnesses to a query result in the form of subsets of the input database that are sufficient to derive that result. Despite decades of intensive study on why-provenance for Datalog queries, its computational complexity remains unexplored. In this work, we aim to fill this gap by pinpointing the data complexity of why-provenance for Datalog queries and key subclasses thereof. We find that why-provenance for recursive queries, even if limited to linear recursion, is an intractable problem. However, for non-recursive queries, it is highly tractable. To overcome arbitrary proof trees' conceptual limitations, we introduce the new class of unambiguous proof trees and show that they do not affect the data complexity of the why-provenance problem. Furthermore, we experimentally confirm that exploiting SAT solvers can help compute why-provenance efficiently and make it applicable in practice for (recursive) Datalog queries. Our approach's end-to-end runtime outperforms other approaches in demanding scenarios while being comparable in simpler ones. Future research should provide a complete classification of the data complexity of the why-provenance problem as a dichotomy result and explore combined complexity when Datalog queries are part of input. Additionally, more thorough experimental evaluations are necessary to better understand our SAT-based machinery's practical application potential. Overall, understanding why a database query result is obtained through explainability techniques like why-provenance is crucial towards achieving Explainable AI goals in ontology-based applications using expressive database query languages like Datalog.
Created on 23 Mar. 2023

Assess the quality of the AI-generated content by voting

Score: 0

Why do we need votes?

Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.

The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.