Write up on Tech Geek History : DataLog & Logical Programming

Significance of the Study

1.1 Logical Programming

A logical operator is said to be truth-functional if the truth-values (the truth or falsity, etc.) of the statements it is used to construct always depend entirely on the truth or falsity of the statements from which they are constructed. The English words “and”, “or” and “not” are (at least arguably) truth-functional, because a compound statement joined together with the word “and” is true if both the statements so joined are true, and false if either or both are false, a compound statement joined together with the word “or” is true if at least one of the joined statements is true, and false if both joined statements are false, and the negation of a statement is true if and only if the statement negated is false.

Some logical operators are not truth-functional. One example of an operator in English that is not truth-functional is the word “necessarily”. Whether a statement formed using this operator is true or false does not depend entirely on the truth or falsity of the statement to which the operator is applied. For example, both of the following statements are true:

2 + 2 = 4.
Conjunction: The conjunction of two statements α and β, written in PL as ┌(α∧β)┐, is true if both α and β are true, and is false if either α is false or β is false or both are false. In effect, the meaning of the operator ‘∧‘ can be displayed according to the following chart, which shows the truth-value of the conjunction depending on the four possibilities of the truth-values of the parts:

α	β	(α∧β)
T T F F	T F T F	T F F F

The basic rules and principles of classical truth-functional propositional logic are, among contemporary logicians, almost entirely agreed upon, and capable of being stated in a definitive way. This is most easily done if we utilize a simplified logical language that deals only with simple statements considered as indivisible units as well as complex statements joined together by means of truth-functional connectives. We first consider a language called PL for “Propositional Logic

The validity of this argument can be made more obvious by representing the chain of reasoning leading from the premises to the conclusion:

Either cat fur was found at the scene of the crime, or dog fur was found at the scene of the crime. (Premise)
If dog fur was found at the scene of the crime, then officer Thompson had an allergy attack. (Premise)
If cat fur was found at the scene of the crime, then Macavity is responsible for the crime. (Premise)
Officer Thompson did not have an allergy attack. (Premise)
Dog fur was not found at the scene of the crime. (Follows from 2 and 4.)
Cat fur was found at the scene of the crime. (Follows from 1 and 5.)
Macavity is responsible for the crime. (Conclusion. Follows from 3 and 6.)

1.2 Datalog

Datalog, a logic programming language defined in the early 1980s, was originally used to describe and implement deductive databases [GMN84]. Given a set B representing the data in the database, a Datalog program P specified recursive views that a user could then query. Logically, B is a set of ground atomic formulas (facts) and P a set of implicational formulas of a restricted form (Horn clauses).

The semantics of such a Datalog system, as well as some actual implementations [CGT89], was given by the set of the atomic logical consequences of P with respect to B, in symbols {p(~t) | P, B ` p(~t)} — a beautifully pure use of logic. In the last ten years, Datalog has had a resurgence in a variety of domains other than databases: to provide user authentication [LM03], to implement network protocols [GW10, LCG+06], to program cyberphysical systems [ARLG+09] and to support deductive spreadsheets [Cer07], just to cite a few. In these applications, the contents of the fact base B is highly dynamic, at least compared to the relatively static databases of the 1980s.

This requires efficient algorithms to assert new facts and to retract stale facts. While such algorithms have been proposed [ARLG+09, CARG+12, GMS93, LCG+06, NJLS11], the connection with logic has lagged behind, especially for retraction. Indeed, although it is easy to give a logical specification of assertion and retraction, as done in Figure 1, a direct implementation of these specifications is horribly inefficient as it involves recomputing logical consequences from scratch after each assertion or retraction. In this report, we propose a logical interpretation of Datalog that supports updates internally.

Like with the traditional reading, the semantics of a Datalog system (P, B) will be the set {p(~t) | P, B ` p(~t)} of its logical consequences (plus some bookkeeping information). However, the assertion or retraction of a fact a will trigger a small number of logical inferences to determine the new set of logical consequences rather than performing a full recalculation. In this way, the proposed encoding not only constitutes a logical specification of the static and dynamic aspects of a Datalog system, but it can also be viewed as an efficient algorithm to compute updates. Because retraction results in the removal of inferred facts, we base our encoding on a fragment of first-order linear logic rather than on the traditional predicate calculus. The main contributions of this report are therefore

1) the definition of an encoding of Datalog which models the assertion and retraction of facts within logic, and

2) a proof of its correctness with respect to the traditional interpretation.

Proof Theory Another approach to find whether a logically follows from P is to apply a sequence of logically sound inference steps starting from P and ending with a.

The sequence of steps from P to a is called a proof, and serves to prove that a can be logically derived from

P. P ⊢ a is the fancy way to write this.

This states that there exists a proof of a from P. There are a number of questions to address regarding our proof system. First, what are the types of inference steps that are permitted? Once we have established that, we must address soundness and completeness.

• Soundness:

For any P and a, if P ⊢ a, then P |= a.

• Completeness: For any P and a, if P |= a, then P ⊢ a

First-order propositional logic is sound and complete. That is, anything provable in it is in fact true, and anything true with respect to it is provable. It is one of the greatest mathematical results of the 20th century that first-order logic (predicate calculus) without arithmetic is sound and complete. (G¨odel’s Completeness Theorem) It is arguably the greatest mathematical result of the 20th century that first-order logic (predicate calculus) with full arithmetic and second-order logic are necessarily incomplete. (G¨odel’s Incompleteness Theorem)

1.3 Logic as a Database? Problems?

Horn Programs What are the problems with using CNF propositional logic for our knowledge-bases / databases / programs?

1. Looks really unnatural.

2. Allows for meaningless databases / programs (that is, that have no models).

3. Where’s the data?! To address both points 1 and 2, we shall restrict the types of clauses permitted

Logic as a Database! Datalog We call a program that consists of just rules and facts—Horn clauses each with exactly one positive proposition—a Datalog database. We write queries as Horn clauses that contain no positive proposition. A query can be evaluated against a Datalog database as a resolution refutation proof.

Horn clause: At most one positive proposition appears. Horn clauses are more natural, and have a correspondence to database concepts.

Rule / View: a ← b, c. (a ∨ ¬b ∨ ¬c)

Fact: b. (b)

Query: ← a. (¬a)

• Easy to read when rewrittin as implications.

• Every database / program is meaningful; that is, consistent! (Really?! . . .)

Logic as a Database! Datalog

We call a program that consists of just rules and facts—Horn clauses each with exactly one positive proposition—a Datalog database.

We write queries as Horn clauses that contain no positive proposition. A query can be evaluated against a Datalog database as a resolution refutation proof.

query evalutation ≡ proof

So how do we know an answer to a query (with respect to the database) is correct?

Its evaluation is equivalent to a proof that it is correct (that it is a logical consequence)! We still need to address point 3, “Where’s the data?!”

We shall need to use (first-order) predicate calculus logic instead of just (first-order) propositional logic.

How do we know? Consider the interpretation in which we assign true to every proposition. Next, consider each clause: There is exactly one positive proposition per clause in a Datalog database, by definition.

Thus every clause is true with respect to the all-true interpretation, and thus the all-true interpretation is a model. Of course the all-true model is not so interesting. . .but it does guarantee that any datalog database is consistent

1.3a Reasoning about Queries & Databases

Datalog and its logical foundations—model and proof theories—provide us with tools to address other general questions about queries and databases.

• Given two Datalog queries, are they equivalent with respect to the database? That is, must they evaluate to the same answers?

• Does there exist a query of a given question we have in mind? That is, is the question even askable in Datalog? With this database?

• Given to Datalog databases, do they represent the same data? Is any question possible to state for one of them also possible to state for the other one?

The Move from Propositional Logic to Predicate Calculus

• Add arguments to propositions. Now call them predicates.

• Add logical variables. (For use in rules and in queries.)

• Add quantifiers for the variables: ∀ and ∃.

E.g., grandmother (GM, X) ← moth

Datalog permits only universal-quantified clauses. Thus no explicit existential-quantification is allowed.er (GM, P), parent (P, X).

Datalog Database versus “Prolog” Program We generally call a Horn-clause predicate calculus “theory” that we have written down a logic program. The Prolog programming language’s syntax looks just like this. So when do we call it a Datalog database instead? If it uses logical function symbols, it is considered a program. If it does not, it is considered a database. This is the logical distinction between them.

Logical function symbols?? This is essentially a data-structure, such as a list or record, that we could use as an argument to a predicate instead of just a simple value.

• grandmothers ([lallage, ruby, sally ] , parke)

• product (#13, widget (a, b) , $23.50) Function symbols are needed for arithmetic. (We usually add a limited form of arithmetic to a fuller Datalog.)

Here’s an example of a simple Datalog program that defines the ancestor relation recursively:

Figure 1. Grandparent Table

parent(john, douglas)

zero-arity-literal

“=”(3,3)

“”(-0-0-0,&&&,***,”\00”)

A clause is a head literal followed by an optional body. A body is a comma separated list of literals. A clause without a body is called a fact, and a rule when it has one. The punctuation :- separates the head of a rule from its body. A clause is safe if every variable in its head occurs in some literal in its body. The following are safe clauses:

Figure 2. Parent Table

parent(john, douglas)

ancestor(A, B) :-

parent(A, B)

ancestor(A, B) :-

parent(A, C),

ancestor(C, B)

GrandParent Consider the kinship dataset shown below. The situation here is the same as that described in Chapter 2. Art is the parent of Bob and Bea. Bob is the parent of Cal and Cam. Bea is the parent of Cat and Coe. parent(art,bob) parent(art,bea) parent(bob,cal) parent(bob,cam) parent(bea,cat) parent(bea,coe) Suppose now that we wanted to express information about the grandparent relation as well as the parent relation. As illustrated in the preceding chapter, we can do this by adding facts to our dataset. In this case, we would add the facts shown below. Art is the grandparent of Cal and Cam and Cat and Coe. grandparent(art,cal) grandparent(art,cam) grandparent(art,cat) grandparent(art,coe) Unfortunately, doing things this way is wasteful. The grandparent relation can be defined in terms of the parent relation, and so storing grandparent data as well as parent data is redundant. A better alternative is to write rules to encode such definitions and to use these rules to compute the relations defined by these rules when needed. For example, in the case above, rather than adding grandparent facts to our dataset, we can write the following rule, safe in the knowledge that we can use the rule to compute our grandparent data. grandparent(X,Z) :- parent(X,Y) & parent(Y,Z) In what follows, we distinguish two different types of relations – base relations and view relations. We encode information about base relations by writing facts in a dataset, and we define view relations by writing rules in a ruleset. In our example, parent is a base relation, and grandparent is a view relation. Given a dataset defining our base relations and a ruleset defining our view relations, we can use automated reasoning tools to derive facts about our view relations. For example, given the preceding facts about the parent relation and our rule defining the grandparent relation, we can compute the facts about the grandparent relation. Using rules to define view relations has multiple advantages over encoding those relations in the form of datasets. First of all, as we have just seen, there is economy: if view relations are defined in terms of rules, we do not need to store as many facts in our datasets. Second, there is less chance of things getting out of sync, e.g. if we change the parent relation and forget to change the grandparent relation. Third, view definitions work for any number of objects; they even work for applications with infinitely many objects (e.g. the integers) without requiring infinite storage.

1.4 How Does Datalog Work

At its core, Datalog is a set of rules that define relationships between objects in the form of predicates. These predicates can then be used to perform queries and manipulate data. Predicates are represented as facts, which are simply statements that define the relationship between entities.

For example, let’s say we have a database of people and their relationships to one another. We can define predicates for names, genders, parent-child relationships, and so on. These predicates can then be used to perform queries such as “who are the parents of Joe?” or “who are Joe’s grandparents?”.

Datalog rules define new relationships between existing entities. For example, we can define a rule that says “if X is a parent of Y and Y is a parent of Z, then X is a grandparent of Z”. This rule can then be used to derive new insights from the existing database, such as finding all of the grandparents of Joe.

1.5 Modern Applications of Datalog in Database Management Systems

Datalog is becoming increasingly popular in modern database management systems, especially those that deal with large volumes of data. Its versatility, performance, and scalability make it the perfect tool for managing complex databases and performing complex queries.

One of the most powerful applications of Datalog is in graph database management systems. Graph databases are designed to handle complex, interconnected data and relationships, which make them ideal for applications such as social networking, recommendation engines, and fraud detection.

Datalog is particularly well-suited for working with graph databases because it allows you to express complex relationships between entities in a simple and intuitive way. You can define rules that traverse the graph and derive new relationships between entities, allowing you to perform complex queries with ease.

Another popular application of Datalog is in data warehousing. Data warehouses are designed to store large quantities of data and support complex queries that analyze that data. Datalog is perfect for data warehousing because it allows you to define relationships between entities in a way that’s easy to understand and optimize. You can define rules that summarize and aggregate data across multiple dimensions, allowing you to gain unique insights into your data.

Finally, Datalog is also being used in modern machine learning systems. Machine learning algorithms rely heavily on data management and processing, and Datalog’s ability to handle large volumes of data, perform complex queries, and define recursive relationships make it an ideal tool for this application.

Write up on Tech Geek History : DataLog & Logical Programming

Leave a Comment Cancel Reply