An inference engine for RDF

 

Master thesis  by G.Naudts

 

 

 

Open University Netherlands

Agfa Gevaert

 

 

 

 

 

                

 

 


This document is the Master's thesis written as part of my Master's degree in Computer Science Education (Software Systems) at the Open University of the Netherlands. The work was done in collaboration with the research department of the company Agfa in Mortsel, Belgium.

 

Student data

 

Name                          Guido Naudts

Student number           831708907

Address                       Secretarisdreef 5

                                    2288 Bouwel

Telephone work          0030-2-542.76.01

                  Home        0030-14-51.32.43

E-mail                         Naudts_Vannoten@yahoo.com

 

Coaching and graduation committee

 

Chairman:                    dr J.T. Jeuring, professor at the Open University

Secretary:                    ir. F.J. Wester, senior lecturer at the Open University

Coach:                          ir. J. De Roo, researcher at Agfa Gevaert.

 

 

 

 


With thanks to the following persons:

Prof. J. Jeuring, Ir. F. Wester, Ir. J. De Roo

M.P. Jones

All the people from the OU for their efforts

My wife and children

 


Contents

 

Summary

Chapter 1. Introduction

1.1.          The Semantic Web

1.2.          Case study

Chapter 2. The building blocks

2.1.          Introduction

2.2.          XML and namespaces

2.3.          URI’s and URL’s

2.4.          Resource Description Framework RDF

2.5.          Notation 3

2.6.          The logic layer

2.7.          Semantics of N3

Chapter 3. A global view on the Semantic Web

3.1.          The layers of the Semantic Web

3.2.          Characteristics of an inference engine

Chapter 4. Description of the research question.

4.1.           The problem

4.2.          The solution framework

Chapter 5. Inferencing and the graph model of RDF

5.1.          Introduction

5.2.          Matching two triplesets

5.3.          The model interpretation of a rule

5.4.          Matching a tripleset with a rule

5.5.          Unification

5.6.          The resolution process

5.7.          Structure of the engine

5.8.          The backtracking resolution mechanism in Haskell

5.9.          The closure path

5.10.      The resolution path

5.11.      Variable renaming

5.12.      Comparison of resolution and closure paths

5.13.      Anti-looping technique

5.14.      A formal theory of graph resolution

5.15.      An extension of the theory to variable arities

5.16.      An extension of the theory to typed nodes and arcs

Chapter 6. RDF, inferencing and first order logic.

6.1.          The model mismatch

6.2.          Modelling RDF in FOL

6.3.          Unification and the RDF graph model

6.4.          Embedded rules

6.5.          Completeness and soundness of the engine

6.6.          RDFProlog

6.7.          Comparison with PROLOG and Otter

6.8.          Differences between RDF and FOL

 

Chapter 7.  Existing inference engines

7.1.          Introduction

7.2.          Euler

7.3.          CWM

7.4.

7.5.          Conclusion

Chapter 7. Optimization.

7.1.          Introduction

7.2.          Discussion

Chapter 8. Inconsistencies.

Chapter 9. Applications.

9.1.          An Alpine club

9.2.          Simulation of logic gates

9.3.          A description of owls

Conclusion

Bibliography

List of abbreviations

Annexe 1. Use of the inference engine N3Engine.hs

Annexe 2. An example of a resolution and a closure path made with the gedcom example


Summary

 

The Semantic Web is an initiative of the World Wide Web Consortium (W3C). The goal of this project is to define a series of standards. These standards will create a framework for the automation of activities on the World Wide Web (WWW) that currently still require manual intervention by human beings.

A certain number of basic standards have already been developed. Among them XML is without doubt the best known. Less known is the Resource Description Framework (RDF). This is a standard for the creation of meta-information, i.e. the information that has to be given to the machines to permit them to handle the tasks previously done by human beings.

Information alone however is not sufficient. In order to make a decision it is often necessary to follow a certain reasoning. Reasoning is connected with logic. The consequence is that the machines have to be fed with rules and processes (read: programs) that can make, following a certain logic, decisions based on available information.

These processes and reasonings are executed by inference engines. The definition of inference engines for the Semantic Web is at the moment an active field of experimental research.

These engines have to use the information that is expressed following the standard RDF. This means that they have to follow the data model of this standard. The consequences of this requirement for the structure of the engine are explored in this thesis.

An inference engine uses information and, given a query, reasons with this information with a solution as a result. It is important to be able to show that the solutions obtained in this manner have certain characteristics. These are, among others, the characteristics of completeness and validity. This proof is delivered in this thesis in two ways:

1)     by means of a reasoning based on the graph theoretical properties of RDF

2)     by means of a model based on First Order Logic.

On the other hand these engines must take into account the specificity of the World Wide Web. One of the properties of the web is the fact that sets are not closed. This implies that not all elements of a set are known. Reasoning then has to happen in an open world. This fact has far reaching logical implications.

During the twentieth century there has been an impetuous development of logic systems. More than 2000 kinds of logic were developed. Notably First Order Logic (FOL) has come under heavy attack, among others by the Dutch mathematician Brouwer with his constructive or intuitionistic logic. Research is also done into kinds of logic that are suitable for use by machines.

In this thesis I argue that an inference engine for the World Wide Web should follow a kind of constructive logic and that RDF, with a logical implication added, satisfies this requirement.

The structure of the World Wide Web has a number of consequences for the properties that an inference engine should satisfy. A list of these properties is established and discussed.

An important characteristic for an engine that has to be active in the World Wide Web is efficiency. Therefore it is important to do research on the methods that can be used to make an efficient engine. The structure has to be such that it is possible to work with huge volumes of data, possibly kept in relational databases.

On the World Wide Web information is available in a lot of places and in a lot of different forms. Joining parts of this information and reasoning about the result can easily lead to contradictions and inconsistencies. These are inherent to the logic used or are dependent on the application. A special place is taken by inconsistencies that are inherent to the ontology used. For the Semantic Web the ontology is determined by the standards rdfs and OWL.

An ontology introduces a classification of data and applies restrictions to those data. An inference engine for the Semantic Web needs to have such characteristics that it is compatible with rdfs and OWL.

An executable specification of an inference engine in Haskell was constructed. This made it possible to test different aspects of inferencing and the logic connected to it. This engine has the name N3Engine. A large number of existing test cases was executed with the engine. A certain number of test cases was constructed, some of them for inspecting the logic characteristics of RDF.

 


Samenvatting

 

The Semantic Web is an initiative of the World Wide Web Consortium (W3C). This project aims to build a series of standards. These standards will create a framework for the automation of activities on the World Wide Web (WWW) that currently still require manual intervention.

A certain number of basic standards have already been developed, among which XML is without doubt the best known. Somewhat less known is the Resource Description Framework (RDF). This is a standard for the creation of meta-information. This is the information that has to be supplied to machines to enable them to take over tasks from humans.

Information alone, however, is not sufficient. To be able to make a decision, a certain reasoning often has to be followed. Reasoning is connected with logic. The consequence is that machines have to be fed with rules and processes (read: programs) that can, following a certain logic, make decisions on the basis of available information.

These processes and reasonings are executed by so-called inference engines. This is at present an active field of experimental research.

The engines have to make use of the information that is expressed following the RDF standard. They therefore have to follow the data model of this standard. The consequences of this for the structure of the engine are examined more closely in this thesis.

An inference engine uses information and, given a query, carries out reasonings on this information with a solution as a result. It is important to be able to show that the solutions obtained in this way possess certain properties. These are, among others, the properties of completeness and validity. This proof is delivered in two ways: on the one hand by means of a reasoning based on the graph-theoretical properties of RDF; on the other hand by means of a model based on First Order Logic.

On the other hand, the engines have to take into account the specificity of the World Wide Web. One of the properties of the web is the openness of sets. This implies that not all elements of a set are known. Reasoning then has to take place in an open world. This has far-reaching logical implications.

In the course of the twentieth century logic went through a stormy development. More than 2000 kinds of logic were developed. Attacks were made on the bulwark of First Order Logic (FOL), among others by the Dutch mathematician Brouwer with constructive or intuitionistic logic. Research is also being done into kinds of logic that are suitable for use by machines.

This thesis argues that an inference engine for the WWW should follow a constructive logic on the one hand, and on the other hand that RDF, extended with a logical implication, satisfies this kind of logic.

The structure of the World Wide Web has a number of consequences for the properties that an inference engine should satisfy. A list of these properties is established and discussed.

An important property for an engine that has to be active in the World Wide Web is efficiency. It is therefore important to investigate how an efficient engine can be made. The structure has to be such that it is possible to work with large volumes of data, which may possibly be stored in relational databases.

On the World Wide Web information is available in many places and in all kinds of forms. Joining this information together and reasoning about it can easily lead to contradictions and inconsistencies. These are inherent to the logic used or depend on the application. A special place is taken by inconsistencies that are inherent to the ontology used. For the Semantic Web the ontology is determined by the standards rdfs and OWL. An ontology introduces a classification of data and also imposes restrictions on those data. An inference engine for the Semantic Web needs to have such characteristics that it is compatible with rdfs and OWL.

An executable specification of an inference engine in Haskell was made. This made it possible to test all kinds of aspects of inferencing and the logic connected with it. The engine was baptised N3Engine. A large number of existing test cases was examined. A number of test cases was added, some specifically with the aim of investigating the logical properties of RDF.

        


 

Chapter 1. Introduction

 

1.1. The Semantic Web

 

1.1.1. Scope

 

This thesis will concentrate on RDF and inferencing. Here only an overview of the Semantic Web is given.

With the advent of the internet a mass communication medium has become available. One of the most important and indeed revolutionary characteristics of the internet is the fact that now everybody is connected with everybody: citizens with citizens, companies with companies, citizens with companies, government etc... In fact the global village now exists. This interconnection creates fascinating possibilities of which only a few are used today. The internet serves mainly as a vehicle for hypertext. These texts with high semantic value for humans have little semantic value for computers. One of the goals of the Semantic Web is to augment the semantic content of web information so that it is usable by computers. Important progress can be made by putting semantic information into existing web pages. An example of an effort in that direction is SHOE [HEFLIN].

The semantic information serves two purposes:

1)     augment the accessibility of the information present on the internet.

2)     automate the exchanges between computers that are active on the internet.

 

1.1.2. Automation

 

The perennial problem with the interconnection of companies is the specificity of the tasks to perform. EDI was an effort to define a framework that companies could use to communicate with each other. Other efforts have been made by standardizing XML languages (e.g. ebXML). At the moment an effort endorsed by large companies is underway: web services.

The interconnection of all companies and citizens with one another creates the possibility of automating a lot of transactions that are now done manually or via specific and dedicated automated systems. It should be clear that separating the common denominator from all the efforts mentioned above and standardizing it so that it can be reused a million times certainly is interesting. Of course standardisation must not degenerate into bureaucracy impeding further developments. But creating the same program 20 times in different programming languages does not seem very interesting either, except if the work can be left to computers, and even then computers should be put to good use.

If every time two companies connect to each other for some application they have to develop a framework for that application, then the effort to develop all possible applications becomes enormous. Instead a general system can be developed based on inference engines and ontologies. The mechanism is as follows: the interaction between the communicating partners to achieve a certain goal is laid down in facts and rules using a common language to describe those facts and rules, where the flexibility is provided by the fact that the common language is in fact a series of languages and tools included in the Semantic Web vision: XML, RDF, RDFS, DAML+OIL, SweLL, OWL. To achieve automatic interchange of information, ontologies play a crucial role; because a computer is not able to make intelligent guesses at the meaning of something as humans do, the meaning of something (i.e. the semantics) has to be defined in terms of computer actions. A computer agent receives a communication from another agent. It must then be able to transform that communication into something it understands, i.e. that it can interpret and act upon. The word ‘transform’ means that the message may possibly arrive in a different ontology than the one used by the local client, but necessarily a transformation to its own ontology must be possible. Possibly an agreement between the two parties about some additional, non-standard ontologies has to be made for a certain application.

It is supposed that the inference engine has enough power to deal with all (practically all) possible situations.

There might then be the following scheme for an application using the technology discussed and partially implemented within this thesis:

Lay down the rules of the application in Notation 3. One partner then sends a query to another partner. The inference engine interprets this query, thereby using its set (or sets) of ontological rules, and then it produces an answer. The answer might indeed consist of statements that will be used by other software to produce actions within the receiving computer. What has to be done then? Establishing the rules and making an interface that can transform the response of the engine into concrete actions.

The semantics of all this lies in the interpretation by the inference engine of the ontological rule sets at its disposal and their specific implementation by the engine, and in the actions performed by the interface as a consequence of the engine’s responses [USHOLD]. Clearly the actions performed after a conclusion from the engine leave room for a lot of possible standardisations. (A possible action might be: sending a SOAP message. Another might be: sending a mail.)

What will push the Semantic Web are the enormous possibilities of automated interaction between communication partners created by the mere existence of the internet: companies, government, citizens. To put it simply: the whole thing is too interesting not to be done!

The standardisation of the Semantic Web and the development of a standard ontology and proof engines that can be used to establish trust on the web is a huge challenge but the potential rewards are huge too. The computers of companies and citizens will be able to make complex completely automated interactions freeing everybody from administrative and logistic burdens. A lot of software development remains to be done and will be done by software scientists and engineers.

 

The question will inevitably be raised whether this development is for the good or the bad. The hope is that a further, perhaps gigantic, development of the internet will keep and enhance its potential for defending and augmenting human freedom.

 

 

1.2. Case study

 

1.2.1. Introduction

 

A case study can serve as a guide to what we want to achieve with the Semantic Web. Whenever standards are approved they should be such that important case studies remain possible to implement.

 

1.2.2. The case study

 

A travel agent in Antwerp has a client who wants to go to St. Tropez in France. There are rather a lot of possibilities for composing such a voyage. The client can take the train to France, or he can take a bus or train to Brussels and then the airplane to Nice in France, or the train to Paris and then the airplane or another train to Nice. The travel agent explains to the client that there are a lot of possibilities. During his explanation he gets an impression of what the client really wants. Fig.1.2. gives a schematic view of the case study.


Fig.1.2. A Semantic Web case study

 

He agrees with the client about the itinerary: by train from Antwerp to Brussels, by airplane from Brussels to Nice and by train from Nice to St. Tropez. This still leaves room for some alternatives. The client will come back to make a final decision once the travel agent has informed him by mail that he has worked out some alternative solutions, like the price for first class vs second class etc...

Note that the decision for the itinerary that has been taken is not very well founded; only very crude price comparisons have been done, based on some internet sites that the travel agent consulted during his conversation with the client. A very cheap flight from Antwerp to Cannes has escaped the attention of the travel agent.

The travel agent will now further consult the internet sites of the Belgian railways, the Brussels airport and the French railways to get some alternative prices, departure times and total travel times.

 

Now let’s compare this with the hypothetical situation in which a full-blown Semantic Web exists. In the computer of the travel agent resides a Semantic Web agent that has at its disposal all the necessary standard tools. The travel agent has a specialised interface to the general Semantic Web agent. He fills in a query in his specialised screen. This query is translated to a standardised query format for the Semantic Web agent. The agent consults its rule database. This database of course contains a lot of rules about travelling as well as facts, e.g. facts about internet sites where information can be obtained. There are a lot of ‘path’ rules: rules for composing an itinerary (for an example of what such rules could look like see [GRAPH]). The agent contacts different other agents like the agent of the Belgian railways, the agent of the French railways, the agents of the airports of Antwerp, Brussels, Paris, Cannes, Nice etc...

With the information received, its inference rules about scheduling a trip are consulted. This is all done while the travel agent is chatting with the client to detect his preferences. After some 5 minutes the Semantic Web agent gives the travel agent a list of alternatives for the trip; now the travel agent can immediately discuss this with his client. When a decision has been reached, the travel agent immediately gives his Semantic Web agent the order for making the reservations and ordering the tickets. Now the client will only have to come back once to get his tickets, and not twice. The travel agent not only has been able to propose a cheaper trip than in the case above but has also saved an important amount of his time.

 

1.2.3. Conclusions of the case study

 

That a realisation of such a system is interesting is evident. Clearly, the standard tools have to be very flexible and powerful to be able to put into rules the reasoning of this case study (path determination, itinerary scheduling). All these rules then have to be made by someone. This can of course be a common effort for a lot of travel agencies.

What exists now? A quick survey shows that there are web portals where a client can make reservations (for hotel rooms). However the portal has to be fed with data by the travel agent. There also exists software that permits the client to manage his travel needs. But all that software has to be fed with information obtained by a variety of means, practically always manually.

 

 


 

Chapter 2. The building blocks

 

2.1. Introduction

 

In this chapter a number of techniques are briefly explained. These techniques are necessary building blocks for the Semantic Web. They are also a necessary basis for inferencing on the web.

Readers who are familiar with some topics can skip the relevant explanations.

 

2.2. XML and namespaces

 

2.2.1. Definition

 

XML (Extensible Markup Language) is a subset of SGML (Standard Generalized Markup Language). In its original meaning a markup language is a language which is intended for adding information (“markup” information) to an existing document. This information must stay separate from the original, hence the presence of separation characters. In SGML and XML ‘tags’ are used.

 

2.2.2. Features

 

There are two kinds of tags: opening and closing tags. The opening tags are keywords enclosed between the signs “<” and “>”. An example: <author>. A closing tag is practically the same, only the sign “/” is added, e.g. </author>. With these elements alone very interesting data structures can be built. An example of a book description:

 

<book>

   <title>

         The Semantic Web

   </title>

   <author>

          Tim Berners-Lee

   </author>

</book>

 

As can be seen it is quite easy to build hierarchical data structures with these elements alone. A tag can have content too: in the example the strings “The Semantic Web” and “Tim Berners-Lee” are content. One of the good characteristics of XML is its simplicity and the ease with which parsers and other tools can be built.

The keywords in the tags can have attributes too. The previous example could be written:

 

<book title="The Semantic Web" author="Tim Berners-Lee"></book>

 

where attributes are used instead of tags.

An important characteristic of XML is its readability. The aim is to have formats usable by computers but still understandable by humans, though this may need some effort from those humans.

Though in the beginning XML was intended to be used as a vehicle of information on the internet, it can very well be used in stand-alone applications too, e.g. as the internal hierarchical tree-structure of a computer program. A huge advantage of using XML is the fact that it is standardized, which means a lot of tools are available and also that a lot of people and programs can deal with it.

Very important is the hierarchical nature of XML. Expressing hierarchical data in XML is very easy and natural. This makes it a useful tool wherever hierarchical data are treated, including all applications using trees. XML could be a standard way to work with trees.

 

XML is not a language but a meta-language, i.e. a language whose goal is to define other languages (“markup” languages).

Everybody can make his own language using XML. A person doing this only has to follow the syntax of XML, i.e. produce well-formed XML. However more constraints can be added to an XML language by using DTDs and XML schemas, thus producing valid XML documents. A valid XML document is one that is in accordance with the constraints of a DTD or XML schema. To summarize: an XML language is a language that follows XML syntax and XML semantics. The XML language can be defined using DTDs or XML schemas.

If everybody creates his own language then the “Tower of Babel” syndrome is looming. How is such a diversity in languages handled? This is done by using namespaces. A namespace is a reference to the definition of an XML language.

Suppose someone has made an XML language about birds. Then he could make the following namespace declaration in XML:

 

<birds:wing xmlns:birds="http://birdSite.com/birds/">

 

This statement refers to the tag “wing” whose description is to be found on the site that is indicated by the namespace declaration xmlns (= XML namespace). Now our hypothetical biologist might want to use an aspect of the physiology of birds that is, however, described in another namespace:

 

<physiology:temperature xmlns:physiology="http://physiology.com/xml/">

 

By the semantic definition of XML these two namespaces may be used within the same XML-object.

 

<?xml version="1.0" ?>

<birds:wing xmlns:birds="http://birdSite.com/birds/">

            large

</birds:wing>

<physiology:temperature xmlns:physiology="http://physiology.com/xml/">

            43

</physiology:temperature>

 

The version statement refers to the used version of XML (always the same).

XML thus gives the possibility of using more than one language in one object. What can a computer do with this? It can check the well-formedness of the XML object. Then, if a DTD or an XML schema describing a language is available, it can check the validity of the use of this language within the XML object. It cannot interpret the meaning of this XML object, at least not without extra programming. Someone can write a program (e.g. a veterinary program) that makes an alarm bell sound when the measured temperature of a certain bird is 45 while research on the site “http://physiology.com/” has indicated a temperature of 43 degrees Celsius.

 

2.3. URI’s and URL’s

2.3.1. Definitions

 

URI stands for Uniform Resource Identifier. A URI is anything that indicates unequivocally a resource.

URL stands for Uniform Resource Locator.  This is a subset of URI. A URL indicates the access to a resource. URN refers to a subset of URI and indicates names that must remain unique even when the resource ceases to be available. URN stands for Uniform Resource Name. [ADDRESSING]

In this thesis only URL’s will be used.

 

2.3.2. Features

 

The following example illustrates a URI that is in common use.

[URI]:

 

 www.math.uio.no/faq/compression-faq/part1.html

 

Most people will recognize this as an internet address. It unequivocally identifies a web page. Another example of a URI is an ISBN number that unequivocally identifies a book.

 

The general format of an http URL is:

 

http://<host>:<port>/<path>?<searchpart>.

 

The host is of course the computer that contains the resource; the default port number is normally 80; possibly, e.g. for security reasons, it might be changed to something else; the path indicates the directory access path. The searchpart serves to pass information to a server, e.g. data destined for CGI scripts.

When a URL ends with a slash like http://www.test.org/definitions/ the directory definitions is addressed. This will be the directory defined by adding the standard prefix path, e.g. /home/netscape, to the directory name: /home/netscape/definitions. The server can then return e.g. the contents of the directory, a message ‘no access’, or perhaps the contents of a file ‘index.html’ in that directory.

A path might include the sign “#” indicating a named anchor in an html-document. Following is the html definition of a named anchor:

 

<H2><A NAME="semantic">The Semantic Web</A></H2>

 

A named anchor thus indicates a location within a document. The named anchor can be called e.g. by:

 

http://www.test.org/definition/semantic.html#semantic

 

 

2.4. Resource Description Framework RDF

 

RDF is a language. The semantics are defined by [RDF_SEMANTICS]; three syntaxes are known: the XML syntax, Notation 3 and N-triples. N-triples is a subset of Notation 3 and thus of RDF [RDF Primer].

Very basically RDF consists of triples: subject - predicate - object. This simple statement however is not the whole story; nevertheless it is a good starting point.

An example [ALBANY] of a statement is:

“Jan Hanford created the J. S. Bach homepage.”. The J.S. Bach homepage is a resource. This resource has a URI: http://www.jsbach.org/. It has a property: creator with value ‘Jan Hanford’. Figure 2.1. gives a graphical view of this.


Fig.2.1. An example of an RDF relation.

 

In simplified RDF this is:

 

<RDF>

   <Description about="http://www.jsbach.org">

      <Creator>Jan Hanford</Creator>

   </Description>

</RDF>

 

However this is without namespaces meaning that the notions are not well defined. With namespaces added this becomes:

 

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/DC/">

   <rdf:Description about="http://www.jsbach.org/">

      <dc:Creator>Jan Hanford</dc:Creator>

   </rdf:Description>

</rdf:RDF>

 The first namespace xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" refers to the document describing the (XML) syntax of RDF; the second namespace xmlns:dc="http://purl.org/DC/" refers to the description of the Dublin Core, a basic ontology about authors and publications. This is also an example of two languages that are mixed within an XML object: the RDF and the Dublin Core language.

There is also an abbreviated RDF syntax. The example above then becomes:

 

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/DC/">

   <rdf:Description about="http://www.jsbach.org/" dc:Creator="Jan Hanford">

   </rdf:Description>

</rdf:RDF>

 

The following example shows that more than one predicate-value pair can be indicated for a resource.

 

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:bi="http://www.birds.org/definitions/">

   <rdf:Description about="http://www.birds.org/birds#swallow">

       <bi:wing>pointed</bi:wing>

       <bi:habitat>forest</bi:habitat>

   </rdf:Description>

</rdf:RDF>

 

or in abbreviated form:

 

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:bi="http://www.birds.org/definitions/">

   <rdf:Description about="http://www.birds.org/birds#swallow" bi:wing="pointed" bi:habitat="forest">

   </rdf:Description>

</rdf:RDF>

 

Other abbreviated forms exist but these are out of scope for this thesis.

 

Conclusion

 

What is RDF? It is a language with a simple semantics consisting of triples: subject – predicate – object and some other elements. Several syntaxes exist for RDF: XML, graph, Notation 3. Notwithstanding its simple structure a great deal of information can be expressed with it.

 

2.5. Notation 3

 

2.5.1. Introduction

 

Notation 3 is a syntax for RDF developed by Tim Berners-Lee and Dan Connolly and represents a more human-readable form of the RDF syntax with in principle the same semantics [RDF Primer]. Most of the test cases used for testing my engine are written in Notation 3. An alternative and shorter name for Notation 3 is N3. I always use the Notation 3 syntax for RDF.

2.5.2. Basic properties

 

The basic syntactical form in Notation 3 is a triple of the form:

<subject> <verb> <object> where subject, verb and object are atoms. An atom can be either a URI (or a URI abbreviation) or a variable. But some more complex structures are possible and there also is some “syntactic sugar”. Verb and object are also called property and value, which is anyhow their semantic meaning.

 

In N3 URI’s can be indicated in a variety of different ways. A URI can be indicated by its complete form or by an abbreviation. In what follows the first item gives the complete form; the others are abbreviations.

·        <http://www.w3.org/2000/10/swap/log#forAll> : this is the complete form of a URI in Notation 3. This resembles very much the indication of a tag in an HTML page.

·         xxx: : a label followed by a ‘:’ is a prefix. A prefix is defined in N3 by the @prefix instruction :

@prefix ont: <http://www.daml.org/2001/03/daml-ont#>.

This defines the prefix ont: . After the label follows the URI represented by the prefix.

So  ont:TransitiveProperty is in full form <http://www.daml.org/2001/03/daml-ont#TransitiveProperty> .

·        <#param> : here only the tag in the HTML page is indicated. To get the complete form the default URL must be added. The complete form is: <URL_of_current_document#param>.

The default URL is the URL of the current document i.e. the document that contains the description in Notation 3.

·        <> or <#>: this indicates the URI of the current document.

·         : : a single colon by convention refers to the current document. However this is not necessarily so, because this meaning has to be declared with a prefix statement:

@prefix  :  <#> .

 

 

Two substantial abbreviations for sets of triples are property lists and object lists. It can happen that a subject receives a series of qualifications, each qualification with a verb and an object,

e.g. :bird :color :blue; :height :high; :presence :rare.

This represents a bird with a blue color, a high height and a rare presence.

These properties are separated by a semicolon.

A verb or property can have several values e.g.

:bird :color :blue, :yellow, :black.

This means that the bird has three colors. This is called an object list. The two things can be combined:

 

:bird :color :blue, :yellow, :black; :height :high; :presence :rare.

 

The objects in an object list are separated by a comma.

Another semantic and syntactic feature is the anonymous subject. The symbols ‘[’ and ‘]’ are used for this feature. [:can :swim]. means there exists an anonymous subject x that can swim. The abbreviations for property lists and object lists can be used here too:

 

[ :can :swim, :fly ; :color :yellow].

 

Some more syntactic sugar must be mentioned.

 

:lion :characteristics :mammals.

 

can be replaced by:

 

:lion has :characteristics of :mammals.

 

The words “has” and “of” are just eliminated by the parser.

 

:lion :characteristics :mammals.

 

can be replaced by:

 

:mammals is :characteristics of :lion.

 

Again the words “is” and “of” are just eliminated; however in this case subject and object have to be interchanged.

 

The property rdf:type can be abbreviated as “a”:

 

:lion a :mammal.

 

really means:

 

:lion rdf:type :mammal.

 

The property owl:equivalentTo can be abbreviated as “=”, e.g.

 

daml:equivalentTo = owl:equivalentTo.

 

meaning the semantic equivalence of two notions or things.

 

This notion of equality will probably become very important in the future for assuring interoperability between different systems on the internet: if A uses term X meaning the same as term Y used by B, this does not matter as long as this equivalence can be expressed and found.

 

2.6. The logic layer

 

In [SWAPLOG] an experimental logic layer is defined for the Semantic Web. An overview of the most salient features (my engine only uses log:implies, log:forAll, log:forSome and log:Truth):

 

log:implies : this is the implication.

 

{:rat a :rodentia. :rodentia a :mammal.} log:implies {:rat a :mammal}.

 

log:forAll : the purpose is to indicate universal variables :

 

this log:forAll :a, :b, :c.

 

indicates that :a, :b and :c are universal variables.

The word this stands for the scope enveloping the formula. In the form above this is the whole document. When between braces it is the local scope: see [PRIMER]. In this thesis this is not used.

 

log:forSome: does the same for existential variables.

 

this log:forSome :a, :b, :c.

 

log:Truth : states that this is a universal truth. This is not interpreted by my engine.

 

Here follow briefly some other features:

log:falseHood : to indicate that a formula is not true.

log:conjunction : to indicate the conjunction of two formulas.

log:includes : F includes G means G follows from F.

log:notIncludes: F notIncludes G means G does not follow from F.

 

2.7. Semantics of N3

 

2.7.1. Introduction

 

The semantics of N3 are the same as the semantics of RDF [RDFM].

The semantics indicate the correspondence between the syntactic forms of RDF and the entities in the universe. When an information problem is analyzed and the problem is represented with statements in Notation 3, it is the semantic model that determines the semantic meaning of the Notation 3 statements. This is true for all test cases.

 

2.7.2. The model

 

The vocabulary V of the model is composed of a set of URI’s.

LV is the set of literal values and XL is the mapping from the literals to LV.

A simple interpretation I of a vocabulary V is defined by:

1. A non-empty set IR of resources, called the domain or universe of I.

2. A mapping IEXT from IR into the powerset of IR x (IR union LV) i.e. the set of sets of pairs <x,y> with x in IR and y in IR or LV. This mapping defines the properties of the RDF triples.

3. A mapping IS from V into IR.

IEXT(x) is a set of pairs which identify the arguments for which the property is true, i.e. a binary relational extension, called the extension of x. [RDFMS]

Informally this means that every URI represents a resource, which might be a page on the internet but not necessarily: it might as well be a physical object. A property is a relation; this relation is defined by an extension mapping from the property into a set. This set contains pairs where the first element of a pair represents the subject of a triple and the second element of a pair represents the object of a triple. With this system of extension mapping the property can be part of its own extension without causing paradoxes. This is explained in [RDFMS].

2.7.3. Examples

Take the triple:

:bird :color :yellow.

In the set of URI’s there will be things like :bird, :mammal, :color, :weight, :yellow, :blue etc... These are part of the vocabulary V.

In the set IR of resources will be: #bird, #color etc.. i.e. resources on the internet or elsewhere. #bird might represent e.g. the set of all birds. These are the things we speak about.

There then is a mapping IEXT from #color (resources are abbreviated) to the set {(#bird,#blue),(#bird,#yellow),(#sun,#yellow),...}

and the mapping IS from V to IR:

:bird → #bird, :color → #color, ...

The URI refers to a page on the internet where the domain IR is defined (and thus the semantic interpretation of the URI).
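The following Haskell fragment is a minimal sketch of such a simple interpretation. It is only an illustration of the definitions above and not part of N3Engine; the names Interpretation, is, iext and holds are chosen here for the example.

-- A minimal sketch of a simple interpretation (illustrative only).
type Res = String                       -- elements of the domain IR

data Interpretation = Interpretation
  { isMap   :: [(String, Res)]          -- IS: vocabulary V -> IR
  , iextMap :: [(Res, [(Res, Res)])] }  -- IEXT: property -> its extension

-- IS and IEXT as functions
is :: Interpretation -> String -> Maybe Res
is i v = lookup v (isMap i)

iext :: Interpretation -> Res -> [(Res, Res)]
iext i r = maybe [] id (lookup r (iextMap i))

-- A ground triple (s, p, o) is true in I when the pair (IS s, IS o)
-- belongs to the extension IEXT (IS p).
holds :: Interpretation -> (String, String, String) -> Bool
holds i (s, p, o) = case (is i s, is i p, is i o) of
  (Just s', Just p', Just o') -> (s', o') `elem` iext i p'
  _                           -> False

-- The example above: holds example (":bird", ":color", ":yellow") gives True.
example :: Interpretation
example = Interpretation
  { isMap   = [(":bird", "#bird"), (":color", "#color"), (":yellow", "#yellow")]
  , iextMap = [("#color", [("#bird", "#blue"), ("#bird", "#yellow"), ("#sun", "#yellow")])] }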

 


Chapter 3. A global view of the Semantic Web

 

 3.1. The layers of the Semantic Web

 

Fig.3.1. illustrates the different parts of the Semantic Web in the vision of Tim Berners-Lee. The notions are explained in an elementary manner here. Later some of them will be treated more in depth.

 

Layer 1. Unicode and URI

 

At the bottom there are Unicode and URI. Unicode is the universal character code.


Fig.3.1. The layers of the Semantic Web [Berners-Lee]

 

Unicode codes the characters of all the major languages in use today [UNICODE]. There are three formats for encoding Unicode characters. These formats are convertible into one another.

1)     In UTF-8 the character size is variable. ASCII characters remain unchanged when transformed to UTF-8.

2)     In UTF-16 the most frequently used characters use 2 bytes, while others use 4 bytes.

3)     In UTF-32 all characters are encoded in 4 bytes.

 

URI’s are Uniform Resource Identifiers. With a URI some “thing” is indicated in a unique and universal way. An example is an indication of an e-mail message by the concatenation of e-mail address and date and time.

 

Layer 2. XML, namespaces and XML Schema

 

See chapter 2 for a description of XML.

 

Layer 3. RDF and RDF Schema

 

The first two layers consist of basic internet technologies. With layer 3 the Semantic Web begins. RDF has as its main goal the description of data.

RDF stands for Resource Description Framework.

RDF Schema (rdfs) has as a purpose the introduction of some basic ontological notions. An example is the definition of the notions “Class” and “subClassOf”.

 

Layer 4. The ontology layer

 

The definitions of rdfs are not sufficient. A more extensive ontological vocabulary is needed. This is the task of the Web Ontology Working Group of the W3C, which has already defined OWL (the Web Ontology Language) and OWL Lite (a subset of OWL).

 

Layer 5. The logic layer

 

In the case study the use of rulesets was mentioned. For expressing rules a logic layer is needed. An experimental logic layer exists [SWAP/CWM]. This layer is treated in depth in this thesis.

 

Layer 6. The proof layer

 

In the vision of Tim Berners-Lee the production of proofs is not part of the Semantic Web. The reason is that the production of proofs is still a very active area of research and it is by no means possible to standardise this. A Semantic Web engine should only need to verify proofs. Someone sends to site A a proof that he is authorised to use the site. Then site A must be able to verify that proof. This is done by a suitable inference engine. Three inference engines that use the rules that can be defined with this layer are: CWM [SWAP/CWM], Euler [DEROO] and N3Engine, developed as part of this thesis.

 

Layer 7. The trust layer

 

Without trust the Semantic Web is unthinkable. If company B sends information to company A but there is no way that A can be sure that this information really comes from B or that B can be trusted then there remains nothing else to do but throw away that information. The same is valid for exchange between citizens. The trust has to be provided by a web of trust that is based on cryptographic principles. The cryptography is necessary so that everybody can be sure that his communication partners are who they claim to be and what they send really originates from them. This explains the column “Digital Signature” in fig. 3.1.

The trust policy is laid down in a “facts and rules” database. This database is used by an inference engine like the inference engine I developed. A user defines his policy using a GUI that produces a policy database. A policy rule might be e.g. if the virus checker says ‘OK’ and the format is .exe and it is signed by ‘TrustWorthy’ then accept this input.

The impression might be created by fig. 3.1 that this whole layered building has as its purpose to implement trust on the internet. Indeed it is necessary for implementing trust but, once the pyramid of fig. 3.1 comes into existence, all kinds of applications can be built on top of it.

 

Layer 8. The applications

 

This layer is not in the figure; it is the application layer that makes use of the technologies of the underlying 7 layers. An example might be two companies A and B exchanging information where A is placing an order with B.

 

3.2. Characteristics of an inference engine

 

Taking into account the description of the goals of the Semantic Web and examples thereof [TBL] [TBL01] it is possible and necessary to deduce a list of characteristics that an inference engine for the Semantic Web should have.

 

1)     Trust: the notion of trust has an important role in the Semantic Web. If exchanges between computers have to be done in an automatic way a computer on one site must be able to trust the computer at the other side. To determine who can be trusted and who not a policy is necessary. When the interchanges become fully automatic those policies will become complicated sets of facts and rules. Therefore the necessity for an inference engine that can make deductions from such a set. When deducing whether a certain source can be trusted or not the property of monotonicity is desirable: the decision on trustworthiness should not be changed after loading e.g. another rule set.

2)     Inconsistencies and truth: if inconsistencies are encountered, e.g. site A says a person is authenticated to do something and site B says this person is not, the ‘truth’ is then determined by the rules of trust: site A e.g. will be trusted and site B not. So truth no longer has an absolute value.

     1 & 2 imply that the engine should be coupled to the trust system.

3)     Reasoning in a closed world is often not possible anymore. A query ‘is x a mammal’ will have no response at all if x cannot be found; the engine might find: x is a bird, but can this be trusted? Prolog would suppose that all mammals are known in its database and when it does not find x to be a mammal it would say: ‘x is not a mammal’. However on the internet it is not possible to assume that ‘x is not a mammal’. The world is open and in general sets should not be considered to be complete. This might pose some problems with OWL and especially with that part of the ontology that handles sets, like intersection, complement etc... How to find the complement of a set in an open world?

4)     An open world also means heterogeneous information. The engine can fetch or use facts and rules coming from anywhere. So finally it might use a database composed of data and rules from different sites that contains inconsistencies and contradictions. This poses the problem of the detection of such inconsistencies. This again is also linked to the trust system.

5)     In general proving a theorem, even in propositional calculus, cannot be done in a fully automatic way [Wos]. As the Semantic Web talks about automatic interchange this means that it should not be done. So the engine will be limited to proof verification. However this also implies querying. Indeed, to verify a proof it should be possible to execute queries to gather the necessary information.

6)     The existence of a Semantic Web means cooperation. Cooperation means people and their machines following the rules: the standards. The engine must work with the existing standards. There might be different standards doing the same thing. This implies the existence of conversions, e.g. rule sets for the engine that convert between systems. This thesis will work with the standards XML, RDF, RDFS, OWL, without therefore saying that other standards ought not to be considered. This does not imply that the author always agrees with all features of these standards but, generally, nobody will ever agree with all features of a standard. Nevertheless the standards are a necessity. Therefore the standards will be followed in this thesis. It seems however permitted to propose alternatives. For the logic of an inference engine no standards are available (or should propositional logic and FOL be considered standards?). Important standardisation efforts have to be made to assure compatibility between inference engines. Anyhow, the more standards, the less well the cooperation will work. Following the standard RDF has a lot of consequences: unification involves unifying sets of triples; functions are not available. The input and the output of the inference engine have to be in RDF. Also the trace must be in RDF for providing the possibility of proof verification.

7)     When exchanging messages between citizens, companies, government and other parties, the fewer point-to-point agreements have to be made, the easier the process will become. This implies the existence of a standard ontology (or ontologies). If agreement could be reached about e.g. ± 1000 ontological concepts and their semantic effects on the level of a computer, the interchange of messages between parties on the internet could become a lot easier. This implies that the internet inference engines will have to work with ontologies. Possibly however this ontological material can be translated to FOL terms.

8)     The engine has to be extensible in an easy way: an API must exist that allows easy insertion of plugins. This API of course must be standardised.

9)     The engine must have a model implementation that is open source and has a structure that is easy to understand (as far as this is possible) so that everybody can get acquainted in an easy way with the most important principles.

10)     All interchanges should be fully automatic (which is why the Semantic Web is made in the first place). Human interventions should only happen when the trust policies are unable to take a decision or their rules prescribe a human intervention.

11)     The engine must be able to work with namespaces so that the origin of every notion can be traced.

12)     As facts and rules are downloaded from different internet sites and loaded into the engine, the possibility must exist also to unload them. So it must be possible to separate again files that have been merged. The engine should also be able to direct queries to different rule sets.

13)     The engine must be very fast as very large fact and rule bases can exist, e.g. when the contents of a relational database are transformed to RDF. This thesis has a few suggestions for making a fast engine. One of them involves making a typed resolution engine; the other is item 14.

14)     Concurrency: this can happen on different levels: the search for alternatives can be done concurrently over different parts of the database; the validity of several alternatives can be checked concurrently; several queries can be done concurrently, i.e. the engine can be duplicated in different threads.

15)     Duplicating the engine in several threads also means that it must be lightweight or reentrant.

16)     There should be a minimal set of builtins representing functions that are commonly found in computer languages, theorem provers and query engines, like: lists, mathematical functions, strings, arrays and hashtables.

17)     Existing theorem provers can be used via an API that permits a plugin to transform the standard database structure to the internal structure of the theorem prover. However, as detailed in the following texts, this is not an evident thing to do. As a matter of fact I argue that it is better to write new engines designed specifically for use in the Semantic Web.

18)     FOL technologies are a possibility: resolution can be used; also Gentzen calculus or natural deduction could be used; CWM [SWAP] uses forward reasoning. This thesis will be based on a basic resolution engine. For more information on other logic techniques see an introductory handbook on logic, e.g. [BENTHEM]. Other techniques such as graph-based reasoning are also possible.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Chapter 4. Description of the research question.

 

4.1. The problem

 

Up till now several inference engines have been made that work with RDF and RDFS. Some of these just do subgraph matching on an RDF graph; others also use rules and logical implication. The logic behind the rules is often based on Horn logic [DEROO, DECKER]. CWM is based on forward reasoning [SWAP/CWM]. These systems are of a diverse nature and the logic behind them is not clear.

More theoretical results are needed before a standard defining the basic logic structure of an inference engine can be defined. These theoretical results must be based on practical and experimental work.

In my view the following restrictions are imposed by the structure of the internet:

·        the data are distributed. A consequence of this is the difficulty of avoiding inconsistencies.

·        heterogeneity of goals and purposes must be taken into account.

·        there is a relatively slow transmission medium.

·        the scale of everything can vary from small to extensive.

Other arguments also influence the structure of an inference engine for the Semantic Web. First there is the argument of evolvability (a general requirement for software development: see [GHEZZI]). The internet is not something static, and neither is the Semantic Web. So an inference engine deployed on the scale of the internet will be confronted with continuous development and changes. Then there is the argument of variability. In a system with millions of users it is to be expected that not all user communities have the same needs and look for the same properties in an inference engine. This implies that the engine must be built in a modular way. This again leads to the development of a basic engine which is kept as small as possible. This basic engine is enhanced with a set of logic specifications comprising facts and rules. However there has to be a formal specification of the engine too. Such a specification can be made with a functional language.

What can be the architecture of an inference engine for the Semantic Web?

 

When a system on internet scale is involved, some assumptions concerning logic can readily be made:        

·        no closed world: when files are collected and merged on the internet  it is not to be expected that the information will be complete.

·        paraconsistent logic: when there are contradictions, strange results would follow if the rule is accepted that from a contradiction everything can be deduced.

·        reasoning should be monotonic.

 

What system of logic is best suited for the Semantic Web?

 

A basic internet engine should be fast because the amount of data can be high; it is possible that data need to be collected from several sites over the internet. This will already imply an accumulation of transmission times. A basic engine should be optimized as much as possible.

 

What techniques can be used for optimization of the engine?

 

I said above that inconsistencies will arise. In a closed world (reasoning on one computer system) inconsistencies can be mastered; on the internet, when files from several sites are merged, inconsistencies will be commonplace.

 

How must inconsistencies be handled?

 

4.2. The solution framework

 

In order to give an answer to these questions the framework in fig. 4.1 was used.

Fig.4.1. The solution framework.

 

By specifying a basic engine, research can be done into the properties of RDF using the concrete syntax Notation 3. The engine uses as input an axiom file, a query file and one or more basic files which contain the reasoning rules governing the behaviour of the engine. The axiom file and the query file are application dependent.

The engine is specified in Haskell. It uses resolution reasoning. Other engines are of course possible and should be investigated, but this is out of scope for this thesis.

 

 

 


Chapter 5. Inferencing and the graph model of RDF.

 

5.1. Introduction

 

In chapter 2 RDF is described from the point of view of the data model. This chapter will focus on the graph model of RDF as described in the RDF model theory [RDFM]. Examples of coding in Haskell will also be given.

I will present in this chapter an interpretation that is based on graph-theoretical reasoning. A clear and precise definition will be given for rules, queries and solutions, i.e. a definition that makes it possible to check the validity of solutions. It also gives guidelines as to what constitutes well-constructed rules. I present a series of lemmas to prove the validity of the resolution process as it is implemented in the engine of De Roo [DEROO] and the engine developed for this thesis, N3Engine. The anti-looping mechanism from De Roo is explained and its validity is proved.

 

5.2.Recapitulation and definitions

 

I give a brief recapitulation of the most important points about RDF. [RDFM,RDF Primer, RDFSC].

RDF is a system for expressing meta-information i.e. information that says something about other information e.g. a web page. RDF works with triples. A triple is composed of a subject, a property and an object. Subject, property and object can have as value a URI or a variable. An object can also be a literal.

In the graph representation subjects and objects are nodes in the graph and the arcs represent properties. No node can occur twice in the graph. The consequence is that some triples will be connected with other triples while other triples stand alone. The graph will thus consist of different connected subgraphs.

A node can also be a blank node. A blank node can be instantiated with a URI or a literal and then becomes a ‘normal’ node.

A valid RDF graph is a graph where all arcs are valid URI’s  and all nodes are either valid URI’s, valid literals or blank nodes.

A tripleset is a set of connected triples i.e. triples that have at least one node in common. Therefore a tripleset is also a connected subgraph. The RDF graph is the set of all triplesets. A database is composed of an RDF graph and rules. Rules will be explained later in the text.

Triples will be noted as: (subject, property, object). A tripleset will consist of triples between ‘{‘ and ‘}’.

 

Fig.5.1 gives examples of triples.

Fig.5.1. Some RDF triples. The triples in a) together form the graph in b). A graph cannot have duplicate nodes.

 

Triples can be ‘stand-alone’ or they can form a graph. The first triple says that John owns a car and the second says that there is a car with brand Ford. The third drawing however is composed of two triples (and is thus a triple set) and it says that John owns a car with brand Ford.

I give here the Haskell representation of triples:

(Note: the Haskell code in the text is simplified compared to the source code in annexe).

 

data Triple = Triple (Subject, Predicate, Object) | Rule Rule | TripleNil

 

type Subject = Resource

type Predicate = Resource

type Object = Resource

 

data Resource = URI String | Literal String | Var Vare | ResNil

 

type TripleSet = [Triple]

 

-- the definition of a variable

data Vare = UVar String | EVar String | GVar String | GEVar String

 

Rules and variables will be explained later.

 

Suppose fig.5.1 represents the database. Now let’s make a query. A query is made when someone wants to know something. A person wants to know whether certain triples are in the database. She wants to know e.g. if John owns a car or she wants to know all persons who own a car (and are in the database).

The first query is: {(John, owns,car)}.

The second query is: {(?who, owns,car)}.
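
As an illustration (a minimal sketch; the identifiers exampleGraph and whoOwnsCar are mine, not from the source code in annexe), the triples of fig.5.1 and the second query can be written as Haskell values using the types above (variables are explained below):

-- the two triples of fig.5.1 forming one tripleset (a connected subgraph)
exampleGraph :: TripleSet
exampleGraph = [ Triple (URI "John", URI "owns", URI "car")     -- John owns a car
               , Triple (URI "car", URI "brand", URI "Ford") ]  -- the car has brand Ford

-- the query {(?who, owns, car)}
whoOwnsCar :: Triple
whoOwnsCar = Triple (Var (UVar "who"), URI "owns", URI "car")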

 

This needs some definitions:

A single query is a tripleset. Semantically a query is an interrogation of the database. A query can also be composed; it then consists of a set of triplesets called the query set, and it is in fact a database with an empty ruleset. When the inferencing starts the query set becomes the goallist. The goallist is the collection of triplesets that have to be proved, i.e. matched against the database. This process will be described in detail later.

The answer to a query consists of one or more solutions. A solution to a query is either an affirmation that a certain triple or set of triples exists; or else it is the original query with the variables replaced by URI’s or literals. I will later give a more precise definition of solutions.

A variable will be indicated with a question mark. For this chapter it is assumed that the variables are of type UVar, i.e. local universal variables. The types are UVar for a local, universal variable; EVar for a local, existential variable; GVar for a global, universal variable; GEVar for a global, existential variable.

A grounded triple is a triple of which subject, property and object are URI’s or literals, but not variables. A grounded tripleset  contains only grounded triples.

A query can be grounded. In that case an affirmation is sought that the triples in the query exist, but not necessarily in the database: they can be deduced using rules.

 

Fig.5.2 gives the representation of some queries.

In the first query the question is: who owns the car ‘car’? The answer is of course “John”. In the second query the question is: what is the brand of the car ‘car’? The third query however asks: who owns the car ‘car’ and what is the brand of that car?

Here the query is a graph containing variables. This graph has to be matched with the graph in fig.5.1. So, generally, executing an RDF query amounts to ‘subgraph matching’.

Following the data model for RDF the two queries are in fact equal because a  sequence of triples is implicitly a conjunction. Fig.5.2 illustrates this.

Fig.5.2. Some RDF queries. The query in a) is composed of two triples and forms thus a tripleset. This tripleset is equivalent to the graph in b).

 

 

Fig. 5.3.A semantic error in a sequence of RDF triples.

 

Through the implicit conjunction the first two triples have to be joined together in one subgraph. Though this is a valid RDF graph it probably is not the intention of the author of the triples. John’s car cannot have two different brands.

The variables in a query are replaced in the matching process by a URI or a literal. This replacement is called a substitution. It is a tuple (variable, URI) or (variable, literal). I will come back to substitutions later.

Suppose there exists a rule: if X owns a car then X must pay taxes. How to represent such a rule ? Fig.5.4 gives a graph representation of a rule.

 

 

Fig.5.4. A graph representation of a rule. An arrow is drawn to indicate a logical implication.

 

The nodes of the rule form a triple set. Here there is one antecedent but there could be more. There is only one consequent. Fig.5.5 gives a query that will match with the consequent of the rule.

Fig. 5.5. A query that matches with a rule.

 

The desired answer is of course: John must pay taxes.  The query of fig.5.5 is matched with the consequent of the rule in fig.5.4. Now an action has to be taken: the antecedents of the rule now become the new query. The variable ?x is replaced by the variable ?who and the new query is now: who owns a car? This is equal to a query described earlier.  Important is that a rule subgraph is treated differently from non-rule subgraphs.

 

This way of representing rules is very close to the way Prolog represents rules, e.g. must_pay(X,taxes) :- owns(X,car). The rule consists here of a consequent (the first part) and an antecedent (the second part, after the ‘:-’). This interpretation of rules is not very satisfactory for RDF. I will later give another interpretation of rules that is better suited for interpreting the inference process. Now I give the definition:

A rule is a triple. Its subject is a tripleset containing the antecedents and its object is a tripleset containing the consequents. The property of the rule is the URI http://www.w3.org/2000/10/swap/log#implies [SWAP]. I will use the following notation for a rule:

 

{triple1, triple2,…} implies {triple11,triple21,….} 

 

This is the Haskell definition of a rule:

 

type Rule= (Antecedents, Consequent, Int)

type Antecedents = TripleSet

type Consequent = TripleSet
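
As a small illustration (a sketch; the identifier taxRule and the rule number are mine), the rule of fig.5.4 could be written as a Haskell value of this type:

taxRule :: Rule
taxRule = ( [ Triple (Var (UVar "x"), URI "owns", URI "car") ]        -- antecedents
          , [ Triple (Var (UVar "x"), URI "must_pay", URI "taxes") ]  -- consequent
          , 1 )                                                       -- rule number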

 

I have developed a unification algorithm for RDF that can handle subgraph matching. It can be named by the term: ‘subgraph matching with rules’.

Fig.5.6 gives an overview of the unification algorithm.

It works as follows: the sequence of RDF statements is divided into triplesets where each tripleset is a connected subgraph. This division is done for the database and for the query. Then the algorithm tries to match each tripleset (= subgraph) of the query with each tripleset (=subgraph) of the database. Each triple of a tripleset of the query is matched with each triple of the tripleset of the database. All the triples of the query set must be unified with a triple from the database.

 

 

Fig.5.6. Illustration of the unification algorithm. Tripleset Q unifies with tripleset 2 and rule 1.

 

This algorithm is very well suited for working with graphs: a graph can be declared by triples and rules can do inferencing about properties of graphs.

 

5.3.Matching two triplesets

 

         Fig.5.7 gives a schematic view of multiple matches of a triple with a tripleset. The triple (Bart, son, ?w) unifies with two triples: (Bart, son, Jan) and (Bart, son, Michèle). The triple (?w, son, Paul) matches with (Guido, son, Paul) and (Jan, son, Paul). This gives 4 possible combinations of matching triples. Following the theory of resolution all those combinations must be contradicted; in graph terms, they have to match with the closure graph. However, the common variable ?w between the first and the second triple of tripleset 1 is a restriction: the object of the first triple must be the same as the subject of the second triple, with as a consequence that there is one solution: ?w = Jan. Suppose the second triple were (?w1, son, Paul). Then this restriction no longer applies and there are 4 solutions:

{(Bart, son, Jan),(Guido, son, Paul)}

{(Bart, son, Jan), (Jan, son, Paul)}

{(Bart, son, Michèle), (Guido, son, Paul)}

{(Bart, son, Michèle), (Jan, son, Paul)}

 

This example shows that when unifying two triplesets the product of all the alternatives has to be taken, while the substitution has to be propagated from one triple to the next for variables that are equal, and also within a triple if the same variable or node occurs twice in that triple.
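
The following sketch illustrates this idea (it is not the thesis implementation; it uses the Substitution type defined further on in the unification section and assumes Eq instances for Vare and Resource; the input is one list of candidate substitutions per triple of the query tripleset):

-- take the product of the candidate substitutions of the successive triples;
-- shared variables (like ?w above) must receive the same binding
productOfAlternatives :: [[Substitution]] -> [Substitution]
productOfAlternatives = foldr step [[]]
  where
    step cands acc = [ s ++ a | s <- cands, a <- acc, compatible s a ]
    -- two substitutions are compatible when common variables map to the same resource
    compatible s a = and [ r1 == r2 | (v1, r1) <- s, (v2, r2) <- a, v1 == v2 ]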

Two remarks:

1)     This algorithm can be implemented by a resolution process with backtracking. After explaining the matching of a tripleset with a rule I will explain this algorithm as an example of the basic resolution process.

2)     The triple sets can contain variables; the algorithm amounts to the search of all possible subgraphs of a connected graph, where the possible subgraphs are defined by the triples and the variables in the first triple set.

 

Fig.5.7. Multiple matches within one TripleSet.

 

5.4.The model interpretation of a rule

 

A number of results from the RDF model theory [RDFM] are useful .

 

Subgraph Lemma. A graph entails all its subgraphs.

Instance Lemma. A graph is entailed by any of its instances.

An instance of a graph is another graph where one or more blank nodes of the previous graph are replaced by a URI or a literal.

Merging Lemma. The merge of a set S of RDF graphs is entailed by S and entails every member of S.

The merge can be seen as a merge of triplesets into one tripleset.

Monotonicity Lemma.  Suppose S is a subgraph of S’ and S entails E. Then S’ entails E.

 

These lemmas establish the basic ways of reasoning on a graph. Later on I will add one more basic way, namely implication, which is materialized by the use of rules.

 

The closure procedure used for axiomatizing RDFS in the RDF model theory [RDFM] inspired me to give an interpretation of rules based on a closure process.

Take the following rule R describing the transitivity of the ‘subClassOf’ relation:

{( ?c1, subClassOf, ?c2),( ?c2, subClassOf, ?c3)} implies {(?c1, subClassOf , ?c3)}.

The rule is applied to all sets of triples in the graph G of the following form:

{(c1, subClassOf, c2),  (c2, subClassOf, c3)}

yielding a triple (c1, subClassOf, c3).

This last triple is then added to the graph. This process continues till no more triples can be added. A graph G’ is obtained called the closure of G with respect to the rule R. In the ‘subClassOf’ example this leads to the transitive closure of the subclass-relation.

If a query is posed:  (cx, subClassOf, cy) then the answer is positive if this triple is part of the closure G’; the answer is negative if it is not part of the closure.

When variables are used in the query:

(?c1, subClassOf, ?c2)

then the solution consists of all triples (subgraphs) in the closure G’ with the predicate ‘subClassOf’.

 

The process is illustrated in fig. 5.8 and 5.9 with a similar example.

 

This gives following definitions:

A rule R is valid with respect to the graph G if the closure G’ of G with respect to the rule R is a valid RDF graph.

The graph G entails the graph T using the rule R if T is a subgraph of the closure G’ of the graph G with respect to the rule R.

A valid RDF graph follows the data model of RDF (chapter 2).

 

When more than one rule is used the closure G’ will be obtained by the repeated application of the set of rules till no more triples are produced.

Any solution obtained by a resolution process is then a valid solution if it is a subgraph of the closure  obtained.
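
A minimal sketch of this repeated application (not the thesis code; it assumes an Eq instance on Triple and takes as first argument a hypothetical function that returns, in one pass over the rules, all consequents derivable from the current graph):

closure :: ([Triple] -> [Triple]) -> [Triple] -> [Triple]
closure derive g
    | null new  = g                          -- fixpoint reached: this is the closure G'
    | otherwise = closure derive (g ++ new)  -- add the new triples and apply the rules again
  where
    new = [ t | t <- derive g, t `notElem` g ]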

I will not enter into a detailed comparison with first order resolution theory but I want to mention following points:

the closure graph is the equivalent of a minimal Herbrand model in first order resolution theory or a least fixpoint in fixpoint semantics.[VAN BENTHEM] (See also chapter 4).

 


Fig.5.8. A graph with some taller relations and a graph representing the transitive closure of the relation taller relative to the first graph.

 

Fig. 5.9. The closure G’ of a graph G with respect to the given rule. The possible answers are: (Frank,taller,Guido), (Fred,taller,Guido),(John,taller,Guido).

 

5.5.Matching a tripleset with a rule

 

Take a rule:

{T1,T2} implies {T3,T4}

where T1,T2,T3 and T4 are triples.

This rule is equivalent to the following two rules:

{T1,T2} implies {T3}

{T1,T2} implies {T4}

The proof in FOL is:

(T1 ∧ T2) → (T3 ∧ T4)  ⇒  ¬(T1 ∧ T2) ∨ (T3 ∧ T4)  ⇒  (¬T1 ∨ ¬T2) ∨ (T3 ∧ T4)  ⇒

((¬T1 ∨ ¬T2) ∨ T3) ∧ ((¬T1 ∨ ¬T2) ∨ T4)  ⇒  ((T1 ∧ T2) → T3) ∧ ((T1 ∧ T2) → T4)

This implies that when matching a tripleset with the consequents of a rule, the tripleset does not have to match with all consequents.
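
The same equivalence can be expressed on the Haskell Rule type from above (a sketch; splitRule is my name, not a function of the engine):

-- split a rule with several consequents into equivalent single-consequent rules
splitRule :: Rule -> [Rule]
splitRule (ants, cons, n) = [ (ants, [c], n) | c <- cons ]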

The matching of a tripleset with a rule is the most complicated part of the unification process. The semantic meaning of the rule is the addition of triples to the graph to produce the closure. In backwards reasoning the added triples have to be replaced by the originals, i.e. the antecedents of the rule. This is the reverse of the forward reasoning process. So if a tripleset must be matched, it must be proved that it is a subgraph of the closure graph. If some of its triples were produced by a rule these have to be replaced by the antecedents of the rule. However, the remaining triples of the tripleset still must be matched with a subgraph of the closure. This means they have to be returned to the goallist (see further) together with the antecedents of the rule. The unification then returns two sets of triples: the antecedents of the rule with its substitution and the remaining triples of the tripleset with a null substitution. This is a tuple of alternatives. In the resolution process these two sets must be seen as one alternative.

It must be taken into account that there might be more than one way in which a tripleset can match with a rule, so that more than one tuple can be returned.

 

This is the Haskell definition of alternatives:

 

type Alternatives = [(TripleSet,SubstitutionList)]

 

One alternative is a tripleset plus a substitution.

The Haskell function unifyTripleSetWithRule executes this unification of a tripleset with a rule.

 

In fig. 5.10 tripleset 1 is matched with tripleset 2 that is a rule.  The rule has only one consequent. There is one triple matching with the consequent:

(Guido, parent, ?w1).

The matching will return:

The triple (?w1,child,Guido) with the substitutions (?b,  Guido) and  (?a,?w1) and the remaining triples {(Bart,son,Jan),(Paul,child,Guido),(Jan,brother,Guido)} with a null substitution.

Fig.5.10. The matching with a rule in the resolution process.

(Guido,parent,?w1) matches with (?b,parent,?a).The result is the substitution: ((?b,Guido),(?w1,?a))  and the tripleset {(?w1, child,Guido)}; a null substitution and the tripleset {(Bart,son,Jan),(Paul,child,Guido),(Jan,brother,Guido)}.

 

I will now discuss the Haskell implementation of this process.

 

The numbers used refer to the source following the text.

 

(1)   A rule is in a tripleset with one element so the rule is equal to x.

(2)   If no triples match with the consequents of the rule then there are no alternatives returned.

(3)   Some triples did match, so return the alternatives.

(4)   A rule is composed of antecedents, consequents and a rule number. The number is not needed here.

(5)   The function unifyWithRule  does the unification between tripleset ts1 and con. This function differs from the function unifyNoRule by the fact that in unifyWithRule not all triples of the first tripleset do have to match with the second tripleset. In unifyNoRule all triples of the first tripleset must match with a triple (at least one) of the second tripleset.

(6)   As was said at the beginning of this paragraph, the unification of a tripleset with a rule returns two coupled alternatives: one for the triples that did match with the rule and a second one for the remaining triples of the tripleset that did not match with the rule. These sets are constructed with the function genRuleAlts.

(7)   Set a flag if there are no alternatives.

(9) For each alternative in alts2 a tuple of alternatives is made .

(10) ts2 are all the triples that did not match with the rule; ts1 are the triples that did match; together they form the input tripleset of unifyTripleSetWithRule.

(11) getComplement calculates the set ts2.      

 

unifyTripleSetWithRule :: (TripleSet, TripleSet) ->

                          (Bool, Alternatives)

(1)unifyTripleSetWithRule (ts1, (x:xs))

(2)   |bv1 = (False,[])

(3)   |otherwise = (True, new1)

(4)   where Rule (ants, con, _) = x

             -- get the alternatives

(5)         (ts,alts2) = (ts1,unifyWithRule ts1 con)

             -- put the antecedents as alternatives; take the rest of the

             -- tripleset as alternative

(6)          new1 = concat [genRuleAlts ts alts2]

(7)          bv1 = alts2 == []

             -- the alternatives generated by a rule are the antecedents of the rule.

             -- the triples that did not match with the rule must be returned too.

(8)         genRuleAlts ts [] = []

(9)         genRuleAlts ts (x:xs)

(10)            = [(ts2 ,[]),(ants, sl)] ++ genRuleAlts ts xs

                where (ts1,sl) = x

(11)                   ts2 = (getComplement ts ts1)

 

5.6.Unification

 

A triple consists of a subject, a property and an object. Unifying two triples means unifying the two subjects, the two properties and the two objects. A variable unifies with a URI or a literal. Two URI’s unify if they are the same. The result of a unification is a substitution. A substitution is either a list of tuples (variable, URI or Literal) or nothing (an empty list). The result of the unification of two triples is a substitution (containing at most 3 tuples). Applying a substitution to a variable in a triple means replacing the variable with the URI or literal defined in the corresponding tuple.

 

This is the definition of a substitution in Haskell:

 

type Substitution = [(Vare, Resource)]

type STuple = (Vare,Resource)

 

STuple stands for Single Tuple.

An empty substitution is not defined separately because the functions return lists: if there are no substitutions an empty list is returned.

 

The following functions apply a substitution to a triple. A triple can be a rule too.

 

applySubstitutionToTriple :: Substitution -> Triple -> Triple

applySubstitutionToTriple sl t = foldl

                      (flip applySTupleToTriple) t sl

 

 

applySTupleToTriple :: STuple -> Triple -> Triple

applySTupleToTriple stuple (Triple (subject, predicate, object)) =

     Triple (applySTupleToResource stuple subject,

             applySTupleToResource stuple predicate,

             applySTupleToResource stuple object)

applySTupleToTriple stuple (Rule (antecedents, consequent, i)) =

     Rule (applySTupleToTripleSet stuple antecedents,

           applySTupleToTripleSet stuple consequent, i)

applySTupleToTriple stuple TripleNil = TripleNil

 

applySTupleToResource :: STuple -> Resource -> Resource

applySTupleToResource stuple (TripleSet tripleset) =

          TripleSet (applySTupleToTripleSet stuple tripleset)

applySTupleToResource (t1, t2) (Var v)

        | v == t1 = t2

        | otherwise = Var v

applySTupleToResource s t = t
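
A small usage sketch (the values are illustrative): applying the substitution that binds ?who to John to the query triple of fig.5.2 grounds it.

exampleSubst :: Substitution
exampleSubst = [ (UVar "who", URI "John") ]

groundedQuery :: Triple
groundedQuery = applySubstitutionToTriple exampleSubst
                    (Triple (Var (UVar "who"), URI "owns", URI "car"))
-- groundedQuery is Triple (URI "John", URI "owns", URI "car")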

 

 

The following functions define the unification of two triples.

I will first discuss the function unifyTriples. The numbers refer to the code.

(1)   if subjects, predicates (properties) and objects of the two triples unify then a positive result is returned (value true).

(2)   if (1) is not the case then a negative result is returned (value false).

(3)   Here the resources are unified: subject1 with subject2, predicate1 with predicate2, object1a with object2. Object1a is the result of applying the substitution obtained when matching the two subjects to object1.

(4)   Here a call is done to an external function where possibly a more complex unification can be done.

(5)   Apply the substitution to object1.

(6)   If the second triple is a rule then call unifyTripleSetWithRule.

(7)   If the first triple is a rule then call unifyRuleWithTripleSet.

(8)   If none of the previous applied then return a failure.

 

unifyTriples :: Triple -> Triple -> (Bool, Alternatives)

unifyTriples (Triple t1) (Triple t2)

(1)     | bool1 && bool2 && bool3 = (True,[([t4], subst1 ++ subst2 ++ subst3)])

(2)     | otherwise = (False, [([t4], [])])

(3)     where (bool1, subst1) = unifyResources subject1 subject2

              (bool2, subst2) = unifyResources predicate1 predicate2

              (bool3, subst3) = unifyResources object1a object2

              Triple(subject1, predicate1, object1) = t3

              Triple(subject2, predicate2, object2) = t4

(4)           (t3, t4) = unifIn (Triple t1, Triple t2)

(5)           object1a = appSubst object1 subst1

              appSubst object1 [] = object1

              appSubst object1 (su:sus) = applySTupleToResource su object1

unifyTriples  triple (Rule rule)

(6)             = unifyTripleSetWithRule ([triple], [Rule rule])

unifyTriples (Rule rule) triple

(7)             = unifyRuleWithTripleSet ([triple], [Rule rule])

(8)unifyTriples t1 t2 = (False, [([t2], [])])

 

unifyResources makes the unification of two resources.

Two variables always match:

(1)   If both resources are the same variable then the unification return a match and a null substitution [].

(2)   If both variables are different there is a match and the substitution is [(v1, Var v2)]

(3) A variable matches with another resource. However a check has to be done to see whether the variable occurs in the other resource (the ‘occurs’ check of Prolog). This is done by the function varCheck.

(4) If the varCheck is negative then the substitution [(v1,r)] is returned.

(5) This is the reverse of (3) and (4).

(6) If both resources are triplesets then the function unifyTerms is called. This gives interesting possibilities but it is not basic to RDF so I will not go into detail.

(7) The other cases include the matching of URI’s and/or literals. These nodes are compared by the function compareResources.

(8)   If none of the previous possibilities applies, the two nodes do not unify.

unifyResources :: Resource -> Resource -> (Bool, Substitution)

unifyResources (Var v1) (Var v2)

(1) |v1 == v2  = (True, [])

(2) |otherwise = (True, [(v1, Var v2)])

unifyResources (Var v1) r

(3) |varCheck (Var v1) r = (False, [(v1, r)])

(4) |otherwise = (True, [(v1, r)])

(5)unifyResources r (Var v1)

    |varCheck (Var v1) r = (False, [(v1, r)])

    |otherwise = (True, [(v1, r)])

(6)unifyResources (TripleSet t1) (TripleSet t2) =

                         unifyTerms t1 t2

unifyResources resource1 resource2

(7) |compareResources resource1 resource2 = (True, [])

(8) |otherwise = (False, [])
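
Two small usage sketches (the values are illustrative and assume that varCheck fails and compareResources compares for equality, as described above): a variable unifying with a URI yields a substitution, and two identical URI’s unify with an empty substitution.

ex1 :: (Bool, Substitution)
ex1 = unifyResources (Var (UVar "who")) (URI "John")   -- (True, [(UVar "who", URI "John")])

ex2 :: (Bool, Substitution)
ex2 = unifyResources (URI "owns") (URI "owns")         -- (True, [])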

 

5.7.The resolution process

 

The process I use has been built with a standard SLD inference engine as a model: the Prolog engine included in the Hugs distribution [JONES]. The basic backtracking mechanism of my engine is the same. An anti-looping mechanism has been added; this is the mechanism of De Roo [DEROO]. Furthermore there are differences in the unification module caused by the graph model: the matching of two triplesets is done by a second, ‘internal’ inference engine.

The resolution process starts with a query. A single query is a tripleset. A query can be composed too: it then has more than one tripleset. This is equivalent to asking more than one question. The query will be matched against the triplesets of the initial graph G and against the rules. The tripleset unifies with a rule when one of the triples of the tripleset unifies with the consequent of the rule (see above: 5.4). The tripleset unifies with another tripleset if all triples in the tripleset unify with a triple of the other set. This can possibly be done in different ways. The result of the unification of two triplesets is a list of substitutions (see above: 5.2).

 

A tripleset may unify with many other triplesets or rules.

The unification with a tripleset from the graph G gives a substitution plus grounded triples (triples that contain no variables). No new goals are returned because subgraph matching with the original graph has occurred; this represents a leaf in the search tree.

The unification of a tripleset with a rule has as a result one or more triplesets. Each tripleset is a new goal. Each of these new goals will be the start of a new search path. One of these paths will be further investigated and after a unification this path will split again into other paths, stop with a failure or stop after a subgraph match with the given graph (producing a leaf in the search tree). The other paths will be pushed on a stack to be investigated later. These search paths form a tree. In a depth first search the search goes on following one path till a leaf of the tree is met. Then backtracking is done one level up in the tree and the search goes on till a new leaf is met.

In a breadth first search for all paths one unification is done, one path after another. All paths are followed at the same time till, one by one, they reach an endpoint.

 

5.8.Structure of the engine

 

The inference engine : this is where the resolution is executed. I first give an overview of the datastructures:

·        the database: type [TripleSet]

·        the query: type [TripleSet]

·        a goal: type TripleSet

·        the goallist: type GoalList = [TripleSet]

·        an alternative: type Alternative = (TripleSet, Substitution)

·        a list of alternatives: type Alternatives = [Alternative]

·        the stack: type Stack = [(Substitution, GoalList, Alternatives)]

 There are three parts to this engine:

a)   solve : the generation of new goals by selection of a goal from the goallist by some selection procedure and unifying this goal against all triplesets of the database thereby producing a set of alternative triplesets. The process starts by adding the query to the goallist. If the goallist is empty a solution has been found and a backtrack is done in search of other solutions.

b) choose: add one of the alternative triplesets to the goallist; the other ones are pushed on the stack. Each set of alternative goals is pushed on the stack together with the current goallist and the current substitution. If solve did not generate any alternative goals there is a failure (unification did not succeed) and a backtrack must be done to get an alternative goal.

c)  backtrack: an alternative goal is retrieved from the stack and added to the goallist. If the stack is empty the resolution process is finished. A failure occurs if for none of the alternatives a unification is possible; otherwise a set of solutions is given.

 

The algorithm in high-level imperative style programming:

goalList = all triplesets in the query.

do {

   while (not goalList = empty) {

      select a goal.

      If this goal is a new goal unify this goal against the database producing a set of alternative goals ( = all the triplesets which unify with the selected goal) and eliminate this goal from the goalList

      else if the goal did not unify do a backtrack: retrieve a goalList from the stack

      else if the process is looping (the set of alternatives has occurred before) do a backtrack: retrieve a goalList from the stack

      else

      add one of this set of alternatives  to the goalList and push the others on the stack

   } // while

   a solution has been found;

   retrieve an alternative from the stack

} until (stack == empty)

 

The goal which is selected from the goalList is the head of the goalList. The alternative which is chosen from the list of alternatives is the first alternative in the list.

 

5.9.The backtracking resolution mechanism in Haskell

 

The unification of two triplesets uses a backtracking resolution algorithm. This is the internal inference engine. The main external inference engine tries to match triplesets with triplesets or triplesets with rules. The internal engine matches triples with triples and in this way executes a subgraph matching. The data structures are somewhat different from those mentioned above, which were the structures for the external engine. A main difference is that TripleSet is replaced by Triple. An alternative is just a substitution, which is a consequence of the fact that there are no rules in the database, so no antecedents have to be saved in an alternative.

·        the database is a tripleset: type TripleSet

·        the query is a tripleset: type TripleSet

·        a goal: type Goal = Triple

·        the goallist: type GoalList = [Triple]

·        an alternative: type Alternative = ( Substitution)

·        a list of alternatives: type Alternatives = [Alternative]

·        the stack: type Stack = [(Substitution, GoalList, Alternatives)]

 

The numbers refer to the code hereafter.

(2) and (3): if one of the triplesets is empty nothing is returned. This should not occur normally. It is just added for completeness.

(4) The result of the unification is a set of alternatives. All triples of the first tripleset must match with the second; if not there is a failure.

(7) The solve function returns a list of substitutionlists.

(6) For each substitutionlist: apply the substitutionlist to the first tripleset and return a tuple (tripleset,substitutionlist).

(8),(9)  The first tripleset is empty; this means a match with the second tripleset has been found; a new set and substitutionlist is retrieved from the stack and another match is searched.

(10)    Solve will try to match the first triple of the tripleset. This is done in line (14) with the function unifyTripleWithTripleSet that returns a boolean (success or failure) and a set of alternatives.

(13) Before matching, the current substitutionlist is applied to the triple. This is necessary because assignments to variables (this is what substitutions really are) must be respected while pursuing a goal.

(15) The substitutions are extracted from the alternatives.

(11) If there were no alternatives choose is called with an empty substitutionlist causing a backtrack to happen.

(12) Otherwise choose is called ‘normally’:

(17) choose adds a substitutionlist to the current list and pushes the tripleset and the other substitutions on the stack.

(16) If choose is called without substitutions meaning no match was done then a backtrack is done.

(18) One substitution is added to the current substitution.

(19) The other substitutions and the tripleset are pushed on the stack. 

(21) When backtracking an entry is pulled from the stack and a call to the function choose is done.

(20) If the stack is empty nothing is returned and the resolution process ends.

 

-- unify two triplesets where none contains a rule

1) unifyNoRule :: TripleSet -> TripleSet -> Alternatives

2) unifyNoRule [] _ = []

3) unifyNoRule _ [] = []

4) unifyNoRule ts1 ts2

5)          = alts

6)          where alts = [((applySubstitutionToTripleSet a ts1), a) | a <- sll]

7)                sll = solve ts1 ts2 [] []

 

8)                solve []     ts2 stack substitution

9)                    = (substitution:backtrack ts2 stack)

10)               solve (t:ts) ts2 stack substitution

11)                   |alts == [] = choose [] ts ts2 stack substitution

12)                   |otherwise = choose sll (ts) ts2 stack substitution

13)                   where triple = applySubstitutionToTriple substitution t

14)                         (bool,alts) =unifyTripleWithTripleSet triple ts2

15)                         sll = [sls|(_, sls) <- alts]

                        

 

16)                choose []  ts ts2 stack substitution = backtrack ts2 stack

17)                choose (sl:sll) ts ts2 stack substitution

                           = solve ts ts2 stack1 substitution1

18)                        where substitution1 = substitution ++ sl

19)                              stack1 = (substitution,ts, sll):stack

20)                backtrack ts2 [] = []

21)                backtrack ts2 ((su,ts,sll):st)

                               = choose sll ts ts2 st su
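
A usage sketch (illustrative values; the result indicated in the comment is what is expected, roughly): matching the query tripleset {(?who, owns, car)} against the tripleset of fig.5.1 should yield one alternative binding ?who to John.

exampleAlts :: Alternatives
exampleAlts = unifyNoRule
    [ Triple (Var (UVar "who"), URI "owns", URI "car") ]      -- query tripleset
    [ Triple (URI "John", URI "owns", URI "car")
    , Triple (URI "car", URI "brand", URI "Ford") ]           -- database tripleset
-- expected: [([Triple (URI "John", URI "owns", URI "car")], [(UVar "who", URI "John")])]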

 

The mechanism which is followed is similar to SLD-resolution: Selection, Linear, Definite [CS]. There is a selection function for triplesets; a quite simple one, given the fact that the first in the list is selected. This is one of the points where optimization is possible, namely by using another selection function. In my engine, however, the rules are placed after the facts, thereby realizing a certain optimization: preference is given to a match with facts. This has as a consequence that the first solution might be found more rapidly.

Linear is explained here below.

In prolog the definition of a definite sentence is a sentence that has exactly one positive literal in each clause and the unification is done with this literal. In my engine the clauses can have more than one consequent and thus are not necessarily definite, but they are in most examples.

A resolution strategy selects the clauses (here triplesets and rules) that are used for the unification process.

Following resolution strategies are respected by  an SLD-engine:

·        depth first: each alternative is investigated until a unification failure occurs or until a solution is found. The alternative to depth first is breadth first.

·        set of support: at least one parent clause must be from the negation of the goal or one of the “descendants” of such a goal clause. This is a complete procedure that gives a goal-directed character to the search.

·        unit resolution: at least one parent clause must be a “unit clause” i.e. a clause containing a single literal. As I do subgraph matching this is not the case for my engine.

·         input resolution: at least one parent comes from the set of original clauses (from the axioms and the negation of the goals). This is not complete in general but complete for Horn clause KB's.

·        linear resolution: one of the parents is selected with set of support strategy and the other parent is selected by the input strategy. This is used in my engine at the level of triplesets (a tripleset = a clause).

·        ordered resolution: this is the way prolog operates; the clauses are treated  from the first to the last and each single clause is unified from left to right.

As already said I put the rules behind the facts.

 

5.10.The closure path.

 

5.10.1.Problem

 

In the following paragraphs I develop a notation that enables a comparison between forward reasoning (the making of a closure) and backward reasoning (the resolution process). The meaning of a rule and what constitutes a solution to a query are defined using forward reasoning. Therefore the properties of the backward reasoning process are proved by comparison with the forward reasoning process.

 

5.10.2. The closure process

 

The original graph that is given will be called G. The closure graph resulting from the application of the set of rules will be called G’ (see further). A rule will be indicated by a lowercase letter with an italic index. A subgraph of G or of another graph (a query or other) will be indicated by a lowercase letter with an index.

A rule is applied to a graph G; the antecedents of the rule are matched with one or more subgraphs of G to yield a consequent that is added to the graph G. The consequent must be grounded for the graph to stay a valid RDF graph. This means that the consequent can not contain variables that are not in the antecedents.

 

5.10.3. Notation

 

A closure path is a finite sequence of triples (not to be confused with RDF triples):

(g1,r1,c1),(g2,r2,c2), ….,(gn,rn,cn)

where (g1,r1,c1) means: the rule r1 is applied to the subgraph (tripleset) g1 to produce the subgraph (tripleset) c1  that is added to the graph. A closure path represents a part of the closure process. This is not necessarily the full closure process though the full process is represented also by a closure path.

The indices in gi,ri,ci represent sequence numbers but not the identification of the rules and subgraphs. So e.g. r7 could be the same rule as r11. I introduce this notation to be able to compare closure paths and resolution paths (defined further), for which purpose I do not need to know the identification of the rules and subgraphs.

It is important to remark that all triples in the triplesets (subgraphs) are grounded, i.e. they do not contain variables. This imposes the restriction on the rules that the consequents may not contain variables that are absent in the antecedents.
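
For reference, a closure path could be represented directly on the Haskell types introduced earlier (a sketch; this type is illustrative and not part of the engine):

type ClosureStep = (TripleSet, Rule, TripleSet)   -- (g_i, r_i, c_i)
type ClosurePath = [ClosureStep]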

5.10.4. Examples

 

In fig.5.13 rule 1 {(?x, taller, ?y),(?y, taller, ?z)} implies {(?x, taller, ?z)} matches with the subgraph {(John,taller,Frank),(Frank,taller,Guido)}

to produce the subgraph {(John,taller,Guido)}  and the substitution ((x,John),(y,Frank),(z,Guido))

In the closure process the subgraph {(John,taller,Guido)} is added to the original graph.

 

Fig. 5.12 , 5.13 and 5.14 illustrate the closure paths.

 

In fig. 5.14 the application of rule 1 and then rule 2  gives the closure path (I do not write completely the rules for brevity):

({(John,taller,Frank),(Frank,taller,Guido)},rule 1, {(John,taller,Guido)}),

({(John,taller,Guido)}, rule 2,{(Guido,shorter,John)})

Two triples are added to produce the closure graph:

(John,taller,Guido) and (Guido,shorter,John).

 


Fig. 5.12.The resolution process with indication of resolution and closure paths.

 

 

Fig.5.13. The closure of a graph G

 

 

Fig.5.14. Two ways to deduce the triple (Guido,shorter,John).

 

5.10.The resolution path.

 

5.10.1. The process

 

In the resolution process the query is unified with the database giving alternatives. An alternative is a tripleset and a substitutionlist. One tripleset becomes a goal. This goal is then unified, etc. When a unification fails a backtrack is done and the process restarts with a previous goal. This gives in a resolution path a series of goals: query, goal1, goal2, … each goal being a subgraph. If the last goal is empty then a solution has been found; the solution consists of a substitutionlist. When this substitutionlist is applied to the series of goals query, goal1, goal2, … of the resolution path, all subgraphs in the resolution path will be grounded, i.e. they will no longer contain any variables (otherwise the last goal cannot be empty). The reverse of such a grounded resolution path is a closure path. I will come back to this. This is illustrated in fig. 5.12.

 

A subgraph g1 (a goal) unifies with the consequents of a rule with result a substitution s and a subgraph g1’ consisting of the antecedents of the rule

or

g1 is unified with a connected subgraph of G with result a substitution s1 and no subgraph in return. In this case I will assume an identity rule has been applied i.e. a rule that transforms a subgraph into the same subgraph with the variables instantiated. An identity rule is indicated with rid.

 

The application of a rule in the resolution process is the reverse of the application in the closure process. In the closure process the antecedents of the rules are matched with a subgraph and replaced with the consequents; in the resolution process the consequents of the rule are matched with a subgraph and replaced with the antecedents.

 

5.10.2. Notation

 

A resolution path is a finite sequence of triples (not to be confused with RDF triples):

(c1,r1,g1), (c2,r2,g2), ….,(cn,rn,gn)

(ci,ri,gi) means: a rule ri is applied to subgraph ci to produce subgraph gi. Contrary to what happens in a closure path gi can still contain variables. Associated with each triple (ci,ri,gi)  is a substitution si. 

The accumulated list of substitutions [si] must be applied to all triplesets that are part of the path. This is very important.

 

5.10.3. Example

 

An example from fig.5.15:

The application of rule 2 to the query {(?who,shorter,John)} gives the substitution ((?x,John),(?y,Frank)) and the antecedents {(Frank,shorter,John)}.

An example of closure path and resolution path as they were produced by a trace of the engine can be found in annexe 2.

 

Fig.5.15. Example of a resolution path.  

 

5.11.Variable renaming

 

There are two reasons why variables have to be renamed when applying the resolution process (fig.5.16) [AIT]:

1)     Different names for the variables in different rules. Though the syntax permits defining the rules using the same variables, the inference process cannot work with this. Rule one might give a substitution (?x, Frank) and later on in the resolution process rule two might give a substitution (?x, John). So by what should ?x be replaced?

          It is possible to program the resolution process without this renaming but I think it’s a lot more complicated than renaming the variables in each rule. 

2)     Take the rule:

{(?x1,taller, ?y1),(?y1,taller, ?z1)} implies {( ?x1,taller, ?z1)}

           At one moment  (John,taller,Fred) matches with (?x1,taller,?z1) giving substitutions:

            ((?x1, John),(?z1, Fred)).

            Later in the process (Fred,taller,Guido) matches with (?x1,taller,?z1) giving substitutions:

            ((?x1, Fred),( ?z1, Guido)).

So now, by what is ?x1 replaced? Or ?z1?

This has as a consequence that the variables must receive a level numbering where each step in the resolution process is attributed a level number.

The substitutions above then become e.g.

((1_?x1, John),(1_?z1, Fred))

and

((2_?x1 , Fred),(2_?z1, Guido))

This numbering is mostly done by prefixing with a number as done above. The variables can be renumbered in the database but it is more efficient to do it on a copy of the rule before the unification process.
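
A minimal sketch of such a prefixing (not the thesis implementation; renameVare and renameResource are illustrative names) could look like this:

-- prefix a variable name with the current resolution level, e.g. "x1" becomes "2_x1"
renameVare :: Int -> Vare -> Vare
renameVare n (UVar s)  = UVar  (show n ++ "_" ++ s)
renameVare n (EVar s)  = EVar  (show n ++ "_" ++ s)
renameVare n (GVar s)  = GVar  (show n ++ "_" ++ s)
renameVare n (GEVar s) = GEVar (show n ++ "_" ++ s)

-- only variables are renamed; URI's and literals stay untouched
renameResource :: Int -> Resource -> Resource
renameResource n (Var v) = Var (renameVare n v)
renameResource _ r       = r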

In some cases this renumbering has to be undone e.g. when comparing goals.

The goals:

(Fred, taller, 1_?x1).

(Fred, taller, 2_?x1).

are really the same goals: they represent the same query: all people shorter than Fred.

          

Fig. 5.16. Example of variable renaming.

 

5.12.Comparison of resolution and closure paths

 

There is clearly a likeness between closure paths and resolution paths. Both a closure path and a resolution path are a sequence of triples (xi,ri,yi) where xi and yi are subgraphs and ri is a rule. In fact parts of the closure process are done in reverse in the resolution process.

I want to stress here the fact that I define closure paths and resolution paths with the complete accumulated substitution applied to all triplesets that are part of the path.

Question: is it possible to find criteria such that steps in the resolution process can be judged to be valid based on (steps in ) the closure process?

If a rule is applied in the resolution process to a triple set and one of the triples is replaced by the antecedents then schematically:

(c, as)

where c represents the consequent and as the antecedents.

In the closure process there might be a step:

(as, c) where c is generated by the subgraph as.

 

A closure path generates  a solution when the query matches with a subgraph of the closure produced by the closure path.

 

I will present some lemmas here with explanations and afterwards I will present a complete theory.

 

Final Path Lemma.  The reverse of a final path is a closure path. All final paths are complete and valid.

 

I will explain the first part here and the second part later.

A final resolution path is the resolution path when the goal list has been emptied and a solution has been found.

 

Take a resolution path (q,r1,g1). Thus the query matches once with rule r1 to generate subgraph g1, which matches with graph G. This corresponds to the closure path (g1,r1,q) where q in the closure G’ is generated by rule r1.

Example: Graph G :

{(chimp, subClassOf, monkeys),( monkeys, subClassOf, mammalia)}

And the subclassof rule:

{(?c1, subClassOf,?c2),(?c2, subClassOf,?c3)} implies {(?c1, subClassOf,? c3)}

- The closure generates using the rule:

(chimp, subClassOf, mammalia)

 and adds this to the database.

- The query:

{(chimp, subClassOf, ?who)}

will generate using the rule :

{(chimp, subClassOf,?c2),(?c2, subClassOf,?c3)}

and this matches with the database  where ?c2 is substituted by monkeys and ?c3 is substituted by mammalia.

 

Proof:

Take the resolution path:

(c1,r1,g1), (c2,r2,g2), … (cn-1,rn-1,gn-1),(cn,rn,gn)

This path corresponds to a sequence of goals in the resolution process:

c1, c2, …, cn-1, cn. At one moment or another during the process these goals are in the goallist. Goals generate new goals or are taken from the goallist when they are grounded. This means that if the goallist is empty all variables must have been substituted by URI’s or literals. A temporary variable could perhaps exist in the sequence, i.e. a variable that appears in the antecedents of a rule x and in the consequents of a rule y but that has disappeared in the antecedents of rule y. However, this would mean that there is a variable in the consequents of rule y that is not present in its antecedents, which is not permitted.

 

Solution Lemma I: A solution corresponds to an empty closure path (when the query matches directly with the database) or to a minimal closure path.

Note: a minimal closure path with respect to a query is a closure path with the property that the graph produced by the path matches with the query and that it is not possible to diminish the length of the path i.e. to take a rule away. 

 

Proof: This follows directly from the definition of a solution as a subgraph of the closure graph. There are two possibilities: the triples of a solution either belong to the graph G or are generated by the closure process. If all triples belong to G then the closure path is empty. If not there is a closure path that generates the triples that do not belong to G. 

 

It is clear that there can be more than one closure path that generates a solution. One must only consider applying rules in a different sequence. As the reverse of the closure path is a final resolution path it is clear that solutions can be duplicated during the resolution process.

 

Solution Lemma II: let C be the set of all closure paths that generate solutions. Then there exist matching resolution paths that generate the same solutions.

 

Proof: Let (g1,r1,c1),(g2,r2,c2), …,(gn,rn,cn) be a closure path that generates a solution.

Let (cn,rn,gn), … (c2,r2,g2),(c1,r1,g1)  be the corresponding resolution path.

The applied rules are called r1, r2, …, rn.

rn applied to cn gives gn in the resolution path. Some of the triples of gn can match with G, others will be proved by further inferencing (for instance one of them might match with the tripleset cn-1). When cn was generated in the closure it was deduced from gn. The triples of gn can be part of G or they were derived with the rules r1, r2, …, rn. If they were derived then the resolution process will use those rules to infer them in the other direction. If a triple is part of G, in the resolution process it will be directly matched with a corresponding triple from the graph G.

 

The resolution path is a valid resolution path if after substitution the query is grounded or the resolution path can be extended by adding steps till the query is grounded.

The resolution path is invalid when it is not valid.

The resolution path is complete when after substitution all triples in it are grounded.

 

Final Path Lemma.  The reverse of a final path is a closure path. All final paths are complete and valid.

I give now the proof of the second part.

Proof: Let (cn,rn,gn), … (c2,r2,g2),(c1,r1,g1) be a final resolution path. I already proved in the first part that it is grounded, because all closure paths are grounded. g1 is a subgraph of G (if not, the triple (c1,r1,g1) should be followed by another one). cn is necessarily a triple of the query. The other triples of the query match with the graph G or match with triples of ci, cj etc. (consequents of a rule). The reverse is a closure path. However, this closure path generates all triples of the query that did not match directly with the graph G. Thus it generates a solution.

An example of closure path and resolution path as they were produced by a trace of the engine can be found in annexe 2.

 

5.13.Anti-looping technique

 

The technique described here stems from an oral communication of De Roo [DEROO]. Given a resolution path (cn,rn,gn), … (c2,r2,g2),(c1,r1,g1), suppose the goal c2 matches with the rule r2 to generate the goal g2. When that goal is identical to one of the goals already present in the resolution path and was also generated by the same rule, a loop may exist in which the goals keep coming back in the path.

Take the resolution path:

(cn,rn,gn), …(cx,rx,gx) …(cy,ry,gy)… (c2,r2,g2),(c1,r1,g1)

where the triples (cx,rx,gx),(cy,ry,gy) and (c2,r2,g2) are equal (recall that the numbers are sequence numbers and not identification numbers).  

 

I call, by definition, a looping path a path in which the same triple (cx,rx,gx) occurs twice.

The anti-looping technique consists in failing such a path from the moment a triple (cx,rx,gx) comes back for the second time. These triples can still contain variables, so the level numbering has to be stripped from the variables before the comparison.

 

No solutions are excluded by this technique, as shown by the solution lemma II. Indeed the reverse of a looping path is not a closure path. But there exist closure paths whose reverse is a resolution path containing a solution. So all solutions can still be found by finding those paths.

 

In my engine I check whether the combination antecedents-rule comes back, which is sufficient because the consequent is then determined.

In the implementation a list of antecedents is kept. This list must keep track of the current resolution level; when backtracking, goals with a higher level than the current one must be taken away.
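
A minimal sketch of this check (not the thesis code; it assumes an Eq instance on Triple, and the names oldGoals and looping are mine):

type OldGoals = [(TripleSet, Rule)]   -- the (antecedents, rule) combinations seen so far

-- fail the path when the same (antecedents, rule) combination comes back
looping :: OldGoals -> TripleSet -> Rule -> Bool
looping oldGoals ants rule = (ants, rule) `elem` oldGoals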

 

That this mechanism stops all looping can best be shown by showing that it limits the maximum depth of the resolution process, however without excluding solutions.

When a goal matches with the consequent of rule r producing the antecedents as, the tuple (as, r) is entered into a list, called the oldgoals list. Whenever this tuple (as, r) comes back in the resolution path backtracking is done. This implies that in the oldgoals list no duplicate entries can exist. The number of possible entries in the list is limited to all possible combinations (as, r), which are finite whenever the closure graph G’ is finite. Let this limit be called maxG. Then the maximum resolution depth is maxR = maxG + maxT where maxT is the number of grounded triples in the query and maxG counts all possible combinations (as, r). Indeed, at each step in the resolution process either a rule is used generating a tuple (as, r) or a goal (subgraph) is matched with the graph G.

When the resolution depth reaches maxR all possible combinations (as,r) are in the oldgoals list; so no rule can produce anymore a couple (as,r) that is not rejected by the anti-looping technique.

In the oldgoals list all triples from the closure graph G’ will be present because all possible combinations of goals and rules have been tried. This implies also that all solutions will be found before this maximum depth is reached.

Why does this technique not exclude solutions?

Take a closure path (as1,r1,c1) … (asn,rn,cn) generating a solution. No rule will be applied twice to the same subgraph generating the same consequent to be added to the closure. Thus no asx is equal to asy. The reverse of a closure path is a resolution path. This proves that there exist resolution paths generating a solution without twice generating the same tuple (as, r).

A maximal limit can be calculated for maxG.

Let atot be the total number of antecedents generated by all rules and ntot the total number of labels and variables. Each antecedent is a triple and can exist in maximally ntot**3 versions. If there are n1 grounded triples in the query this gives as limit for maxG: (ntot**3)**(atot + n1).

This can easily be a very large number and often stack overflows will occur before this number is reached.

 

Failure

 

A failure (i.e. when a goal does not unify with facts or with a rule) never excludes solutions. Indeed, the reverse of a failed resolution path can never be a closure path: the last tripleset of the failed path is not included in the closure. If it were, its triples would either match with a fact or with the consequent of a rule.

 

Completeness

 

Each solution found in the resolution process corresponds to a resolution path. Let S be the set of all those paths and S1 the set of the reverse paths, which are valid closure paths. The set S will be complete when the set S1 contains all possible closure paths generating the solution. This is the case in a resolution process that uses depth-first search, as all possible combinations of the goal with the facts and the rules of the database are tried.

 

Monotonicity

 

Suppose there is a database, a query and an answer to the query. Then another database is merged with the first. The query is now posed again. What if the first answer is not a part of the second answer?

In this framework monotonicity means that when the query is posed again after the merge of databases, the first answer will be a subset of the second answer.

The above definition of databases, rules, solutions and queries imply monotonicity. Indeed all the facts and rules of the first database are still present so that all the same triples will be added during the closure process and the query will match with the same subgraphs.

Nevertheless, some strange results could occur.

Suppose a tripleset:

{(blood,color,”red”)}

and a query:

{(blood,color,?what)}

gives the answer:

{(blood,color,”red”)}.

Now the tripleset:

{(blood,color,”yellow”)}

is added. The same query now gives two answers:

{(blood,color,”red”)} and {(blood,color,”yellow”)}.

However this is normal, as nobody ‘told’ the computer that blood cannot be yellow (it could be blue however).

To enforce this, extensions like OWL are necessary where a restriction is put on the property ‘color’ in relation with the subject ‘blood’.

 

In the formal part that follows, definitions are given and the lemmas above, together with some others, are proved.

 

Fig. 5.17. gives an overview of the lemmas and the most important points proved by them.

 

 

Fig.5.17. Overview of the lemmas.

 

5.14.A formal theory of graph resolution

 

Definitions:

 

A triple consists of two labeled nodes and a labeled arc.

 

A tripleset  is a set of triples.

 

A graph  is a set of triplesets.

 

A rule consists of antecedents and a consequent. The antecedents are a tripleset; the consequent is a triple.

 

Applying a rule r to a graph G in the closure process is the process of unifying the antecedents with all possible triples of the graph G while propagating the substitutions. For each successful unification the consequent of the rule is added to the graph with its variables substituted.

 

Applying a rule r to a subgraph g in the resolution process is the process of unifying the consequents with all possible triples of g while propagating the substitutions. For each successful unification the antecedents of the rule are added to the goallist of the resolution process (see the explanation of the resolution process).

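Both directions can be sketched in Haskell (a minimal sketch; matchAnts and unifyTriple stand for assumed matching functions, not the actual code of N3Engine.hs):

-- the node, triple and rule types of the previous sketch
data Node = Var String | Label String deriving (Eq, Show)
type Triple    = (Node, Node, Node)
type TripleSet = [Triple]
data Rule = Rule { antecedents :: TripleSet, consequent :: Triple }

type Subst = [(Node, Node)]            -- bindings: variable -> value

applySubst :: Subst -> Triple -> Triple
applySubst s (a, b, c) = (sub a, sub b, sub c)
  where sub n = maybe n id (lookup n s)

-- one closure step: every substitution that unifies the antecedents with
-- triples of the graph adds the instantiated consequent to the graph
closureStep :: (TripleSet -> TripleSet -> [Subst]) -> TripleSet -> Rule -> TripleSet
closureStep matchAnts graph (Rule ants cons) =
  foldr addNew graph [ applySubst s cons | s <- matchAnts ants graph ]
  where addNew t g = if t `elem` g then g else t : g

-- one resolution step: a goal that unifies with the consequent of a rule is
-- replaced in the goallist by the instantiated antecedents of that rule
resolveStep :: (Triple -> Triple -> Maybe Subst) -> Triple -> Rule -> Maybe TripleSet
resolveStep unifyTriple goal (Rule ants cons) =
  fmap (\s -> map (applySubst s) ants) (unifyTriple goal cons)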
 

A rule r is valid with respect to the graph G if its closure G’ with respect to the graph G is a valid  graph.

 

A graph G is valid if all its elements are triples.

 

The graph G entails the graph T using the rule R if T is a subgraph of the closure G’ of the graph G with respect to the rule R.

 

The closure G’ of a graph G with respect to a ruleset R is the result of the application of the rules of the ruleset R to the graph G, giving intermediate graphs Gi, until no more new triples can be generated. A graph may not contain duplicate triples.

 

The closure G’ of a graph G with respect to a closure path is the resulting graph G’ consisting of G with all triples added generated by the closure path.

 

A solution to a query with respect to a graph G and a ruleset R is a subgraph of the closure G’ that unifies with the query.

A solution set is the set of all possible solutions.

 

A closure path is a sequence of triples (gi,ri,ci) where gi is a subgraph of a graph Gi derived from the given graph G, ri is a rule and ci are the consequents of the rule with the substitution si applied. si is the substitution obtained through matching a subgraph of Gi with the antecedents of the rule ri. The tripleset ci is added to Gi to produce the graph Gj. All triples in a closure path are grounded.

 

A minimal closure path with respect to a query is a closure path with the property that the graph produced by the path matches with the query and that it is not possible to diminish the length of the path i.e. to take a triple away. 

 

The graph generated by a closure path consists of the graph G with all triples added that are generated by the rules used in the closure path.

 

A closure path generates  a solution when the query matches with a subgraph of the graph generated by the closure path.

 

The  reverse path of a closure path (g1,r1,c1),(g2,r2,c2), ….,(gn,rn,cn) is:

(cn,rn,gn), … (c2,r2,g2),(c1,r1,g1).

 

A resolution path is a sequence of triples (ci,ri,gi) where ci is a subgraph that matches with the consequents of the rule ri to generate gi and a substitution si. However, in the resolution path the accumulated substitutions [si] are applied to all triplesets in the path.

 

A valid resolution path is a path where the triples of the query become grounded in the path after substitution or match with the graph G, or where the resolution process can be further applied to the path so that finally a path is obtained that contains a solution. All resolution paths that are not valid are invalid resolution paths. The resolution path is incomplete if after substitution the triple sets of the path still contain variables. The resolution path is complete when, after substitution, all triple sets in the path are grounded.

 

A final resolution path is the path that remains after the goallist in the (depth first search) resolution process has been emptied. The path then consists of all the rule applications applied during the resolution process.

 

A looping resolution path is a path with a recurring triple in the sequence (cn,rn,gn), …(cx,rx,gx) …(cy,ry,gy)… (c2,r2,g2),(c1,r1,g1), i.e. (cx,rx,gx) is equal to some (cy,ry,gy). The comparison is done after stripping off the level numbering.

 

The resolution process is the building of resolution paths. A depth first search resolution process based on backtracking is explained elsewhere in the text.

 

 

Resolution Lemma. There are three possible resolution paths:

1)     a looping path

2)     a path that stops and generates a solution

3)     a path that stops and is a failure

 

Proof. Obvious.

 

Closure lemma. The reverse of a closure path is a valid, complete and final resolution path for a certain set of queries.

 

Proof. Take a closure path: (g1,r1,c1),(g2,r2,c2), ….,(gn,rn,cn). This closure path adds the triplesets c1, …,cn to the graph. All queries that match with these triples and/or triples from the graph G will be proved by the reverse path:

(cn,rn,gn), … (c2,r2,g2),(c1,r1,g1). The reverse path is complete because all closure paths are complete. It is final because the triple set g1 is a subgraph of the graph G (it is the starting point of a closure path). Now take a query as defined above: the triples of the query that are in G of course match directly with the graph. The other ones are consequents in the closure path and thus also in the resolution path. So they will finally match either with other consequents or with triples from G.

 

Final Path Lemma:  The reverse of a final path is a closure path. All final paths are complete and valid.

 

Proof: A final resolution path is the resolution path obtained when the goal list has been emptied and a solution has been found.

Take the resolution path:

(c1,r1,g1), (c2,r2,g2), … (cn-1,rn-1,gn-1),(cn,rn,gn)

This path corresponds to a sequence of goals in the resolution process:

c1,c2, ….,cn-1,cn. At one moment or another during the process these goals are in the goallist. Goals generate new goals or are taken from the goallist when they are grounded. This means that if the goallist is empty all variables must have been substituted by URI’s or literals. But perhaps a temporary variable could exist in the sequence, i.e. a variable that appears in the antecedents of a rule x and in the consequents of a rule y, but that has disappeared in the antecedents of rule y. However, this would mean that there is a variable in the consequents of rule y that is not present in the antecedents, which is not permitted.

Let (cn,rn,gn), … (c2,r2,g2),(c1,r1,g1) be a final resolution path. I already proved in the first part that it is grounded because all closure paths are grounded. g1 is a subgraph of G (if not, the triple (c1,r1,g1) would be followed by another one). cn necessarily matches with the query. The other triples of the query match with the graph G or match with triples of ci, cj etc. (consequents of a rule). The reverse is a closure path. However, this closure path generates all triples of the query that did not match directly with the graph G. Thus it generates a solution.

 

Looping Lemma I. If a looping resolution path contains a solution, there exists another non looping resolution path that contains the same solution.

 

Proof. All solutions correspond to closure paths in the closure graph G’ (by definition). The reverse of such a closure path is always a valid resolution path and is non-looping.

 

Looping Lemma II. Whenever the resolution process is looping, this is caused by a looping path if the closure of the graph G is not infinite.

Proof. Looping means in the first place that there is never a failure, i.e. a triple or tripleset that fails to match, because then the path would be a failure path and backtracking would occur. It also means that triples continue to match in an infinite series.

There are two sublemmas here:

Sublemma 1. In a looping process at least one goal should return in the goallist.

Proof. Can the number of goals be infinite? There is a finite number of triples in G, a finite number of triples that can be generated and a finite number of variables. This is the case because the closure is not infinite. So the answer is no. Thus if the process loops (ad infinitum) there is a goal that must return at a certain moment.

Sublemma 2. In a looping process a combination (rule x, goal x) must return. Indeed the number of rules is not infinite; neither is the number of goals; if the same goal returns infinitely often, sooner or later it must return as a deduction of the same rule x.

 

Infinite Looping Path Lemma. If the closure graph G’ is not infinite and the resolution process generates an infinite number of looping paths, then this generation will be stopped by the anti-looping technique and the resolution process will terminate in a finite time.

 

Proof. The finiteness of the closure graph implies that the number of labels and arcs is finite too.

Suppose that a certain goal generates an infinite number of looping paths.

This implies that the breadth of the resolution tree has to grow infinitely. It cannot grow infinitely in one step (nor in a finite number of steps) because each goal only generates a finite number of children (the database is not infinite). This implies that the depth of the resolution tree also has to grow infinitely. This implies that the level of the resolution process grows infinitely and with it the number of entries in the oldgoals list.

Each growth of the depth produces new entries in the oldgoals list. (In this list each goal-rule combination is new because when it is not, a backtrack is done by the anti-looping mechanism). In fact with each increase in the resolution level an entry is added to the oldgoals list.

Then at some point the list of oldgoals will contain all possible rule-goal combinations and the further expansion is stopped.

Formulated otherwise, for an infinite number of looping paths to materialize in the resolution process the oldgoals list would have to grow infinitely, but this is not possible.

Addendum: let maxG be the maximum number of combinations (goal, rule). Then no branch of the resolution tree will attain a level deeper than maxG. Indeed, at each increase in the level a new combination is added to the oldgoals list.
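The anti-looping technique itself can be sketched as follows (a minimal Haskell sketch; Goal and RuleId are hypothetical stand-ins for the real goal and rule representations used by the engine):

type Goal    = String                  -- stands for a goal triple
type RuleId  = Int                     -- stands for a rule
type OldGoal = (Goal, RuleId)

-- expanding a goal with a rule is only allowed when this (goal, rule)
-- combination has not occurred before on the current path
expandAllowed :: [OldGoal] -> Goal -> RuleId -> Bool
expandAllowed oldGoals goal rule = (goal, rule) `notElem` oldGoals

-- when the expansion is allowed the combination is remembered;
-- otherwise Nothing is returned and the caller backtracks
expand :: [OldGoal] -> Goal -> RuleId -> Maybe [OldGoal]
expand oldGoals goal rule
  | expandAllowed oldGoals goal rule = Just ((goal, rule) : oldGoals)
  | otherwise                        = Nothing

Because the number of possible (goal, rule) combinations is bounded by maxG, the oldgoals list cannot grow indefinitely, which is exactly the argument used in the lemma.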

 

Infinite Lemma. For a closure to generate an infinite closure graph G’ it is necessary that at least one rule adds new labels to the vocabulary of the graph.

 

Proof. Example: {:a log:math :n} -> {:a log:math :n+1}. The consequent generated by this rule is each time different, so the closure graph will be infinite. Suppose the initial graph is: :a log:math :0. The rule will then generate the sequence of natural numbers.

If no rule generates new labels then the number of labels and thus of arcs and nodes in the graph is finite and so are the possible ways of combining them.

 

Duplication Lemma. A solution to a query can be generated by two different valid resolution paths. It is not necessary that there is a cycle in the graph G’.

 

Proof. Let (c1,r1,g1), (c2,r2,g2), (c3,r3,g3) and (c1,r1,g1), (c3,r3,g3), (c2,r2,g2) be two resolution paths. Let the query q be equal to c1. Let g1 consist of c2 and c3 and let g2 and g3 be subgraphs of G. Then both paths are final paths and thus valid (Final Path Lemma).

 

Failure Lemma. If the current goal of a resolution path (the last triple set of the path) cannot be unified with a rule or with facts in the database, the query can still have a solution, but that solution will also be found by a final path.

 

Proof. This is the same as for the looping lemma, with the addition that the reverse of a failure path cannot be a closure path, as the last triple set is not in the graph G (otherwise there would be no failure).

 

Substitution Lemma I. When a variable matches with a URI the first item in the substitution tuple needs to be the variable.

 

Proof. 1) The variable originates from the query.

a)     The triple in question matches with a triple from the graph G. Obvious, because a subgraph matching is being done.

b)    The variable matches with a ground atom from a rule.

           Suppose this happens in the resolution path when using rule r1:

           (cn,rn,gn), … (c2,r2,g2),(c1,r1,g1)

           The purpose is to find a closure path:

           (g1,r1,c1),… , (gx,rx,cx),… ,(gn, rn,cn) where all atoms are grounded.

           When rule r1 is used in the closure the ground atom is part of the closure graph, thus in the resolution process , in order to find a subgraph of a closure graph, the variable has to be substituted by the ground atom.

2) A ground atom from the query matches with a variable from a rule. Again, the search is for a closure path that leads to a closure graph that contains the query as a subgraph. Suppose a series of substitutions:

     (v1, a1)  (v2,a2) where a1 originates from the query corresponding to a resolution path: (c1,r1,g1) ,(c2,r2,g2), (c3,r3,g3). Wherever v1 occurs it will be replaced by a1. If it matches with another variable this variable will be replaced by a1 too. Finally it will match with a ground term from G’. This corresponds to a closure path where in a sequence of substitutions from rules each variable becomes replaced by a1, an atom of G’. 

3) A ground atom from a rule matches with a variable from a rule or the inverse. These cases are exactly the same as the cases where the ground atom or the variable originates from the query.

 

Substitution Lemma II. When two variables match, the variable from the query or goal needs to be ordered first in the tuple.

 

Proof. In a chain of variable substitutions the first substitution will start with a variable from the query or a goal. The case for a goal is similar to the case for a query. Take a series of substitutions:

(v1,v2) (v2,vx) … (vx,a) where a is a grounded atom.

By applying this substitution to the resolution path all the variables will become equal to a. This corresponds to what happens in the closure path where the ground atom a will replace all the variables in the rules.

 

Solution Lemma I: A solution corresponds to an empty closure path (when the query matches directly with the database) or to one or more minimal closure paths.

 

Proof: there are two possibilities: the triples of a solution either belong to the graph G or are generated by the closure process. If all triples belong to G then the closure path is empty. If not there is a closure path that generates the triples that do not belong to G. 

 

Solution Lemma II: let C be the set of all closure paths that generate solutions. Then there exist matching resolution paths that generate the same solutions.

 

Proof: Let (g1,r1,c1),… , (gx,rx,cx),… ,(gn, rn,cn) be a closure path that generates a solution.

Let ((cn,rn,gn), … (c2,r2,g2),(c1,r1,g1) be the corresponding resolution path.

rn applied to cn gives gn. Some of the triples of gn can match with G, others will be deduced by further inferencing (for instance one of them might be in cn-1). When cn was generated in the closure it was deduced from gn. The triples of gn can be part of G or they were derived with rules r1,…, rn-2. If they were derived then the resolution process will use those rules to infer them in the other direction. If a triple is part of G, in the resolution process it will be directly matched with a corresponding triple from the graph G.

 

Solution Lemma III. The set of all valid solutions to a query q with respect to a graph G and a ruleset R is generated by the set of all valid resolution paths, however with elimination of duplicates.

 

Proof.  A valid resolution path generates a solution. Because of the completeness lemma the set of all valid resolution paths must generate all solutions. But the duplication lemma states that there might be duplicates.

 

Solution Lemma IV.  A complete resolution path generates a solution and a solution is only generated by a complete, valid and final resolution path.

Proof. (→) Let (cn, rn,gn),… , (cx,rx,gx),… , (c1,r1,g1) be a complete resolution path. If there are still triples from the query that did not become grounded by 1) matching with a triple from G or 2) following a chain (cn, rn,gn),…, (c1,r1,g1) where all gn match with G, the path cannot be complete. Thus all triples of the query are matched and grounded after substitution, so a solution has been found.

(←) Let (cn, rn,gn),… , (cx,rx,gx),… , (c1,r1,g1) be a resolution path that generates a solution. This means all triples of the query have 1) matched with a triple from G or 2) followed a chain (cn, rn,gn),…, (c1,r1,g1) where gn matches with a subgraph of G. But then the path is a final path and is thus complete (Final Path Lemma).

 

Completeness Lemma. The depth first search resolution process with exclusion of invalid resolution paths generates all possible solutions to a query q with respect to a graph G and a ruleset R.

 

Proof. Each solution found in the resolution process corresponds to a closure path. Let S be the set of all those paths and S1 the set of the reverse paths, which are resolution paths (Closure Lemma). The depth first search resolution process will generate all possible combinations of the goal with the facts and the rules of the database, thereby generating all possible final paths.

It is understood that looping paths are stopped.

 

Monotonicity Lemma. Given a merge of a graph G1 with rule set R1 with a graph G2 with rule set R2; a query Q with solution set S obtained with graph G1 and rule set R1 and a solution set S’ obtained with graph G2 and rule set R2. Then S is contained in S’.

 

Proof. Let G1’ be the closure of G1. The merged graph Gm has a rule set Rm that is the merge of the rule sets R1 and R2. All possible closure paths that produced the closure G1’ will still exist after the merge. The closure graph G1’ will be a subgraph of the closure graph of the merged graph Gm. Then all solutions of the solution set S are still subgraphs of that closure and thus solutions when querying against the merged graph Gm.

 

 

5.15.An extension of the theory to variable arities

 

Definitions

 

A multiple of arity n is a predicate p, also called arc, and (n-1) nodes, also called points. The points are connected by a (multi-headed) arc. The points and arcs are labeled and no two labels are the same.

A graph is a set of multiples.

Arcs and nodes are either variables or labels. (In internet terms labels could be URI’s or literals).

Two multiples unify  when they have the same arity and their arcs and nodes unify.

A variable unifies with a variable or a label.

Two labels unify if they are the same.

A rule consists of antecedents and a consequent. Antecedents are multiples and a consequent is a multiple.

The  application of a rule to a graph G consists in unifying the antecedents with multiples from the graph and, if successful, adding the consequent to the graph.

 

The lemmas of the theory explained above can be used if it is possible to reduce all multiples to triples and then prove that the solutions stay the same.

Reduction of multiples to triples:

Unary multiple: in a graph G, a becomes (a,a,a). In a query, a becomes (a,a,a). These two clearly match.

Binary multiple:  (p,a) becomes (a,p,a). The query (?p,?a) corresponds with (?a, ?p, ?a).

Ternary multiple: this is a triple.

Quaternary multiple: (p, a, b, c) becomes:

{(x, p, a), (x, p, b), (x,p,c)} where x is a unique temporary label.

The query (?p,?a,?b,?c) becomes:

{(y,?p,?a),(y,?p,?b),(y,?p,?c)}.

It is quite obvious that this transformation does not change the solutions.
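This reduction can be sketched as a Haskell function (a hypothetical representation in which a multiple is the arc label followed by its points; the unique temporary label is supplied by the caller):

type Label    = String
type Multiple = [Label]            -- arc (predicate) label followed by its points

-- reduce a multiple of any arity to a list of (subject, arc, object) triples
toTriples :: Label -> Multiple -> [(Label, Label, Label)]
toTriples _    []         = []
toTriples _    [p]        = [(p, p, p)]                   -- unary
toTriples _    [p, a]     = [(a, p, a)]                   -- binary
toTriples _    [p, a, b]  = [(a, p, b)]                   -- ternary: already a triple
toTriples temp (p : args) = [ (temp, p, x) | x <- args ]  -- arity 4 and higher

For example, toTriples "x" ["p","a","b","c"] gives [("x","p","a"),("x","p","b"),("x","p","c")], which is the quaternary case above.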

 

5.16.An extension of the theory to typed nodes and arcs

 

A sentence is a sequence of syntactic entities. A syntactic entity has a type and a label. Example: verb/walk.

Examples of syntactic entities: verb, noun, article, etc… A syntactic entity can also be a variable. It is then preceded by an interrogation mark.

A language is a set of sentences.

Two sentences unify if their  entities unify.

A rule consists of antecedents that are sentences and a consequent that is a sentence.

An example: {(noun/?noun, verb/walks)} -> {(article/?article, noun/?noun, verb/walks)}

In fact what is done here is to create multiples where each label is tagged with a type. Unification can only happen when the types of a node or arc are equal besides the label being equal.

Of course sentences are multiples and can be transformed to triples.

Thus all the lemmas from above apply to these structures if solutions are defined to be subgraphs of a closure graph.

This can be used to make transformations e.g.

(?ty/label, object/?ty1) -> (par/'(', object/?ty1).

Parsing can be done:

(?/charIn ?/= ?/'(') -> (?/charOut ?/= par/'(')

 


 

Chapter 6. RDF, inferencing and first order logic.

 

6.1.The model mismatch

 

In order to define a logic layer for the Semantic Web, inspiration was sought in first order logic (FOL) and more specifically in Horn clauses [SWAP, DECKER]. Nevertheless, the model theory of RDF is completely graph oriented. This creates a model mismatch: on the one hand a graph model and entailment on graphs are followed when using basic RDF, on the other hand FOL reasoning is followed when using inferencing.

This raises the question: how do you translate between the two models or is this even possible?

There are three points of difference between the graph model of RDF and FOL:

1)     RDF only has the logical ‘and’, by way of the implicit conjunction of RDF triples. The logic layer adds the logical implication and variables. Negation and disjunction (‘or’), where necessary, have to be implemented as properties of subjects.

2)      Is a FOL predicate the same as an RDF property? Is a predicate Triple(subject,property,object) really the same as a triple (subject,property,object)?

3)     The quantifiers (forAll, forSome) do not really ‘exist’. A solution is a match of a query with the closure graph. The quantifier used is then: ‘for_all_known_in_our_database’. The database can be an ad-hoc database that came into existence by a merge of several RDF databases.

 

I have made a model of RDF in FOL (see chapter 4) without however claiming the isomorphism of the translation. In this model RDF and the logic extension are both viewed in terms of FOL and the corresponding resolution theory.

I have also made a model of RDF and the logic layer with reasoning on graphs (chapter 5). Here the compatibility with the RDF model theory is certain. The model enabled me to prove some characteristics of the resolution process used for my Haskell engine.

 

6.2.Modelling RDF in FOL

 

RDF works with triples. A triple is composed of a subject, predicate and object. These items belong together and may not be separated. To model them in FOL a predicate Triple can be used :

Triple(subject, predicate, object)

Following the model theory [RDFM] there can be sets of triples that belong together. If t1, t2 and t3 are triples, a set of triples might be modelled as follows:

Tripleset(t1, t2, t3)

This is however not correct as the length of the set is variable; so a list must be used (see further).

There exist also anonymous subjects: these can be modelled by an existential variable:

∃ X: Triple(X, b, c)

saying that there exists something with property b and value c.

An existential elimination can also be applied, putting some instance in place of the variable:

Triple(_ux, b, c)

 

Now let's consider the RDF data model [RDFM]:

Sets in the model :

1) There is a set called Resources.

2) There is a set called Literals.

3) There is a subset of Resources called Properties.

4) There is a set called Statements, each element of which is a triple of the form

{pred, sub, obj}

where pred is a property (member of Properties), sub is a resource (member of Resources), and obj is either a resource or a literal (member of Literals).(Note: this text will use the sequence {sub, pred, obj}).

 

5) RDF:type is a member of Properties.

6) RDF:Statement is a member of Resources but not contained in Properties.

7) RDF:subject, RDF:predicate and RDF:object are in Properties.

 

Note: for convenience Triple will be abbreviated T and TripleSet TS.

 

This gives the following first order facts and rules:

1) T(rdf:Resources, rdf:type, rdfs:Class)

2) T(rdf:Literals, rdf:type, rdfs:Class)

3) T(rdf:Resources, rdfs:subClassOf,  rdf:Properties)

4) T(rdf:Statements, rdf:type, rdfs:Class)

    ∀s, sub, pred, obj: T(s, rdf:type, rdf:Statements) -> (Equal(s, Triple(sub, pred, obj)) ∧ T(pred, rdf:type, rdf:Property) ∧ T(sub, rdf:type, rdf:Resources) ∧ (T(obj, rdf:type, rdf:Resources) ∨ T(obj, rdf:type, rdf:Literals)))

5) T(rdf:type, rdf:type, rdf:Property) Note the recursion here.

6) T(rdf:Statement, rdf:type, rdf:Resources)

    ØT(rdf:Statement, rdf:type, rdf:Property)

7) T(rdf:subject, rdf:type, rdf:Property)

    T(rdf:predicate, rdf:type, rdf:Property)

    T(rdf:object, rdf:type, rdf:Property)

 

As subjects and objects are Resources they can be triples too (called “embedded triples”); predicates however cannot be triples because of rule 6. This can be used for an important optimization in the unification process (see further).

 

Reification of a triple {pred, sub, obj} of Statements is an element r of Resources representing the reified triple and the elements s1, s2, s3, and s4 of Statements such that

s1: {RDF:predicate, r, pred}
s2: {RDF:subject, r, subj}
s3: {RDF:object, r, obj}
s4: {RDF:type, r, [RDF:Statement]}

 

In FOL:

∀r, sub, pred, obj: T(r, rdf:type, rdf:Resources) ∧ T(r, rdf:predicate, pred) ∧ T(r, rdf:subject, subj) ∧ T(r, rdf:object, obj) -> T(r, Reification, T(sub, pred, obj))

 

As was said a set of triples has to be represented by a list:

 

{:d :e :f. :g :h :i. :a :b :c} becomes:

TS(T(d, e, f),TS(T(g,h,i),TS(T(a,b,c))))

 

In Prolog e.g.:

[t(d, e, f), t(g, h, i), t(a, b, c)].

In Haskell this is better structured:

data Triple = Triple String String String
type TripleSet = [Triple]

tripleset :: TripleSet
tripleset = [Triple "d" "e" "f", Triple "g" "h" "i", Triple "a" "b" "c"]

 

Conclusion

 

The RDF data model can be reduced to a subset of first order logic. It is restricted to the use of a single predicate Triple and lists of triples. However this model says nothing about the graph structure of RDF.(See chapter 3).

 

For convenience in what follows lists will be represented between ‘[]’; variables will be identifiers prefixed with ?.

 

6.3.Unification and the RDF graph model

 

An RDF graph is a set of zero, one or more connected subgraphs. The elements of the graph are subjects, objects or bNodes (= anonymous subjects). The arcs between the nodes represent predicates.

 

Suppose there is the following database:

TS[T(Jack, owner, dog),T(dog, name, “Pretty”), T(Jack, age, “45”)], TS(T(dog, name, “Ugly”))

 

And a query: TS(T(?who, owner, dog), T(dog, name, ?what))

Clearly the answer:T(Jack, owner, dog),T(dog, name, “Ugly”) is not wanted. So the triples in the triple set of the query may not be separated and they must match with triples in a connected subgraph. The situation of course is different when the query is:

TS(T(?who, owner, dog)), TS(T(dog, name, ?what))

Here in fact three answers will be given.

If we represent the first subgraph of the database and the first query as follows:

T(Jack, owner, dog) ∧ T(dog, name, “Pretty”) ∧ T(Jack, age, “45”)

and the query(here negated for the proof):

¬T(?who, owner, dog) ∨ ¬T(dog, name, ?what)

Then by applying the rule of UR (Unit Resolution) to the query and the triples T(Jack, owner, dog) and T(dog, name, “Pretty”) the result is the substitution: [?who -> Jack, ?what -> “Pretty”].

 

Proof of this application: T(a) ∧ T(b) ∧ (¬T(a) ∨ ¬T(b)) =

T(a) ∧ ((T(b) ∧ ¬T(a)) ∨ (T(b) ∧ ¬T(b))) = T(a) ∧ (T(b) ∧ ¬T(a)) = False

This proof can be extended to any number of triples. This is in fact UR resolution with a third empty clause.

In [WOS]: “Formally, UR-resolution, is that inference rule that applies to a set of clauses one of which must be a non-unit clause, the remaining must be unit clauses, and the result of successful application must be a unit clause. Furthermore, the nonunit clause must contain exactly one more literal than the number of (not necessarily distinct) unit clauses in the set to which UR-resolution is being applied. “

 

A query (goal) applied to a rule gives:

 (T(a) ∧ T(b)) -> T(c) = ¬T(a) ∨ ¬T(b) ∨ T(c)

where the query will be: ¬T(?c)

Here the application of binary resolution gives the new goal:

 ¬T(a) ∨ ¬T(b) and the substitution (?c, c).

In [WOS] :  “ Formally, binary resolution is that inference rule that takes two clauses, selects a literal in each of the same predicate but of opposite sign, and yields a clause providing that the two selected literals unify. The result of a successful application of binary resolution is obtained by applying the replacement that unification finds to the two clauses, deleting from each (only) the descendant of the selected literal, and taking the or of the remaining literals of the two clauses. “ 

 

This application does not produce a contradiction; it adds the clause ¬T(a) ∨ ¬T(b) to the set of support.

The logical implication is not part of RDF or RDFS. It will be part of a logic standard for the Semantic Web yet to be established.

 

This unification mechanism is classical FOL resolution, however with a restriction on the number of solutions imposed by the constraint of subgraph matching.

 

This model of unification is not natively supported by current theorem provers; in order to use it with those provers, programming the model is necessary.
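A minimal Haskell sketch of this restricted unification (hypothetical types; the actual algorithm lives in N3Unify.hs and also handles embedded triples and rules):

data Term = Var String | URI String | Lit String
  deriving (Eq, Show)

type Triple = (Term, Term, Term)
type Subst  = [(String, Term)]       -- variable name -> bound value

-- unify two terms under the current substitution
unifyTerm :: Subst -> Term -> Term -> Maybe Subst
unifyTerm s (Var v) t = case lookup v s of
                          Just t' -> if t' == t then Just s else Nothing
                          Nothing -> Just ((v, t) : s)
unifyTerm s t (Var v) = unifyTerm s (Var v) t
unifyTerm s t1 t2     = if t1 == t2 then Just s else Nothing

-- unify two triples, propagating the substitution over subject, predicate and object
unifyTriple :: Subst -> Triple -> Triple -> Maybe Subst
unifyTriple s (s1, p1, o1) (s2, p2, o2) =
  unifyTerm s s1 s2 >>= \sa -> unifyTerm sa p1 p2 >>= \sb -> unifyTerm sb o1 o2

-- match a whole query tripleset against one connected subgraph: every query
-- triple must unify with some triple of that subgraph under one accumulated substitution
matchQuery :: Subst -> [Triple] -> [Triple] -> [Subst]
matchQuery s []     _        = [s]
matchQuery s (q:qs) subgraph =
  [ s'' | t <- subgraph
        , Just s' <- [unifyTriple s q t]
        , s'' <- matchQuery s' qs subgraph ]

Calling matchQuery [] query subgraph yields one substitution for every way the whole query maps onto the connected subgraph, which is precisely the restriction on the number of solutions discussed above.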

 

6.4.Embedded rules

 

Embedded rules should pose no problems. When a rule produces a rule it is added to the set of support. A rule in the set of support should be unified with a tripleset but not with another rule.

An example of the use of embedded rules:

Suppose a reasoning generates a triple T(outcome, =, X1) and depending on that value a rule must be generated: T(?X1,>,X2) -> T(?X1, =, ?X1/X3).

Hence the rule: (T(?X1,>,X2) -> T(?X1,=, X3)) -> T(outcome, =, ?X1). This rule generated in the set of support will then unify with clauses of the form: T(_,=,X3).

In fact a rule generally represents a relation: for each set of values for the variables in the antecedents there is a set of values for the consequent, while the consequent may contain (a) variable(s) not present in the antecedents. If the consequent does not contain variables different from those in the antecedents, then the rule represents a function (from the domain of triplesets to the domain of triplesets).

 

6.5.Completeness and soundness of the engine

 

The algorithm used in the engine is a resolution algorithm. Solutions are searched by starting with the axiom file and the negation of the lemma(query). When a solution is found a constructive proof of the lemma has also been found in the form of a contradiction that follows from a certain number of traceable unification steps as defined by the theory for first order resolution. Thus the soundness of the engine can be concluded.

In general, completeness of the engine will be impossible. It is an engine for verification of proofs; if a certain proof is refused, no assurance can be given about the validity of the proof. If it is accepted, the guarantee of its correctness is given.

The unification algorithm clearly is decidable as a consequence of the simple ternary structure of N3 triples. The unification algorithm in the module N3Unify.hs can be seen as a proof of this.

 

 

6.6.RDFProlog

 

I have defined a prolog-like structure in the module RDFProlog.hs (for a sample see chapter 7, applications) where e.g. name(company,”Test”) is translated to the triple: (company,name,”Test”). This language is very similar to the language DATALOG that is used for deductive access to relational databases [ALVES].

The alphabet of DATALOG is composed of three sets: VAR,CONST and PRED denoting respectively variables, constants and predicates. One notes the absence of functions. Also predicates are not nested.

In RDFProlog the general  format of a rule is:

 

predicate-1(subject-1,object-1),…, predicate-n(subject-n,object-n) :>

predicate-q(subject-q,object-q), …,predicate-z(subject-z,object-z).

 

where all predicates,subjects and objects can be variables.

The format of a fact is:

 

predicate-1(subject-1,object-1),…, predicate-n(subject-n,object-n).

 

where all predicates,subjects and objects can be variables.

This is a rule without antecedents.

Variables begin with capital letters.

All clauses are to be entered on one line. A line with a syntax error is neglected. This can be used to add comments.

The parser is taken from [JONES] with minor modifications.
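The mapping from RDFProlog clauses to triples can be sketched as follows (hypothetical Haskell types, not the actual ones of RDFProlog.hs):

-- an RDFProlog atom: a predicate applied to a subject and an object
data Atom = Atom String String String          -- predicate, subject, object

-- a clause: antecedent atoms :> consequent atoms (a fact has no antecedents)
data Clause = Clause [Atom] [Atom]

-- name(company,"Test") becomes the triple (company, name, "Test")
atomToTriple :: Atom -> (String, String, String)
atomToTriple (Atom p s o) = (s, p, o)

-- a clause becomes a pair of triplesets (antecedents, consequents)
clauseToTriples :: Clause -> ([(String, String, String)], [(String, String, String)])
clauseToTriples (Clause ants cons) = (map atomToTriple ants, map atomToTriple cons)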

 

6.7. Comparison with PROLOG and OTTER

 

I will compare Prolog with RDFProlog. Differences are:

1)     in RDFProlog more than one head is allowed to a clause:

     a,b :> c,d. This is not permitted in Prolog

2)     predicates can be variables in RDFProlog:

     P(A,B),P(B,C) :> P(A,C).

     This is not permitted in Prolog.

These two differences are not very fundamental, as using those features does not produce very good programs anyhow. Especially point 2 can very easily lead to combinatorial explosion.

3) RDFProlog is 100% declarative. Prolog is not as the sequence of declarations has importance in Prolog.

     4) RDFProlog does not have functions; it does not have nested predicates. The arity of a predicate can be 0, 1 or 2. Higher arities seem possible but more study is necessary. This is mostly of practical importance: nested predicates can always be replaced by non-nested predicates; a function can be eliminated by replacing the predicate where the function occurs by a predicate with one extra term. Predicates with arities above 2 can be replaced by a series of predicates with arity 0, 1 or 2.

 

Otter is a different story [WOS]. Differences are:

1)     Otter is a full blown theorem prover: it uses a set of different strategies that can be influenced by the user, as well as different resolution rules: hyperresolution, binary resolution, etc. It also uses equality reasoning.

2)     Otter is a FOL reasoner. Contrary to RDFProlog it is not limited to a subset of FOL. It reduces expressions in FOL to conjunctive normal form.

3)     Lists and other builtins are available in Otter.

 

Prolog and Otter work with clauses. They do not however consider sets of clauses and the matching of sets. As shown in chapter 3 in order to realize the matching of sets of clauses an inference engine is necessary that calls another embedded inner engine. The outer inference engine does the inferencing on sets of clauses (in the case of RDF on triplesets). The inner engine does the matching of two sets of clauses that form a graph.

For implementing RDF inferencing in Prolog and Otter the inner engine must be programmed. However, the unification process also has to be programmed, as this is not standard. But then there is no way to unify the sets of clauses without programming, so that the outer engine also has to be programmed. Finally, compared to Haskell, the same amount of programming would be necessary without any further advantage. On the contrary, Haskell certainly seems more apt as a programming language for implementing this system.

 

 

6.8.Differences between RDF and FOL

 

Take the following declarations in FOL:

∃ car: owns(John,car) & brand(car,Ford)

∃ car: brand(car,Opel)

 

and the following RDF declarations:

(John,owns,car)

(car,brand,Ford)

(car,brand,Opel)

 

In RDF this really means that John owns a car that has two brands (see also the following chapter). This is the consequence of the graph model of RDF. A declaration as in FOL where the atom car is used for two brands is not possible in RDF. In RDF things have to be said a lot more explicitly:

(John,owns,Johns_car)

(Johns_car,is_a,car)

(car,has,car_brand)

(Ford,is_a,car_brand)

(Opel,is_a,car_brand)

(car_brand,is_a,brand)

This means that it is not possible to work with anonymous entities in RDF.

On the other hand, given the FOL declaration:

∃ car: color(car,yellow)

it is not possible to say whether this is John’s car or not.

In RDF:

color(Johns_car,yellow)

So here it is necessary to add to the FOL declaration:

∃ car: color(car,yellow) & owns(John,car).

 

‘not’ and ‘or’

 

Negation by failure means that, if a fact cannot be proved, then the fact is assumed not to be true [ALVES]. Under a closed world assumption this negation by failure is equal to the logical negation. On the internet an open world assumption is the rule and negation by failure should simply be translated as: not found.

It is possible to define a not as a property. ¬owner(John,Rolls) is then implemented as: not_owner(John,Rolls).

I call this negation by declaration. Contrary to negation by failure negation by declaration is monotone and corresponds to a FOL declaration.

Suppose the FOL declarations:

¬owner(car,X) -> poor(X).

owner(car,X) -> rich(X).

¬owner(car,Guido).

and the query:

poor(Guido).

In RDFProlog (see chapter 3 for a description of RDFProlog) this becomes:

not_owner(car,X) :> poor(X).

owner(car,X) :> rich(X).

not_owner(car,Guido).

and the query:

poor(Guido).

The negation is here really tightly bound to the property. The same notation as for FOL could be used in RDFProlog with the condition that the negation is bound to the property.

Now lets consider the following problem:

poor(X) :> not_owner(car,X).

rich(X) :> owner(car,X).

poor(Guido).

and the query:

owner(car,Guido).

My engine does not give an answer to this query. This is the same as the answer ‘don’t know’. A forward reasoning engine would find: not_owner(car,Guido). However, when the query is: not_owner(car,Guido), the answer is ‘yes’. So the response really is: not_owner(car,Guido). So, when a query fails, the engine should then try the negation of the query. When that fails also, the answer is really: ‘don’t know’.

 

The same procedure can be followed with an or. I call this or by declaration.

Take the following FOL declaration: color(yellow) ∨ color(blue).

With or by declaration this becomes:

color(yellow_or_blue).

color(yellow),color(not_blue) -> color(yellow_or_blue).

color(not_yellow),color(blue) -> color(yellow_or_blue).

color(yellow),color(blue) -> color(yellow_or_blue).

Note that a declarative not must be used and that three rules have to be added. An example of an implementation of these features can be found in chapter 7, first example.

This way of implementing a disjunction can be generalized. As an example take an Alpine club. Every member of the club is either a skier or a climber (or both). This is put as follows in RDFProlog:

property(skier).

property(climber).

class(member).

subPropertyOf(sports,skier).

subPropertyOf(sports,climber).

type(sports,or_property).

climber(John).

member(X) :> sports(X).

subPropertyOf(P1,P),type(P1,or_property),P(X) :> P1(X).

 

The first rule says: if X is a member then X has the property sports.

The second rule says: if a property P1 is an or_property  and P is a subPropertyOf P1 and X has the property P then X also has the property P1.

The query: sports(John) will be answered positively.

The implementation of the disjunction is based on the creation of a higher category, which is really the same as is done in FOL:

∀x: skier(X) ∨ climber(X),

only in FOL the higher category remains anonymous.

A conjunction can be implemented in the same way. This is not necessary as conjunction is implicit in RDF.

The disjunction could also be implemented using a builtin e.g.:

member(club,X) :> or(X,list(skier,climber))

Here the predicate or has to be interpreted by the inference engine, which also needs a builtin list predicate. The engine should then verify whether X is a skier or a climber or both.

 

Proposition:

I propose to extend RDF with a negation and a disjunction in a constructive way:

Syntactically the negation is represented by a negation sign in front of a predicate in RDFProlog:

¬a(b, c).

and in Notation 3 by a negation sign in front of a triple:

¬{:a :b :c.}.

Semantically this negation is tied to the property and is identical to the creation of an extra property not_b that has the meaning of the negation of b. Such a triple is only true when it exists or can be deduced by rules.

Syntactically the disjunction is implemented with a special ‘builtin’ predicate or.

In RDFProlog if the or is applied to the subject:

p(or(a,b),c).

or to the object:

p(a,or(b,c)).

or to the predicate:

or(p,p1)(a,b).

and in Notation 3:

{:a :p or(:b,:c)}.

In RDFProlog this can be used in place of the subject or object of a triple; in Notation 3 in place of subject, property or object.

Semantically in RDFProlog the construction p(a,or(b,c))  is true if there exists a  triple p(a,b) or there exists a triple p(a,c).

Semantically in Notation 3 the construction {:a :p or(:b,:c)}. is true when there  exists a triple {:a :p :b} or there exists a triple {:a :p :c}.

It is my opinion that these extensions will greatly simplify the translation of logical problems to RDF while at the same time absolutely not modifying the data model of RDF.

 

Conclusion

 

RDF is more ‘constructive’ than FOL. Resources have to exist and receive a name. Though a resource can be anonymous in the theory, in practical inferencing it is necessary to give it an anonymous name.

A negation cannot just be declared; it needs to receive a name. Saying that something is not yellow amounts to saying that it has the property not_yellow.

The set of all things that have either the property yellow or blue or both needs to be declared in a superset that has a name.

For a proof of A ∨ ¬A either A has to be proved or ¬A has to be proved (e.g. an object is yellow or not_yellow only if this has been declared).

This is the consequence of the fact that in the RDF graph everything is designated by a URI.

The BHK-interpretation of constructive logic states:

(Brouwer, Heyting, Kolmogorov) [STANDARD]

a)     A proof of A and B is given by presenting a proof of A and a proof of B.

b)    A proof of A or B is given by presenting either a proof of A or a proof of B or both.

c)     A proof of A -> B is a procedure which permits us to transform a proof of A into a proof of B. By the closure procedure described in chapter 3 implication is really constructive because it is defined as a construction: add the consequents of the rule to the graph!

d)    The constant false has no proof.

If by a proof of A is understood that A is a valid triple (the URI’s of the triple exist), then RDF as described above follows the BHK interpretation. Point c follows from the definition of a rule as the replacement of a subgraph defined by the antecedents by a subgraph defined by the consequents. The constant false does not exist in RDF.

 

The Semantic Web and logic

 

One of the main questions to consider when speaking about logic and the Semantic Web is the logical difference between an open world and  a closed world.

 

In a closed world it is assumed that everything is known. A query engine for a database e.g. can assume that all knowledge is in the database. Prolog also makes this assumption. When a query cannot be answered Prolog responds: “No”, where it could also say: ”I don’t know”.

In the world of the internet it is not generally possible to assume that all facts are known. Many data collections or web pages are made with the assumption that the user knows the data are not complete.

Thus collections of data are considered to be open. There are some fundamental implications of this fact:

1)     it is impossible to take the complement of a collection as: a) the limits of the collection are not known and b) the limits of the universe of discourse are not known.

2)     Negation can only be applied to known elements. It is not possible to speak of all resources that are not yellow. It is possible to say that resource x is not yellow or that all elements of set y are not yellow.

3)     The universal quantifier does not exist. It is impossible to say : for all elements in a set or in the universe as the limits of those entities are not known. The existential quantifier, for some, really means :  for all those that are known.

4)     Anonymous resources cannot be used in reasoning. Indeed, in RDF it is possible to declare blank nodes but when reasoning these have to be instantiated by a ‘created URI’. 

5)     The disjunction must be constructive. A proof of a or b is a proof of a, a proof of  b or a proof of a and a proof of b. In a closed world it can always be determined whether a or b is true.

 

An example

 

Take the following statements in first order logic:

∀x∃y: man(x) & woman(y) & loves(x,y)

∃x∀y: man(x) & woman(y) & loves(x,y)

The first means: ‘all men love a woman’ and the second: ‘some man loves all women’. Clearly this is not the same meaning. However, constructively (as implemented in my engine) they are the same and just represent all couples (x,y) for which a fact man(x) & woman(y) & loves(x,y) can be found in the RDF graph. In order to maintain the meaning as above it should be indicated that the reasoning is in a closed world. This is the same as stating that man and woman are closed sets, i.e. it is accepted that all members of the set are (potentially) known.

 

It is possible to define a logic for the World Wide Web(open world), it is possible to define a logic for a database application(closed world), it is possible to define a logic for a philosophical treatise(perhaps an epistemic logic),… What I want to say is: a logic should be defined within a certain context.

So I want to define a logic in the context ‘reference inference engine’ together with a certain RDF dataset and RDF ruleset. Depending on the application this context can be further refined. One context could then be called the basic RDF logic context. In this context the logic is constructive with special rules regarding sets. Thus the Semantic Web needs a system of context dependent logic.

It is then possible to define for the basic context:

A closed set is a set for which it is possible to enumerate all elements of the set in a finite time and within the context: the engine, the facts and the rules. An open set is a set that is not closed. With this definition e.g. the set of natural numbers is not closed. In another, more mathematical context, the set of natural numbers might be considered closed.

 

Another aspect of logics that is very important for the Semantic Web is the question whether the logic should be monotonic or  nonmonotonic.

Under monotonicity in regard to RDF I understand that, whenever a query relative to an RDF file has a definite answer and another RDF file is merged with the first one , the answer to the original query will still be obtained after the merge. As shown before  there might be some extra answers and some of those might be contradictory with the first answer. However this contradiction will clearly show.

If the system is nonmonotonic some answers might disappear; others that are contradictory with the first might appear. All this would be very confusing.

 

As was shown above the logic of RDF is compatible with these requirements. They define a kind of constructive logic. I argue that this logic must be the basic logic of the Semantic Web. Basic Semantic Web inference engines should use a logic that is both constructive, based on an open world assumption,  and monotonic.

If, in some applications, another logic is needed, this should be clearly indicated in the RDF datastream. In this way basic engines will ignore these datastreams.

 

Aspects of monotonicity

 

Problem

 

I state the problem with an example.

A series of triples:

(item1,price,price1),(item2,price,price2), etc…

is given and a query is done asking for the lowest price.

Then some more triples are added:

(item6,price,price6),(item7,price,price7), etc…

and again the query is done asking for the lowest price.

Of course, the answer can now be different from the first answer. Monotonicity as I defined it in … would require the first answer to remain in the set of solutions. This obviously is not the case.

Clearly however, a Semantic Web that is not able to perform an operation like the above cannot be satisfactory.

 

The solution

 

I will give an encoding of the example above in RDFProlog and then discuss this further.

 

Lists are encoded with rdf:first for indicating the head of a list and rdf:rest for indicating the rest of the list. The end of a list is encoded with rdf:nil.

 

I suppose the prices have already been entered in a list, which is in any case necessary.

 

rdf:first(l1,”15”).

rdf:rest(l1,l2).

rdf:first(l2,”10”).

rdf:rest(l2,rdf:nil).

 

rdf:first(L,X), rdf:rest(L,rdf:nil) :> lowest(L,X).

rdf:first(L,X), rdf:rest(L,L1), lowest(L1,X1), lesser(X,X1) :> lowest(L,X).

rdf:first(L,X), rdf:rest(L,L1), lowest(L1,X1), lesser(X1,X) :> lowest(L,X1).

 

The query is:

lowest(l1,X).

 

giving as a result:

lowest(l1,”10”).

Let’s add two prices to the list. The list then becomes:

 

rdf:first(l1,”15”).

rdf:rest(l1,l2).

rdf:first(l2,”10”).

rdf:rest(l2,l3).

rdf:first(l3,”7”).

rdf:rest(l3,l4).

rdf:first(l4,”23”).

rdf:rest(l4,rdf:nil).

 

The query will now give as a result:

lowest(l1,”7”).

 

Monotonicity is not respected. However, from the first RDF graph a triple has disappeared:

rdf:rest(l2,rdf:nil).

 

So the conditions for monotonicity as I defined in … are not respected either.

 

Conclusion

 

Some operations cannot be executed while at the same time respecting monotonicity. Especially, monotonicity cannot be expected when a query is repeated on a graph that does not contain the first graph as a subgraph. Clearly, however, these operations might be necessary. A check is necessary in every case to see whether this breaking of monotonicity is permitted.

 

 

 


Chapter 7. Existing query and inference engines

 

7.1. Introduction

 

After discussing the characteristics of RDF inferencing it is time to review what exists already.

I make a difference between a  query engine that just does querying on a RDF graph but does not handle rules and an inference engine that also handles rules.

In the literature this difference is not always so clear.

The complexity of an inference engine is a lot higher than that of a query engine. This is because rules create an extension of the original graph, producing a closure graph (see chapter 5). In the course of a query this closure graph or part of it has to be reconstructed. This is not necessary in the case of a query engine.

Rules also suppose a logic base that is inherently more complex than the logic in the situation without rules. For a query engine only the simple principles of entailment on graphs are necessary (see chapter 5).

 

7.2. Inference engines

 

7.2.1.Euler

 

Euler is the program made by De Roo [DEROO]. It does inferencing and also implements a great deal of OWL (Web Ontology Language).

The program reads one or more triple databases that are merged together and it also reads a query file. The merged databases are transformed into a linked structure (Java objects that point to other Java objects). The philosophy of Euler is thus graph oriented.

This structure is different from the structure of my engine. In my engine the graph is not described by a linked structure but by a collection of triples, a tripleset.

The internal mechanism of Euler is in accordance with the principles exposed in chapter 5.

De Roo also made a collection of test cases on which I based myself for testing my engine.

 

7.2.2. CWM

 

CWM is the program of Berners-Lee.

It is based on forward reasoning.

CWM makes a difference between existential and universal variables [BERNERS]. N3Engine does not make that difference. It follows the syntax of Notation 3: log:forSome for existential variables, log:forAll for universal variables. As I explained in chapter 6 all variables in my engine are quantified in the sense: forAllThoseKnown. Blank or anonymous nodes are instantiated with a unique URI.

CWM makes a difference between ?a  for a universal, local variable and _:a for an existential, global variable. N3Engine maintains the difference between local and global but not the quantification. 

7.2.3. SiLRI

7.2.4. TRIPLE

 

TRIPLE is based on Horn logic and borrows many features from F-Logic [SINTEK].

TRIPLE is the successor of SiLRI.

 

7.3. Query engines

 

7.3.1. Versa

7.3.2. RDF Query from IBM

7.3.3. SquishQL from HP

7.3.4. RDQL from HP

7.3.5. RQL from ICS-FORTH

 


Chapter 7. Optimization.

 

 

7.1.Introduction

 

Optimization can be viewed from different angles.

1) Optimization in forward reasoning is quite different from optimization in backward reasoning (resolution reasoning). I will put the accent on resolution reasoning.

2) The optimization can be considered on the technical level, i.e. the performance of the implementation is studied. Examples are: use a compiled version instead of an interpreter, use hash tables instead of sequential access, avoid redundant manipulations etc. Though this point seems less important than the following point (3), I will nevertheless propose two optimizations which are substantial.

3) Reduce as much as possible the number of steps in the inferencing process. Here the aim is not only to stop the combinatorial explosion but also to minimize the number of unification loops.

          There are two further aspects:

a)     Only one solution is needed e.g. in an authentication process the question is asked whether the person is authenticated or not.

b)    More than one solution is needed.

In general the handling of these two aspects is not completely equal.

     

7.2. Discussion.

 

I will use the above numbering for referring to the different cases.

A substantial lot of research about optimization has been done in connection with DATALOG (see chapter 6 for a description of DATALOG).

1)     Forward reasoning in connection with RDF means the production of the closure graph. I recall that a solution is a subgraph of the closure graph. If e.g. this subgraph contains three triples and the closure involves the addition of e.g. 10000 triples, then the efficiency of the process will not be high. The efficiency of forward reasoning, or the bottom-up approach as it is called in the DATALOG world, is enhanced considerably by using the magic sets technique [YUREK]. Take the query: grandfather(X,John), asking for the grandfather of John. With the magic sets technique only the ancestors of John will be considered. This is done by rewriting the query and replacing it with more complex predicates that restrict the possible matches. According to some [YUREK] this makes it better than top-down or backwards reasoning.

2)     Technical actions for enhancing efficiency. I will only mention here two actions which are specific for the inference engine I developed.

 

2a) As was mentioned above, the fact that a predicate is always a variable or a URI but never a complex term permits a very substantial optimization of the engine. Indeed, all clauses (triples) can be indexed by their predicate. Whenever a unification has to be done between a goal and the database, all relevant triples can be looked up with the predicate as an index. Of course the match will always have to be done with clauses that have a variable predicate. When this happens in the query the optimization cannot be done; however, such a query is normally also a very general query. In the database variable predicates occur mostly in general rules like the transitivity rule. In that case it should be possible to introduce a typing of the predicate and then add the typing to the index. In this way, when searching for alternative clauses it will not be necessary to consider the whole database but only the set of selected triples. This can further be extended by adding subjects and objects to the index. Eventually a general typing system could be introduced where a type is determined for each triple according to typing rules (which can be expressed in Notation 3 for use by the engine).

When the triples are in a database (in the form of relational rows) the collection of the alternatives can be done ‘on the fly’ by making an (SQL) query to the database. In this way part of the algorithm can be replaced by queries to a database.
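Such a predicate index can be sketched in Haskell with Data.Map (a hypothetical sketch; the alternative mentioned above would replace the lookup by an SQL query):

import qualified Data.Map as Map

type Triple    = (String, String, String)      -- (subject, predicate, object)
type PredIndex = Map.Map String [Triple]

-- build the index: triples grouped by their predicate
buildIndex :: [Triple] -> PredIndex
buildIndex = foldr insertTriple Map.empty
  where insertTriple t@(_, p, _) = Map.insertWith (++) p [t]

-- candidate clauses for a goal: when the goal's predicate is not a variable
-- only the indexed triples need to be considered
-- (for brevity this sketch ignores database clauses whose own predicate is a variable)
candidates :: PredIndex -> [Triple] -> Triple -> [Triple]
candidates index allTriples (_, p, _)
  | isVariable p = allTriples
  | otherwise    = Map.findWithDefault [] p index
  where isVariable x = take 1 x == "?"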

 

2b) The efficiency of the processing can be enhanced considerably by working with numbers instead of labels. A table is made of all possible labels, where each label receives a number. A triple is then represented by three integer numbers, e.g. (12,73,9). Manipulation (copy, delete) of a triple becomes faster; comparison is just the comparison of integers, which should be faster than comparing strings.
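The label-to-number mapping can be sketched as follows (a hypothetical helper; in practice the table would be built once when the database is read):

import qualified Data.Map as Map
import Data.List (nub)

-- assign a unique integer to every distinct label
internLabels :: [String] -> Map.Map String Int
internLabels labels = Map.fromList (zip (nub labels) [0 ..])

-- a triple of labels becomes a triple of integers
encodeTriple :: Map.Map String Int -> (String, String, String) -> (Int, Int, Int)
encodeTriple table (s, p, o) = (look s, look p, look o)
  where look l = Map.findWithDefault (error ("unknown label: " ++ l)) l table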

 

3)  From the graph interpretation it can be deduced that resolution paths must be searched that are the reverse of closure paths. This implies that a match must be found with grounded triples, i.e. first try to match with the original graph and then with the rules. This can be beneficial for both case (3a) (one solution) and case (3b) (more than one solution). In the second case it is possible that a failure happens sooner.

The rules can also be ordered following this principle: try first the rules that have as many grounded resources in their consequents as possible; rules that contain triples like (?a,?p,?c) should be used last.

The entries in the goallist can be ordered following this principle as well: put the grounded entries first in the goallist, then sort the rest by the number of grounded resources.

This principle can be used also in the construction of rules: in antecedents and consequents put the triples with the most grounded resources first.

In general, triples with variable predicates should be avoided in rules. They reduce the efficiency because a lot of triples will unify with them.

The magic set transformation can also be used in backwards reasoning [YUREK]. This gives a significant reduction in the number of necessary unifications.

 

An interesting optimization is given by the Andorra principle [JONES]. It is based on the selection of a goal from the goallist. For each goal the number of matches with the database is calculated. If there is a goal that does not match at all, there is a failure and the rest of the goals can be discarded. Otherwise, the goal with the smallest number of matches is selected as the next goal.
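
A sketch of this selection rule; matchCount stands for a function, assumed given, that counts how many clauses in the database match a goal:

import Data.List (minimumBy)
import Data.Ord (comparing)

-- Andorra-style selection: fail at once if some goal has no match,
-- otherwise pick the goal with the fewest matches in the database
selectGoal :: (g -> Int) -> [g] -> Maybe (g, [g])
selectGoal _ [] = Nothing
selectGoal matchCount goals
  | 0 `elem` counts = Nothing          -- some goal cannot match: prune this branch
  | otherwise       = Just (goals !! i, rest)
  where
    counts = map matchCount goals
    i      = snd (minimumBy (comparing fst) (zip counts [0..]))
    rest   = [g | (g, j) <- zip goals [0..], j /= i]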

 

Prolog and my engine take a depth-first search approach. A breadth-first search is also possible. If only one solution is needed, breadth-first search should in general be more appropriate, certainly if the solution tree is very imbalanced. If all solutions must be found there is no difference. Looping can easily be stopped in breadth-first search because the tree is kept and the recurrence of a goal can easily be detected.
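
A sketch of such a breadth-first search with simple loop detection; the node type, its children function and the solution test are left abstract:

import qualified Data.Set as Set

-- breadth-first search that stops at the first solution and skips
-- nodes that have been seen before (simple loop detection)
bfs :: Ord node => (node -> [node]) -> (node -> Bool) -> node -> Maybe node
bfs children isSolution start = go Set.empty [start]
  where
    go _ [] = Nothing
    go seen (n:rest)
      | isSolution n        = Just n
      | n `Set.member` seen = go seen rest
      | otherwise           = go (Set.insert n seen) (rest ++ children n)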

 

Typing can substantially reduce the number of unifications. Take e.g. the rule:

{(?p,a,transitiveProperty),(?a,?p,?b),(?b,?p,?c)} implies {(?a,?p,?c)}

 

Without typing, any triple will match with the consequent of this rule. With typing, however, the variable ?p has the type transitiveProperty, so that only resources that have the type transitiveProperty or a subtype will match with ?p.

See also the remarks about typing at the end of chapter 5.
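
A sketch of such a type check, reusing the illustrative Term and Triple types of the sketch under 2a and assuming a precomputed map from each resource to its types (supertypes included); the label "transitiveProperty" below is only a stand-in for the full URI:

import qualified Data.Map as Map
import Data.Map (Map)

-- map from a resource to the types it belongs to, e.g. derived from
-- rdf:type triples and the subtype hierarchy
type TypeMap = Map Term [Term]

-- a typed variable only unifies with resources that carry the required type
admissible :: TypeMap -> Term -> Term -> Bool
admissible types requiredType resource =
  requiredType `elem` Map.findWithDefault [] resource types

-- keep only the triples whose predicate has the required type,
-- here the (illustrative) type transitiveProperty
candidatesFor :: TypeMap -> [Triple] -> [Triple]
candidatesFor types = filter ok
  where ok (_, p, _) = admissible types (URI "transitiveProperty") p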


 Chapter 8. Inconsistencies

 

8.1. Definition

 

Two triplesets are inconsistent when:

1)     by human judgement they are considered to be inconsistent

2)     or, by computer judgement, they are considered to be inconsistent. The judgement by a computer is done using logic. In the light of the RDF graph theory a judgement of inconsistency implies a judgement of invalidity of the RDF graph.

 

8.2. Elaboration

 

Take an object. An object can have only one color in one application; in another application it can have more than one color. In the first case color can be declared to be an owl:uniqueProperty. This property imposes a restriction on the RDF graph: e.g. the triples T(object,color,red) and T(object,color,blue) cannot occur at the same time. If they do, the RDF graph is considered to be invalid. This is the consequence of a logic restriction implemented on the RDF graph:

∀ object, color1, color2, T1, T2, G: T1(object,color,color1) & T2(object,color,color2) & not equal(color1,color2) & element(G,T1) & element(G,T2) → invalid(G)

However, in another application this restriction does not apply and an object can have two colors. This has the following implication: inconsistencies are application dependent. Inconsistencies are detected by applying logic restrictions to two triplesets. If a restriction applies, the RDF graph is considered to be invalid. The declaration and detection of inconsistencies depend on the logic that is used in the inferencing system.

 

When a closed world logic is applied, sets have boundaries. This implies that inconsistencies can be detected that do not exist in an open world.

An example: there is a set of cubes with different colors but none yellow. Then the statement: ‘cube x is yellow’ is an inconsistency. This is not true in an open world. 

My inference engine does not detect any inconsistencies at all. It can however handle applications that detect inconsistencies. Indeed, as said above, this is only a matter of implementing restrictions. If I have an application that does not permit yellow cubes, this can be done e.g. with the rule:

{(cubea,color,yellow)} implies {(this,a,inconsistency)}.

 

Nevertheless, there might also be inherent logic inconsistencies or contradictions. These, however, are not possible in basic RDF. By adding a negation, as proposed in chapter 6, such contradictions can arise, e.g.

In RDFProlog:

color(cube1,yellow) and not(color(cube1,yellow)).

The more logic is added, the more of these kinds of contradictions can arise. The implementation of an ontology language like OWL is an example.

 

8.3. OWL and inconsistencies ** to write **

 

When rules and facts originating from different sites on the internet are merged together into one database, inconsistencies might arise. Two categories can be distinguished:

1)     inconsistencies provoked with malicious intent

2)     involuntary inconsistencies

 

Category 1: there are, of course, a lot of possibilities, and hackers are inventive. Two examples:

a)     the hacker puts rules on his site or on a pirated site that are bound to give difficulties, e.g. inconsistencies might already be inherent in the rules.

b)    the namespace of someone else is usurped, i.e. the hacker uses somebody else's namespace(s).

 

Category 2: involuntary errors.

a)     syntax errors: these should be detected.

b)    logical errors: these can produce faulty results or contradictions. But what can be done against a bad rule, e.g. a rule saying all mammals are birds? A contradiction will only arise when there is a rule (or rules) saying that mammals are not birds, or when this can be concluded.

This is precisely one of the goals of an ontology: to add restrictions to a description. These restrictions can be checked. If it is declared that an object has a minimum size and a maximum size and somebody describes it with a size greater than the maximum size the ontology checker should detect this.

 

 

When inconsistencies are detected it is important to trace their origin (which is not necessarily determined by the namespace). This means all clauses in the database have to be marked. How can the origin be traced back to an originating site?

In this context namespaces are very important. The namespaces to which RDF triples belong identify their origin. However, a hacker can falsify namespaces. Here again the only way to protect namespaces is by the use of cryptographic methods.

 

When two contradictory solutions are found, or two solutions are found and only one is expected, a list of the origins of the clauses used to obtain the solutions can possibly reveal a difference in origin between the clauses. When no solution is found, the least trusted site can be suspected first. But how can the responsible party be detected?

 

I believe truth on the Semantic Web is determined by trust. A person believes the sites he trusts. It is therefore fundamental to the Semantic Web that every site can be identified unequivocally. This is far from being the case on the present web, leaving the door open for abuse.

Each person or automated program should have at its disposal a system that ranks the ‘trustworthiness’ of sites, ontologies or persons.

 

Heflin [HEFLIN] has some ideas on inconsistency. He gives the example: if someone states ‘A’ and another one states ‘not A’, this is a contradiction from which all conclusions can be derived. However, this is not possible in RDF as a ‘not’ does not exist. Somebody might, however, claim ‘X is black’ and another one ‘X is white’. This will not exclude solutions; it will not lead to arbitrary conclusions; it can just add extra solutions to a query. Those solutions might be semantically contradictory for a human interpreter. If that is not wanted, ontologies have to be used. If ontologies produce such a contradiction then, of course, there is an incompatibility problem between the ontologies.

As a remedy against inconsistency Heflin cites nonmonotonicity. He says: “Often, an implicit assumption in these theories is that the most recent theory is correct and that it is prior beliefs that must change”. This seems reasonable for a closed world; on the internet, however, rather the contrary seems reasonable. As Heflin says: “if anything, more recent information that conflicts with prior beliefs should be approached with skepticism”, and:

“The field of belief revision focuses on how to minimize the overall change to a set of beliefs in order to incorporate new inconsistent information.”.

What an agent knows is called his epistemic state. This state can change in three ways: by expansion, by revision and by contraction. An expansion adds new non-contradictory facts, a revision changes the content and a contraction removes content. But do we, humans, really maintain a consistent set of beliefs? Personally, I don’t think so.

Heflin also speaks about assumption-based truth maintenance systems (ATMS). These maintain multiple contexts, where each context represents a set of consistent assumptions.

 

The advantage of RDF is that ‘not’ and ‘or’ cannot be expressed, which ensures monotonicity when merging RDF files. However, this is no longer the case with OWL, not even with OWL Lite.

 

Conclusion

 

It seems too soon to seriously attack the field of ‘inconsistency’ in relation to the Semantic Web. However, from the previous remarks I believe it is rather clear that it will represent a serious problem.

 

 

 

 

 


Chapter 9. Applications.

 


 

 

9.1. An Alpine club

 

The following example was taken from [CLUB].

Brahma, Vishnu and Siva are members of an Alpine club.

Whoever is a member of the club is either a skier or a climber or both.

A climber does not like the rain.

A skier likes the snow.

What Brahma likes Vishnu does not like and what Vishnu likes, Brahma does not like.

Brahma likes snow and rain.

Which member of the club is a climber and not a skier?

 

This case is illustrative because it shows that ‘not’ and ‘or’ can be implemented by making them part of the properties. Whether this is a handy thing to do remains to be seen, but perhaps it can be done in an automatic way. Possibly all alternatives could also be put in a list.

 

Source:

member(brahma,club).

member(vishnu,club).

member(siva,club).

member(X,club) :> or_skier_climber(X,X).

or_skier_climber(X,X),not_skier(X,X) :> climber(X,X).

or_skier_climber(X,X),not_climber(X,X) :> skier(X,X).

skier(X,X),climber(X,X) :> or_skier_climber(X,X).

skier(X,X),not_climber(X,X) :> or_skier_climber(X,X).

not_skier(X,X),climber(X,X) :> or_skier_climber(X,X).

 

climber(X,X) :> not_likes(X,rain).

likes(X,rain) :> not_climber(X,X).

skier(X,X) :> likes(X,snow).

not_likes(X,snow) :> not_skier(X,X).

 

likes(brahma,X) :> not_likes(vishnu,X).

not_likes(vishnu,X) :> likes(brahma,X).

likes(brahma,snow).

likes(brahma,rain).

 

Query: member(X,club),climber(X,X),not_skier(X,X).

 

For comparison, the same problem in a notation with an explicit negation symbol and a disjunction, which is not directly expressible in RDF:

member(brahma,club).

member(vishnu,club).

member(siva,club).

 

member(X,club) :> or(skier,climber)(X,X).

climber(X,X) :> ¬likes(X,rain).

skier(X,X) :> likes(X,snow).

¬likes(X,snow) :> ¬skier(X,X).

likes(brahma,X) :> ¬likes(vishnu,X).

¬likes(vishnu,X) :> ¬likes(brahma,X).

likes(brahma,snow).

likes(brahma,rain).

 

Query: member(X,club),climber(X,X), ¬skier(X,X).


 

9.2. Simulation of logic gates.

 

 

To calculate this example completely I had to change the input values manually and calculate the result two times, once for the output ‘sum’ and once for the output ‘cout’. This should be automated, but in order to do so a few builtins would be necessary.

 

The circuit represents a full adder (it has a carry input cin and a carry output cout). This is a typical example of a problem that can certainly be handled better with forward reasoning.

 

The source

 

definition of a half adder (from Wos Automated reasoning)[WOS]

translated from Otter to RDF

 

definition of a nand gate

 

gnand1(G,A),gnand2(G,B),gnand3(G,C),nand1(A,"false"),nand2(B,"false") :> nand3(C,"true").

gnand1(G,A),gnand2(G,B),gnand3(G,C),nand1(A,"false"),nand2(B,"true") :> nand3(C,"true").

gnand1(G,A),gnand2(G,B),gnand3(G,C),nand1(A,"true"),nand2(B,"false") :> nand3(C,"true").

gnand1(G,A),gnand2(G,B),gnand3(G,C),nand1(A,"true"),nand2(B,"true") :> nand3(C,"false").

 

definition of the connection between gates

 

nandnand2(G1,G2),gnand3(G1,X),gnand2(G2,X),nand3(X,V) :> nand2(X,V).

nandnand1(G1,G2),gnand3(G1,X),gnand1(G2,X),nand3(X,V) :> nand1(X,V).

 

The facts of the example:

 

gnand1(g1,a).

gnand2(g1,b).

gnand3(g1,a11).

nand1(a,"true").

nand2(b,"true").

nandnand1(g1,g3).

nandnand2(g1,g2).

nandnand2(g1,g9).

 

gnand1(g2,a).

gnand2(g2,a11).

gnand3(g2,a12).

nandnand1(g2,g4).

 

gnand1(g3,a11).

gnand2(g3,b).

gnand3(g3,a13).

nandnand2(g3,g4).

 

gnand1(g4,a12).

gnand2(g4,a13).

gnand3(g4,a14).

nandnand1(g4,g6).

nandnand1(g4,g5).

 

gnand1(g5,a14).

gnand2(g5,cin).

gnand3(g5,a15).

nand2(cin,"true").

nandnand1(g5,g8).

nandnand1(g5,g9).

nandnand2(g5,g6).

 

gnand1(g6,a14).

gnand2(g6,a15).

gnand3(g6,a16).

nandnand1(g6,g7).

 

gnand1(g7,a16).

gnand2(g7,a17).

gnand3(g7,sum).

 

gnand1(g8,a15).

gnand2(g8,cin).

gnand3(g8,a17).

nandnand2(g8,g7).

 

gnand1(g9,a15).

gnand2(g9,a11).

gnand3(g9,cout).

 

The queries: nand3(sum,V).

                     nand3(cout,V).
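
As an independent cross-check of the values these queries should yield, the same circuit can be written directly in Haskell, gate by gate, following the connections in the facts above (this sketch is not part of the engine):

-- a NAND gate on Booleans
nand :: Bool -> Bool -> Bool
nand x y = not (x && y)

-- the circuit of the facts above, gate by gate, with the same wire names
adder :: Bool -> Bool -> Bool -> (Bool, Bool)
adder a b cin = (s, cout)
  where
    a11  = nand a b        -- g1
    a12  = nand a a11      -- g2
    a13  = nand a11 b      -- g3
    a14  = nand a12 a13    -- g4
    a15  = nand a14 cin    -- g5
    a16  = nand a14 a15    -- g6
    a17  = nand a15 cin    -- g8
    s    = nand a16 a17    -- g7
    cout = nand a15 a11    -- g9

With the inputs a, b and cin all "true" as in the facts, adder True True True evaluates to (True, True), so the queries should yield nand3(sum,"true") and nand3(cout,"true").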

 

9.3. A description of owls.

 

This case study shows that it is possible to put data into RDF format in a systematic way, so that queries can easily be done on these data. A GUI can be put on top for entering the data and for querying in a user-friendly way.

A query window can give e.g. lists of the labels used.

 

#An owl ontology

 

#The following tries to make an ontology of owls using RDF, rdfs and owl. A small number of #statements about OWL (Ontology Web Language) is also made.

 

@prefix : <http://test/owl_ontology/>.

@prefix owl: <http://www.w3.org/2002/07/owl#>.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix dc: <http://dublin_core#>.

 

:Owls a owl:Class.

:Owl  a owl:Ontology;

rdfs:comment "About owls from North-America.".

owl:Owl a owl:Ontology.

 

:habitat a rdf:Property.

:food a rdf:Property.

:noise a rdf:Property.

:color a rdf:Property.

 

owl:Great_Horned_owl rdfs:subClassOf owl:Owl;

owl:sameClassAs owl:ghowl;

dc:creator "Pat Hayes";

rdfs:comment "Is in fierce competition with other owls.".

 

:pellets a owl:Class.

 

:owl_family rdfs:subClassOf :Owls.

:Barn_Owls rdfs:subClassOf :owl_family.

:Barn_Owls owl:sameClassAs :Tytonidae.

:Typical_Owls rdfs:subClassOf :owl_family.

:Typical_Owls owl:sameClassAs :Strigidae.

 

:number_of_species a owl:DatatypeProperty;

rdfs:range <http://www.w3.org/2000/10/XMLSchema#integer>.

:size a owl:DatatypeProperty;

rdfs:range <http://www.w3.org/2000/10/XMLSchema#integer>.

 

:Barn_Owls :number_of_species "10";

rdfs:comment "Data for North-America.".

 

:Great_Horned_Owl a :Typical_Owls; :size "2";

:noise "whoo, hoo-hoo";

rdfs:comment "Size is in feet.".

 

:Long_Eared_Owl :size "1.50";

rdfs:comment "Size is in feet.".

 

:Barn_Owl :length "13-20 inches";

:habitat "forest near open fields, farmland, towns";

:food "mice, crickets, gophers".

 

:Screech_Owl :food "rodents, snails, snakes and baby birds.";

:length "9 inches";

:habitat "forests, groves, towns, farms, parks.".

 

:Barred_Owl :habitat "Woodland";

rdfs:comment "The prototypical owls from fairy tales and picture books.".

 

:Great_Gray_Owl :habitat "Northern forests and the mountains.";

:length "16-23 inches";

:color "Mottled brown, with narred throat and streaked underparts.";

:habitat "Wet woodlands, wooded swamps and floodplains.".

 

:Snowy_Owl :food "lemmings and rabbits";

:color "White with black flecks.";

:length "19-25 inches";

:habitat "tundra, brushy areas, prairies, dunes, marsh".

 

:Burrowing_Owl :habitat "holes in the ground near prairie dog towns";

:food "lizards, insects, prairie dogs".

 

:Pygmy_Owl :habitat "evergreen forests and wooded canyons";

:length "size of a sparrow".

 

:Elf_owl rdfs:comment "tiniest owl".

 

Example Queries: ?who :habitat ?what.

                  :Snowy_Owl :color ?what.


Conclusion

 

Fig. 3.1. gives an overview of the layers of the Semantic Web. The first three layers are now realized. The logic, proof and trust layers, as well as the part for signatures, have yet to be standardized. This thesis deals mostly with the logic layer. I argue that the logic should be a kind of constructive logic that has the property of monotonicity.

A theory is proposed based on the graph model. From the definition of rules as entities producing a closure graph, a clear interpretation can be given to a query and an answer (solution) to the query.

The specification of the engine in Haskell was not without difficulties. Especially the subgraph matching of queries and subgoals with the database graph is a tricky process. Perhaps better algorithms than the one presented in chapter 5 can be found in the future.

The examples presented in chapter 9 and the numerous test cases I have run clearly show that RDF is expressive enough to handle complex logic problems, though not always in the clearest or most elegant way, as the ‘Alpine club’ example shows.

From the comparison between RDF and DATALOG it follows that RDF inferencing can be used with RDF data in a relational database and thus that RDF is capable of working with large data sets.

The principle of namespaces takes care of the heterogeneity of data sources. Files from different origins can easily be merged. As Heflin shows [HEFLIN], problems of backwards compatibility are important; it remains to be seen how these will be handled. Inconsistencies can arise but will show up due to the constructive characteristics of the logic.

I believe my engine can be a starting point for a ‘reference’ implementation of a Semantic Web engine. I say starting point because RDFS and OWL should be an integral part of such an engine and I did not include these standards.

I hope that this thesis has given a small but significant contribution to the development of the Semantic Web.

Bibliography

 

[AIT] Hassan Aït-Kaci WAM A tutorial reconstruction .

[ADDRESSING] http://www.w3.org/Adressing/ .

[ALBANY]http://www.albany.edu/~gilmr/metadata/rdf.ppt

[ALVES] Alves, Datalog,programmation logique et négation,Université de tours.

[BERNERS] Semantic Web Tutorial Using N3,Berners-Lee e.a., May 20,2003, http://www.w3.org/2000/10/swap/doc/

[BUNDY] Alan Bundy , Artificial Mathematicians, May 23, 1996, http://www.dai.ed.ac.uk/homes/bundy/tmp/new-scientist.ps.gz

[CASTELLO] R.Castello e.a. Theorem provers survey. University of Texas at Dallas. http://citeseer.nj.nec.com/409959.html                                

[CENG] André Schoorl, Ceng 420 Artificial intelligence University of Victoria, http://www.engr.uvic.ca/~aschoorl/ceng420/

[CHAMPIN] Pierre Antoine Champin, 2001 – 04 –05, http://www710.univ-lyon1.fr/~champin/rdf-tutorial/node12.html

[CLUB] http://www.cirl.uoregon.edu/~iustin/cis571/unit3.html, University of Oregon

[COQ] Huet e.a. RT-0204-The Coq Proof Assistant : A Tutorial, http://www.inria.fr/rrt/rt-0204.html.

[CS]  http://www.cs.ucc.ie/~dgb/courses/pil/ws11.ps.gz             

[DAML+OIL] DAML+OIL (March 2001) Reference Description. Dan Connolly, Frank van Harmelen, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. W3C Note 18 December 2001. Latest version is available at http://www.w3.org/TR/daml+oil-reference.

[DECKER] Decker e.a., A query and Inference Service for RDF,University of KarlsRuhe.

[DENNIS] Louise Dennis Midlands Graduate School in TCS, http://www.cs.nott.ac.uk/~lad/MR/lcf-handout.pdf

[DESIGN]     http://www.w3.org/DesignIssues/ * Tim Berners-Lee’s site with his design-issues articles .                                     

[DEROO]     http://www.agfa.com/w3c/jdroo * the site of the Euler program

[DICK] A.J.J.Dick, Automated equational reasoning and the knuth-bendix algorithm: an informal introduction, Rutherford Appleton Laboratory Chilton, Didcot OXON OX11 OQ, http://www.site.uottawa.ca/~luigi/csi5109/church-rosser.doc/

[DONALD] http://dream.dai.ed.ac.uk/papers/donald/subsectionstar4_7.html

[GANDALF]  http://www.cs.chalmers.se/~tammet/gandalf *  Gandalf Home Page 

[GENESERETH] Michael Genesereth, Course Computational logic, Computer Science Department, Stanford University.

[GHEZZI] Ghezzi e.a. Fundamentals of Software Engineering Prentice-Hall 1991 .

[GUPTA] Amit Gupta & Ashutosh Agte, Untyped lambda calculus, alpha-, beta- and eta- reductions, April 28/May 1 2000, http://www/cis/ksu.edu/~stefan/Teaching/CIS705/Reports/GuptaAgte-2.pdf

[GRAPH] http://www.agfa.com/w3c/euler/graph.axiom.n3

[HARRISON] J.Harrison, Introduction to functional programming, 1997. 

[HEFLIN] J.D.Heflin, Towards the semantic web: knowledge representation in a dynamic,distributed environment, Dissertation 2001.

[HILOG]      http://www.cs.sunysb.edu/~warren/xsbbook/node45.html

[JEURING] J.Jeuring and D.Swierstra, Grammars and parsing, http://www.cs.uu.nl/docs/vakken/gont/

[JEURING1] Jeuring e.a. ,Polytypic unification, J.Functional programming, september 1998.

[KERBER] Manfred Kerber, Mechanised Reasoning, Midlands Graduate School in Theoretical Computer Science, The University of Birmingham, November/December 1999, http://www.cs.bham.ac.uk/~mmk/courses/MGS/index.html

[LAMBDA]    http://www.cse.psu.edu/~dale/lProlog/  * lambda prolog home page

[LINDHOLM] Lindholm, Exercise Assignment Theorem prover for propositional modal logics, http://www.cs.hut.fi/~ctl/promod.ps

[LOGPRINC] http://www.earlham.edu/~peters/courses/logsys/glossary.htm#m]

[MCGUINESS] Deborah McGuiness Explaining reasoning in description logics                          1966 Ph.D.Thesis

[MYERS] CS611 LECTURE 14 The Curry-Howard Isomorphism, Andrew Myers.

 [NADA] Arthur Ehrencrona, Royal Institute of Technology Stockholm, Sweden, http://cgi.student.nada.kth.se/cgi-bin/d95-aeh/get/umeng

[OTTER]       http://www.mcs.anl.gov/AR/otter/   Otter Home Page

[OWL Features] Feature Synopsis for OWL Lite and OWL. Deborah L. McGuinness and Frank van Harmelen. W3C Working Draft 29 July 2002. Latest version is available at http://www.w3.org/TR/owl-features/.

[OWL Issues] Web Ontology Issue Status. Michael K. Smith, ed. 10 Jul 2002.

[OWL Reference] OWL Web Ontology Language 1.0 Reference. Mike Dean, Dan Connolly, Frank van Harmelen, James Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. W3C Working Draft 29 July 2002. Latest version is available at http://www.w3.org/TR/owl-ref/.

[OWL Rules] http://www.agfa.com/w3c/euler/owl-rules

[PFENNING_1999] Pfenning e.a., Twelf a Meta-Logical framework for deductive Systems, Department of Computer Science, Carnegie Mellon University, 1999.

[PFENNING_LF] http://www-2.cs.cmu.edu/afs/cs/user/fp/www/lfs.html

[RDFM]        RDF Model Theory .Editor: Patrick Hayes                   <http://www.w3.org/TR/rdf-mt/>

[RDFMS] Resource Description Framework (RDF) Model and Syntax       Specification  http://www.w3.org/TR/1999/REC-rdf-syntax-19990222

[RDF Primer]  http://www.w3.org/TR/rdf-primer/]

[RDFSC] Resource Description Framework (RDF) Schema Specification                                                             1.0 <http://www.w3.org/TR/2000/CR-rdf-schema-20000327>

[SCHNEIER] Schneier Bruce, Applied Cryptography,

[SINTEK] Sintek e.a.,TRIPLE – A query, Inference, and Transformation Language for the Semantic Web, Stanford University.)

[STANFORD] Stanford Encyclopedia of philosophy - Automated

reasoning http://plato.stanford.edu/entries/reasoning-automated/

[SWAP/CWM]  http://www.w3.org/2000/10/swap                       http://infomesh.net/2001/cwm  CWM is another inference engine for the web .

[SWAPLOG] http://www/w3.org/200/10/swap/log#

[TBL] Tim Berners-Lee, Weaving the Web.

[TBL01] Berners-Lee e.a., The Semantic Web, Scientific American, May 2001.

 [TWELF] http://www.cs.cmu.edu/~twelf* Twelf Home Page               

[UNCERT] Hin-Kwong Ng e.a. Modelling uncertainties in argumentation,                     Department of Systems Engineering & Engineering Management The Chinese University of Hong Kong http://www.se.cuhk.edu.hk/~hkng/papers/uai98/uai98.html

[UNICODE] http://www.unicode.org/unicode/standard/principles.html]

[UMBC] TimothyW.Finin, Computer Science and Electrical Engineering, University of Maryland Baltimore Country, http://www.cs.umbc.edu/471/lectures/9/sld040.htm

[URI] http://www.isi.edu/in-notes/rfc2396.txt

[USHOLD] Michael Ushold The boeing company, Where is the semantics of the web?http://cis.otago.ac.nz/OASWorkschop/Papers/WhereIsTheSemantics.pdf

[VAN BENTHEM]  Van Benthem e.a., Logica voor informatici,                     Addison Wesley 1991.

[WALSH] Toby Walsh, A divergence critic for inductive proof, 1/96, Journal of Artificial Intelligence Research, http://www-2.cs.cmu.edu/afs/cs/project/jair/pub/volume4/walsh96a-html/section3_3.html

[WESTER]    Wester e.a. Concepten van programmeertalen,Open Universiteit Eerste druk 1994 Important for a thorough introduction in Gofer . In Dutch .

[WOS] Wos e.a. Automated reasoning,Prentice Hall, 1984

[W3SCHOOLS http://www.w3schools.com/w3c/w3c_intro.asp]

[YUREK] Yurek , Datalog bottom-up is the trend in the deductive database evaluation strategy, University of Maryland, 2002.


 

List of abbreviations

 

ALF            :  Algebraic Logic Functional Programming Language

API             :  Application Programming Interface

CA              :  Certification Authority

CWM                   :  Closed World Machine

                       An experimental inference engine for the semantic web

DTD           :  Document Type Definition , a language for defining XML-                 

                         objects .

HTML        :  Hypertext Markup Language

KB              :  Knowledge Base

FOL            :  First Order Logic

N3              :  Notation 3

OWL          :  Ontology Web Language

PKI             :  Public Key Infrastructure

RA              :  Registration Authority

RDF           :  Resource Description Framework

RDFS                   :  RDF Schema   

SGML        :  Standard Generalized Markup Language

SweLL        :  Semantic Web Logic Language

URI            :  Uniform Resource Identifier

URL           :  Uniform Resource Locator

W3C           :  World Wide Web Consortium

WAM         :  Warren Abstract Machine

                       Probably the first efficient implementation of prolog .

XML           :  Extensible Markup Language .

                       The difference with HTML is that tags can be freely defined in                                  

                              XML .

                         

 

 

 

 


Annexe I

 

Usage of the inference engine N3Engine.

First unzip the file EngZip.zip into a directory.

 

Following steps construct a demo (with the Hugs distribution under Windows):

 

1)     winhugs.exe

2)     :l RDFProlog.hs

3)     menu

4)     choose one of the examples, e.g. en2. These are test cases with the source in Notation 3

5)     menuR

6)     choose one of the examples, e.g. pr12. Those with the number 1 at the end give the translation from RDFProlog to Notation 3; those with the number 2 at the end execute the program.

 


Annexe 2. An example of a resolution and a closure path made with the gedcom example (files gedcom.facts.n3, gedcom.relations.n3, gedcom.query.n3)

 

First the trace is given with the global substitutions and all calls and alternatives:

 

Substitutions:

 

{2$_$17_:child = :Frank. _1?who1 = 2$_$17_:grandparent. 3$_$16_:child = :Frank. 2$_$17_:grandparent = 3$_$16_:grandparent. 4$_$13_:child = :Frank. 3$_$16_:parent = 4$_$13_:parent. 4$_$13_:family = :Naudts_VanNoten. 4$_$13_:parent = :Guido. 7$_$13_:child = :Guido. 3$_$16_:grandparent = 7$_$13_:parent. 7$_$13_:family = :Naudts_Huybrechs. 7$_$13_:parent = :Pol. }.

N3Trace:

 

[:call {:Frank gc:grandfather _1?who1. }].

:alternatives = {{2$_$17_:child gc:grandparent 2$_$17_:grandparent. 2$_$17_:grandparent gc:sex :M. }

}

 

[:call {:Frank gc:grandparent 2$_$17_:grandparent. }].

:alternatives = {{3$_$16_:child gc:parent 3$_$16_:parent. 3$_$16_:parent gc:parent 3$_$16_:grandparent. }

}

 

[:call {:Frank gc:parent 3$_$16_:parent. }].

:alternatives = {{4$_$13_:child gc:childIn 4$_$13_:family. 4$_$13_:parent gc:spouseIn 4$_$13_:family. }

}

 

[:call {:Frank gc:childIn 4$_$13_:family. }].

:alternatives = {{:Frank gc:childIn :Naudts_VanNoten. }

}

 

[:call {4$_$13_:parent gc:spouseIn :Naudts_VanNoten. }].

:alternatives = {{:Guido gc:spouseIn :Naudts_VanNoten. }

{:Christine gc:spouseIn :Naudts_VanNoten. }

}

 

[:call {:Guido gc:parent 3$_$16_:grandparent. }].

:alternatives = {{7$_$13_:child gc:childIn 7$_$13_:family. 7$_$13_:parent gc:spouseIn 7$_$13_:family. }

}

 

[:call {:Guido gc:childIn 7$_$13_:family. }].

:alternatives = {{:Guido gc:childIn :Naudts_Huybrechs. }

}

 

[:call {7$_$13_:parent gc:spouseIn :Naudts_Huybrechs. }].

:alternatives = {{:Martha gc:spouseIn :Naudts_Huybrechs. }

{:Pol gc:spouseIn :Naudts_Huybrechs. }

}

 

[:call {:Pol gc:sex :M. }].

:alternatives = {{:Pol gc:sex :M. }

}

Query:

 

{:Frank gc:grandfather _1?who1. }

 

Solution:

 

{:Frank gc:grandfather :Pol. }

 

 

The calls and alternatives with the global substitution applied:

------------------------------------------------------------------

 

It can be immediately verified that the resolution path is complete.

(The alternatives that lead to failure are deleted).

 

[:call {:Frank gc:grandfather :Pol.}].

:alternatives = {{:Frank gc:grandparent :Pol. :Pol gc:sex :M. }

}

 

[:call {:Frank gc:grandparent :Pol. }].

:alternatives = {{:Frank gc:parent :Guido. :Guido gc:parent :Pol. }

}

 

[:call {:Frank gc:parent :Guido. }].

:alternatives = {{:Frank gc:childIn :Naudts_Vannoten. :Guido gc:spouseIn :Naudts_Vannoten. }

}

 

[:call {:Frank gc:childIn :Naudts_Vannoten. }].

:alternatives = {{:Frank gc:childIn :Naudts_VanNoten. }

}

 

[:call {:Guido gc:spouseIn :Naudts_VanNoten. }].

:alternatives = {{:Guido gc:spouseIn :Naudts_VanNoten. }

}

 

[:call {:Guido gc:parent :Pol. }].

:alternatives = {{:Guido gc:childIn :Naudts_Huybrechs. :Pol gc:spouseIn :Naudts_Huybrechs. }

}

 

[:call {:Guido gc:childIn :Naudts_Huybrechs. }].

:alternatives = {{:Guido gc:childIn :Naudts_Huybrechs. }

}

 

[:call {:Pol gc:spouseIn :Naudts_Huybrechs. }].

:alternatives = {{:Pol gc:spouseIn :Naudts_Huybrechs. }

}

 

[:call {:Pol gc:sex :M. }].

:alternatives = {{:Pol gc:sex :M. }

}

 

Query:

 

{:Frank gc:grandfather _1?who1. }

 

Solution:

 

{:Frank gc:grandfather :Pol. }

 

 

It is now possible to establish the closure path.

 

( {{:Guido gc:childIn :Naudts_Huybrechs. :Pol gc:spouseIn :Naudts_Huybrechs. }, {:Guido gc:parent :Pol. })

( {{:Frank gc:childIn :Naudts_Vannoten. :Guido gc:spouseIn :Naudts_Vannoten. }, {:Frank gc:parent :Guido. })

 

({{:Frank gc:parent :Guido. :Guido gc:parent :Pol. },

{:Frank gc:grandparent :Pol. })

 

({{:Frank gc:grandparent :Pol. :Pol gc:sex :M. },

{:Frank gc:grandfather :Pol.})

 

 

The triples that were added during the closure are:

{:Guido gc:parent :Pol. }

{:Frank gc:parent :Guido. }

{:Frank gc:grandparent :Pol. }

{:Frank gc:grandfather :Pol.}

 

It is verified that the reverse of the resolution path is indeed a closure path.

 

Verification of the lemmas

----------------------------

 

Closure Lemma: certainly valid for this closure path.

Query Lemma: the last triple in the closure path is the query.

Final Path Lemma: correct for this resolution path.

Looping Lemma: not applicable

Duplication Lemma: not applicable.

Failure Lemma: not contradicted.

Solution Lemma I: correct for this testcase.

Solution Lemma II: correct for this testcase.

Solution Lemma III: not contradicted.

Completeness Lemma: not contradicted.