An inference engine for RDF

 

 

 

Master thesis

 

 

 

 

G.Naudts

831708907

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

22 augustus 2003

Open University Netherlands

Agfa Gevaert

 

 

 

 

 

                

 

 


This document is the Master Thesis made as a part of my Master Degree in Computer Science Education (Software Systems) at the Open University of the Netherlands. The work has been done in collaboration with the research department of the company Agfa in Mortsel Belgium.

 

Student data

 

Name                          Guido Naudts

Student number           831708907

Address                       Secretarisdreef 5

                                    2288 Bouwel

Telephone work          0030-2-542.76.01

                  Home        0030-14-51.32.43

E-mail                         Naudts_Vannoten@yahoo.com

 

Coaching and graduation committee

 

Chairman:                    dr J.T. Jeuring, professor at the Open University

Secretary :                    ir. F.J. Wester, senior lecturer at the Open University

Coach:                          ir. J. De Roo, researcher at Agfa Gevaert.

 

 

 

 


Acknowledgements

I want to thank Ir.J.De Roo for giving me the opportunity for making a thesis about a tremendous subject and his guidance in the matter.

Pr.J.Jeuring and  Ir. F.Wester are thanked for their efforts to help me produce a readable and valuable text.

I thank M.P.Jones for his Prolog demo in the Hugs distribution that has very much helped me.

I thank all the people from the OU for their efforts during the years without which this work would not have been possible.

I thank my wife and children for supporting a father seated behind his computer during many years.

 


An inference engine for RDF.... 1

Master thesis. 1

G.Naudts. 1

Open University Netherlands. 1

Agfa Gevaert 1

Student data.. 3

Coaching and graduation committee. 3

Acknowledgements. 4

Summary. 13

Samenvatting.. 15

Chapter 1. Introduction.. 17

1.1.  Overview.... 17

1.2. Case study.. 17

1.2.1. Introduction.. 17

1.2.2. The case study.. 17

1.2.3.Conclusions of the case study.. 19

1.3.Research goals.. 19

1.3.1. Standards.. 19

1.3.2. Research tasks.. 20

1.4. Research method.. 21

1.5. The foundations.. 22

1.6. Relation with the current research.. 22

1.7. Outline of the thesis.. 23

Chapter 2. The building blocks. 24

2.1. Introduction.. 24

2.2.XML and namespaces.. 24

2.2.1. Definition.. 24

2.2.2. Features.. 24

2.3.URI’s and URL’s.. 26

2.3.1. Definitions.. 26

2.3.2. Features.. 26

2.4.Resource Description Framework RDF... 27

2.5.Notation 3.. 28

2.5.1. Introduction.. 28

2.5.2. Basic properties.. 28

2.6.The logic layer.. 30

2.7.Semantics of N3.. 31

2.7.1. Introduction.. 31

2.7.2. The model theory of RDF... 31

2.7.3. Examples.. 31

2.8. A global view of the Semantic Web.. 32

2.8.1. Introduction.. 32

2.8.2. The layers of the Semantic Web.. 32

Chapter 3. RDF, RDFEngine and inferencing. 35

3.1. Introduction.. 35

3.2.Recapitulation and definitions.. 35

3.3. The graph ADT... 38

3.4.Languages needed for inferencing.. 45

3.5.The model interpretation of a rule.. 45

3.6.Unifying two triplesets.. 48

3.6.1. Example.. 48

3.7.Unification.. 49

3.8.Matching two statements.. 50

3.9.The resolution process.. 52

3.10.Structure of the engine.. 53

3.11.The closure path.. 60

3.11.1.Problem.... 60

3.11.2. The closure process.. 60

3.11.3. Notation.. 60

3.11.4. Examples.. 61

3.12.The resolution path.. 63

3.12.1. The process.. 63

3.12.2. Notation.. 64

3.12.3. Example.. 64

3.13.Variable renaming.. 65

3.14.Comparison of resolution and closure paths.. 66

3.14.1. Introduction.. 66

3.14.2. Elaboration.. 67

3.15.Anti-looping technique.. 69

3.15.1. Introduction.. 69

3.15.2. Elaboration.. 70

3.15.3.Failure.. 71

3.15.4.Completeness.. 72

3.15.5.Monotonicity.. 72

3.16. Distributed inferencing.. 73

3.16.1. Definitions.. 73

3.16.2. Mechanisms.. 73

3.16.3. Conclusion.. 75

3.17.A formal theory of graph resolution.. 77

3.17.1. Introduction.. 77

3.17.2.Definitions.. 77

3.17.3. Lemmas.. 79

3.18.An extension of the theory to variable arities.. 84

3.18.1. Introduction.. 84

3.18.2.Definitions.. 84

3.18.3. Elaboration.. 85

3.19.An extension of the theory to typed nodes and arcs.. 85

3.19.1. Introduction.. 85

3.19.2. Definitions.. 85

3.19.3. Elaboration.. 86

Chapter 4. Existing software systems. 87

4.1. Introduction.. 87

4.2. Inference engines.. 87

4.2.1.Euler. 87

4.2.2. CWM.... 87

4.2.3. TRIPLE... 88

4.3. RuleML... 88

4.3.1. Introduction.. 88

4.3.2. Technical approach.. 88

4.4. The Inference Web.. 88

4.4.1. Introduction.. 88

4.4.2. Details.. 89

4.4.3. Conclusion.. 89

4.5. Query engines.. 89

4.5.1. Introduction.. 89

4.5.2. DQL... 90

4.5.3. RQL... 90

4.5.4. XQuery.. 90

4.6. Swish.. 91

Chapter 5. RDF, inferencing and logic. 92

5.1.The model mismatch.. 92

5.2.Modelling RDF in FOL... 93

5.2.1. Introduction.. 93

5.2.2. The RDF data model 93

5.2.3. A problem with reification.. 95

5.2.4.Conclusion.. 96

5.3.Unification and the RDF graph model. 96

5.4.Completeness and soundness of the engine.. 98

5.5.RDFProlog.. 98

5.6.From Horn clauses to RDF... 99

5.6.1. Introduction.. 99

5.6.2. Elaboration.. 99

5.7. Comparison with Prolog and Otter.. 100

5.7.1. Comparison of Prolog with RDFProlog.. 100

5.7.3. Otter. 102

5.8.Differences between RDF and FOL... 102

5.8.1. Anonymous entities.. 102

5.9.2.‘not’ and ‘or’. 103

5.9.3.Proposition.. 105

5.9.Logical implication.. 106

5.9.1.Introduction.. 106

5.9.2.RDF and implication.. 106

5.10.3.Conclusion.. 107

5.10. Paraconsistent logic.. 107

5.11. RDF and constructive logic.. 107

5.12. The Semantic Web and logic.. 108

5.13.OWL-Lite and logic.. 109

5.13.1.Introduction.. 109

5.13.2.Elaboration.. 110

5.15.The World Wide Web and neural networks.. 111

5.16.RDF and modal logic.. 111

5.16.1.Introduction.. 111

5.16.2.Elaboration.. 111

5.17. Logic and context. 113

5.18. Monotonicity.. 114

5.19. Learning.. 116

5.20. Conclusion.. 116

Chapter 6. Optimization. 117

6.1.Introduction.. 117

6.2. Discussion.. 117

6.3. An optimization technique.. 119

6.3.1. Example.. 119

6.3.2. Discussion.. 120

Chapter 7. Inconsistencies. 122

7.1. Introduction.. 122

7.2. Elaboration.. 122

7.3. OWL Lite and inconsistencies.. 123

7.3.1. Introduction.. 123

7.3.2. Demonstration.. 123

7.3.2. Conclusion.. 124

7.4. Using different sources.. 124

7.4.1.Introduction.. 124

7.4.2. Elaboration.. 124

7.5. Reactions to inconsistencies and trust. 126

7.5.1.Reactions to inconsistency.. 126

7.5.2.Reactions to a lack of trust. 127

7.6.Conclusion.. 127

Chapter 8. Applications. 128

8.1. Introduction.. 128

8.2. Gedcom.... 128

8.3. The subClassOf testcase.. 128

8.4. Searching paths in a graph.. 129

8.5. An Alpine club.. 129

8.6. Simulation of logic gates.. 132

8.7. A description of owls.. 133

8.8. A simulation of a flight reservation.. 134

Chapter 9. Conclusion.. 136

9.1.Introduction.. 136

9.2.Characteristics of an inference engine.. 136

9.3. An evaluation of the RDFEngine.. 138

9.4. Forward versus backwards reasoning.. 139

9.5. Final conclusions.. 139

Bibliography. 144

List of abbreviations. 148

Appendix 1: Excuting the engine. 149

Appendix 2. Example of a closure path. 150

Appendix 3. Test cases. 154

Gedcom.... 154

Ontology.. 159

Paths in a graph.. 160

Logic gates.. 162

Simulation of a flight reservation.. 164

Appendix 4. Theorem provers an overview... 167

1.Introduction.. 167

2. Overview of principles.. 167

2.1. General remarks.. 167

2.2 Reasoning using resolution techniques: see the chapter on resolution.. 168

2.3. Sequent deduction.. 168

2.4. Natural deduction.. 169

2.5. The matrix connection method.. 169

2.6 Term rewriting.. 170

2.7. Mathematical induction.. 171

2.8. Higher order logic.. 172

2.9. Non-classical logics.. 172

2.10. Lambda calculus.. 173

2.11. Typed lambda-calculus.. 174

2.12. Proof planning.. 174

2.13. Genetic algorithms.. 175

Overview of theorem provers.. 175

Higher order interactive provers.. 176

Classical. 176

Logical frameworks.. 176

Automated provers.. 177

Prolog.. 178

Overview of different logic systems.. 179

General remarks.. 179

1) Proof and programs.. 180

2. Propositon logics.. 180

3. First order predicate logic.. 180

4. Higher order logics.. 181

5. Modal logic (temporal logic). 181

6. Intuistionistic or constructive logic.. 182

7. Paraconsistent logic.. 182

8.Linear logic.. 183

Bibliography.. 184

Appendix 5. The source code of RDFEngine. 186

Appendix 6. The handling of variables. 236

 




Summary

 

The Semantic Web is an initiative of the World Wide Web Consortium (W3C). The goal of this project is to define a series of standards. These standards will create a framework for the automation of activities on the World Wide Web (WWW) that still need manual intervention by human beings at this moment.

A certain number of basic standards have been developed already. Among them XML is without doubt the most known. Less known is the Resource Description Framework (RDF). This is a standard for the creation of meta information. This is the information that has to be given to computers to permit them to handle tasks previously done by human beings.

Information alone however is not sufficient. In order to take a decision it is often necessary to follow a certain reasoning. Reasoning is connected with logics. The consequence is that computers have to be fed with programs and rules that can take decisions, following a certain logic, based on available information.

These programs and reasonings are executed by inference engines. The definition of inference engines for the Semantic Web is at the moment an active field of experimental research.

These engines have to use information that is expressed following the standard RDF. The consequences of this requirement for the definition of the engine are explored in this thesis.

An inference engine uses information and, given a query, reasons with this information with a solution as a result. It is important to be able to show that the solutions, obtained in this manner, have certain characteristics. These are, between others, the characteristics completeness and validity. This proof is delivered in this thesis in two ways:

1)     by means of a reasoning based on the graph theoretical properties of RDF

2)     by means of a model based on First Order Logics.

On the other hand these engines must take into account the specificity of the World Wide Web. One of the properties of the web is the fact that sets are not closed. This implies that not all elements of a set are known. Reasoning then has to happen in an open world. This fact has far reaching logical implications.

During the twentieth century there has been an impetous development of logic systems. More than 2000 kinds of logic were developed. Notably First Order Logic (FOL) has been under heavy attack, beside others, by the dutch Brouwer with the constructive or intuistionistic logic. Research is also done into kinds of logic that are suitable for usage by machines.

In this thesis I argue that an inference engine for the World Wide Web should follow a kind of constructive logic and that RDF, with a logical implication added, satisfies this requirement. It is shown that a constructive approach is important for the verifiability of the results of inferencing.

 

The structure of the World Wide Web has a number of consequences for the properties that an inference engine should satisfy. A list of these properties is established and discussed.

A query on the World Wide Web can extend over more than one server. This means that a process of distributed inferencing must be defined. This concept is introduced in an experimental inference engine, RDFEngine.

An important characteristic for an engine that has to be active in the World Wide Web is efficiency. Therefore it is important to do research about the methods that can be used to make an efficient engine. The structure has to be such that it is possible to work with huge volumes of data. Eventually these data are kept in relational databases.

On the World Wide Web there is information available in a lot of places and in a lot of different forms. Joining parts of this information and reasoning about the result can easily lead to the existence of contradictions and inconsistencies. These are inherent to the used logic or are dependable on the application. A special place is taken by inconsistencies that are inherent to the used ontology. For the Semantic Web the ontology is determined by the standards rdfs and OWL.

An ontology introduces a classification of data and apllies restrictions to those data. An inference engine for the Semantic Web needs to have such characteristics that it is compatible with rdfs and OWL.

An executable specification of an inference engine in Haskell was constructed. This permitted to test different aspects of inferencing and the logic connected to it. This engine has the name RDFEngine. A large number of existing testcases was executed with the engine. A certain number of testcases was constructed, some of them for inspecting the logic characteristics of RDF.

 


Samenvatting

 

Het Semantisch Web is een initiatief van het World Wide Web Consortium (W3C). Dit project beoogt het uitbouwen van een reeks van standaarden. Deze standaarden zullen een framework scheppen voor de automatisering van activiteiten op het World Wide Web (WWW) die thans nog manuele tussenkomst vereisen.

Een bepaald aantal basisstandaarden zijn reeds ontwikkeld waaronder XML ongetwijfeld de meest gekende is. Iets minder bekend is het Resource Description Framework (RDF). Dit is een standaard voor het creëeren van meta-informatie. Dit is de informatie die aan de machines moet verschaft worden om hen toe te laten de taken van de mens over te nemen.

Informatie alleen is echter niet voldoende. Om een beslissing te kunnen nemen moet dikwijls een bepaalde redenering gevolgd worden. Redeneren heeft te maken met logica. Het gevolg is dat computers gevoed moeten worden met programmas en regels die, volgens een bepaalde logica, besluiten kunnen nemen aan de hand van beschikbare informatie.

Deze programmas en redeneringen worden uitgevoerd door zogenaamde inference engines. Dit is op het huidig ogenblik een actief terrein van experimenteel onderzoek.

De engines moeten gebruik maken van de informatie die volgens de standaard RDF wordt weergegeven. De gevolgen hiervan voor de structuur van de engine worden in deze thesis nader onderzocht.

Een inference engine gebruikt informatie en, aan de hand van een vraagstelling, worden redeneringen doorgevoerd op deze informatie met een oplossing als resultaat. Het is belangrijk te kunnen aantonen dat de oplossingen die aldus bekomen worden bepaalde eigenschappen bezitten. Dit zijn onder meer de eigenschappen volledigheid en juistheid. Dit bewijs wordt op twee manieren geleverd: enerzijds bij middel van een redenering gebaseerd op de graaf theoretische eigenschappen van RDF; anders bij middel van een op First Order Logica gebaseerd model.

Anderzijds dienen zij rekening te houden met de specificiteit van het World Wide Web. Een van de eigenschappen van het web is het open zijn van verzamelingen. Dit houdt in dat van een verzameling niet alle elementen gekend zijn. Het redeneren dient dan te gebeuren in een open wereld. Dit heeft verstrekkende logische implicaties.

In de loop van de twintigste eeuw heeft de logica een stormachtige ontwikkeling gekend. Meer dan 2000 soorten logica werden ontwikkeld. Aanvallen werden doorgevoerd op het bolwerk van de First Order Logica (FOL), onder meer door de nederlander Brouwer met de constructieve of  intuistionistische  logica. Gezocht wordt ook naar soorten logica die geschikt zijn voor het gebruik door machines.

In deze thesis wordt betoogd dat een inference engine voor het WWW een constructieve logica dient te volgen enerzijds, en anderzijds, dat RDF aangevuld met een logische implicatie aan dit soort logica voldoet. Het wordt aangetoond that een constructieve benadering belangrijk is voor de verifieerbaarheid van de resultaten van de inferencing.

De structuur van het World Wide Web heeft een aantal gevolgen voor de eigenschappen waaraan een inference engine dient te voldoen. Een lijst van deze eigenschappen wordt opgesteld en besproken.

Een query op het World Wide Web kan zich uitstrekken over meerdere servers. Dit houdt in dat een proces van gedistribueerde inferencing moet gedefinieerd worden. Dit concept wordt geïntroduceerd in een experimentele inference engine, RDFEngine.

 

Een belangrijke eigenschap voor een engine die actief dient te zijn in het World Wide Web is efficientie. Het is daarom belangrijk te onderzoeken op welke wijze een efficiente engine kan gemaakt worden. De structuur dient zodanig te zijn dat het mogelijk is om met grote volumes aan data te werken, die eventueel in relationele databases opgeslagen zijn.

Op het World Wide Web is informatie op vele plaatsen en in allerlei vormen aanwezig. Het samen voegen van deze informatie en het houden van redeneringen die erop betrekking hebben kan gemakkelijk leiden tot het ontstaan van contradicties en inconsistenties. Deze zijn inherent aan de gebruikte logica of zijn afhankelijk van de applicatie. Een speciale plaats nemen de inconsistenties in die inherent zijn aan de gebruikt ontologie. Voor het Semantisch Web wordt de ontologie bepaald door de standaarden rdfs en OWL. Een ontologie voert een classifiering in van gegevens en legt ook beperkingen aan deze gegevens op. Een inference engine voor het Semantisch Web dient zulkdanige eigenschappen te bezitten dat hij compatibel is met rdfs en OWL.

Een uitvoerbare specificatie van een inference engine in Haskell werd gemaakt. Dit liet toe om allerlei aspecten van inferencing en de ermee verbonden logica te testen. De engine werd RDFEngine gedoopt. Een groot aantal bestaande testcases werd onderzocht. Een aantal testcases werd bijgemaakt, sommige speciaal met het oogmerk om de logische eigenschappen van RDF te onderzoeken.

        


 

Chapter 1. Introduction

 

1.1.  Overview

 

In chapter 1 a case study is given and then the goals of the research, that is the subject of the thesis, are exposed with the case study as illustration. The methods used are explained and an overview is given of the fundamental knowledge upon which the research is based. The relation with current research is indicated. Then the structure of the thesis is outlined. 

 

1.2. Case study

 

1.2.1. Introduction

 

A case study can serve as a guidance to what we want to achieve with the Semantic Web. Whenever standards are approved they should be such that important case studies remain possible  to implement. Discussing all possible application fields here would be out of scope. However one case study will help to clarify the goals of the research.

1.2.2. The case study

 

A travel agent in Antwerp has a client who wants to go to St.Tropez in France. There are rather a lot of possibilities for composing such a voyage. The client can take the train to France, or he can take a bus or train to Brussels and then the airplane to Nice in France, or the train to Paris then the airplane or another train to Nice. The travel agent explains the client that there are a lot of possibilities. During his explanation he gets an impression of what the client really wants. Fig.1.1. gives a schematic view of the case study.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Fig.1.1. A Semantic Web case study

 

He agrees with the client about the itinerary: by train from Antwerp to Brussels, by airplane from Brussels to Nice and by train from Nice to St. Tropez. This still leaves room for some alternatives. The client will come back to make a final decision once the travel agent has said him by mail that he has worked out some alternative solutions like price for first class vs second class etc...

Remark that the decision for the itinerary that has been taken is not very well founded; only very crude price comparisons have been done based on some internet sites that the travel agent consulted during his conversation with the client. A very cheap flight from Antwerp to Cannes has escaped the attention of the travel agent.

The travel agent will now further consult the internet sites of the Belgium railways, the Brussels airport and the France railways to get some alternative prices, departure times and total travel times.

 

Now let’s compare this with the hypothetical situation that a full blown Semantic Web would exist. In the computer of the travel agent resides a Semantic Web agent that has at its disposal all the necessary standard tools. The travel agent has a specialised interface to the general Semantic Web agent. He fills in a query in his specialised screen. This query is translated to a standardised query format for the Semantic Web agent. The agent consult his rule database. This database of course contains a lot of rules about travelling as well as facts like e.g. facts about internet sites where information can be obtained. There are a lot of ‘path’ rules: rules for composing an itinerary (for an example of what such rules could look like see: [GRAPH]. The agent contacts different other agents like the agent of the Belgium railways, the agents of the French railways, the agent of the airports of Antwerp, Brussels, Paris, Cannes, Nice etc...

With the information recieved its inference rules about scheduling a trip are consulted. This is all done while the travel agent is chatting with the client to detect his preferences. After some 5 minutes the Semantic Web agent gives the travel agent a list of alternatives for the trip; now the travel agent can immediately discuss this with his client. When a decision has been reached, the travel agent immediately gives his Semantic Web agent the order for making the reservations and ordering the tickets. Now the client only will have to come back once for getting his tickets and not twice. The travel agent not only has been able to propose a cheaper trip as in the case above but has also saved an important amount of his time.

 

1.2.3.Conclusions of the case study

 

That a realisation of such a system is interesting is evident. Clearly, the standard tools do have to be very flexible and powerful to be able to put into rules the reasoning of this case study (path determination, itinerary scheduling). All this rules have then to be made by someone. This can of course be a common effort for a lot of travel agencies.

What exists now? A quick survey learns that there are web portals where a client can make reservations (for hotel rooms). However the portal has to be fed with data by the travel agent. There also exist software that permits the client to manage his travel needs. But all that software has to be fed with information obtained by a variety of means, practically always manually.

A partial simulation namely the command of a ticket for a flight can be found in chapter 10 paragraph 8.

 

1.3.Research goals

1.3.1. Standards

 

In the case study different information sites have to communicate one with another. If this has to be done in an automatic way a standard has to be used. The standard that is used for expressing information is RDF. RDF is a standard for expressing meta-information developed by the W3C. The primary purpose was adding meta-information to web pages so that these become more usable for computers [TBL]. Later the Semantic Web initiative was launched by Berners-Lee with the purpose of developing standards for automating the exchange of information between computers. The information standard is again RDF [DESIGN]. However other standards are added on top of RDF for the realisation of the Semantic Web (chapter 2).

1.3.2. Research tasks

 

The following research tasks were defined:

 

1)   define an inference process on top of RDF and give a specification of  this process in a functional language.

 

In the case study the servers dispose of information files that contain facts and rules. The facts are expressed in RDF. The syntax of the rules is RDF also, but the semantic interpretation is different. Using rules implies inferencing. On top of RDF a standardized inference layer has to be defined. A functional language is very well suited for a declarative specification. At certain points the inference process has to be interrupted to direct queries to other sites on the World Wide Web. This implies a process of distributed inferencing where the inference process does not take place in one site (one computer) but is distributed over several sites (see also fig. 1.1).

 

2)     Investigate which kind of logic is best suited for the Semantic Web.

 

          Using rules; making queries; finding solutions; this is the subject of logic. So the relevance of logic for an inferencing standard based on top of RDF has to be investigated.

 

3)     Give a proof of the validity and the completeness of the specified inference process

 

It is without question of course that the answers to a query have to be valid. By completeness is meant that all answers to a query are found.  It is not always necessary that all answers are found but often it is.

 

4)     Investigate what can be done for augmenting the efficiency of the inference process

 

A basic internet engine should be fast because the amount of data can be high; it is possible that data need to be collected from several sites over the internet. This will already imply an accumulation of transmission times. A basic engine should be optimized as much as possible.

 

5)     Investigate how inconsistencies can be handled by the inference process

 

In a closed world (reasoning on one computer system) inconsistencies can be mastered; on the internet, when inferencing becomes a distributed process, inconsistencies will be commonplace.

 

As of yet no standard is defined for inferencing on top of RDF. As will be shown in this thesis there are a lot of issues involved and a lot of choices to be made. The definition of such a standard is a necessary, but difficult step in the progress of the Semantic Web. Therefore, in the research the accent was put on points 1-3 where points 4 and 5 were treated less in depth.

In order to be able to define a standard a definition has to be provided for following notions that represent minimal extensions on top of RDF: rules, queries, solutions and proofs.

 

1.4. Research method

 

The primary research method consists of writing a specification of an RDF inference engine in Haskell. The engine is given the name RDFEngine. The executable specification is tested using test cases where the collection of test cases of De Roo [DEROO] plays an important role. Elements of logic, efficiency and inconsistence can be tested by writing special test cases in interaction with a study of the relevant literature.

 

Fig.1.2. The inference engine RDFEngine in a server with its inputs and outputs.

 

In fig.1.2 the global structure of the inference engine is drawn. The inputs and outputs are in Notation 3 (N3). Notation 3 is choosen because it is easier to work with than RDF and it is convertible to RDF syntax. However, during the research, another syntax was developed, called RDFProlog.

The engine is given a modular structure and different versions exist for different testing purposes.

What has to be standardized is the input and output of the block marked ‘Basic engine’ in fig.1.2. For the Semantic Web the precise contents of this block will not make a part of the standard.

For making the Haskell specification the choice was made for a resolution based engine. A forward reasoning engine could have been chosen as well but a resolution based engine has better characteristics for distributed inferencing. Other choices of engine are also possible. The choice of a particular engine is not the subject of this thesis but rather the definition of the interface to the engine. However, interesting results that were obtained about resolution based inferencing are certainly worth mentioning.

 

1.5. The foundations

 

The research used mainly following material:

a)     The collection of test cases of De Roo [DEROO]

b)    The RDF model theory [RDFM] [RDFMS]

c)     The RDF rules mail list [RDFRULES]

d)    The classic resolution theory  [VANBENTHEM] [STANFORD]

e)     Logic theories and mainly first order logic, constructive logic and paraconsistent logic. [VANBENTHEM, STANFORDP]

 

1.6. Relation with the current research

 

The mail list RDF rules [RDFRULES] has as a purpose to discuss the extension of RDF with inferencing features, mainly rules, queries and answers. There are a lot of interesting discussions but no clear picture is emerging at the moment.

Berners-Lee [DESIGN] seeks inspiration in first order logic. He defines logic predicates (chapter 2), builtins, test cases and a reference engine CWM. However the semantics of input and output formats for a RDF engine are not clearly defined.

A common approach is to use existing logic systems. RDF is then expressed in terms of these systems and logic (mostly rules) is added on top e.g. Horn logic or frame logic. An example is found in [DECKER]. In this case the semantics are those of the used logic.

The implementation of an ontology like OWL (Ontology Web Language) implies also inferencing facilities that are necessary for implementing the semantics of the ontology [OWL features].

Euler [DEROO] and CWM take a graph oriented approach for the construction of the inference program. The semantics are mostly defined by means of test cases.

The approach taken by my research for defining the semantics of the inference input and output has two aspects:

1)     It is graph oriented and, in fact, the theory is generally applicable to labeled graphs.

2)     From the logic viewpoint a constructive approach is taken for reasons of verifiability (chapter 5).

3)     A constructive graph oriented definition is given for rules, queries, solutions and proofs.

 

1.7. Outline of the thesis

 

Chapter two gives an introduction into the basic standards that are used by RDF and into the syntax and model theory of RDF. Readers who are familiar with some of these matters can skip these.

 

Chapter three exposes the meaning of inferencing using an RDF facts and rules database. The Haskell specification, RDFEngine, is explained and a theory of resolution based on reasoning on a graph is presented.

 

Chapter four discusses existing software systems. After the introduction to RDF and inferencing in chapter three, it is now possible to highlight the most important aspects of existing softwares. The Haskell specification RDFEngine and the theory presented in the thesis were not based upon these softwares but upon the points mentionned in 1.5.

 

Chapter five explores in depth the relationship of RDF inferencing with logic.

Here the accent is put on a constructive logic interpretation of the graph theory exposed in chapter three.

 

Chapter six discusses optimalization techniques. Especially important is a technique based on the theory exposed in chapter three.

 

Chapter seven disusses inconsistencies that can arise during the course of inferencing processes on the Semantic Web.

 

Chapter eight gives a brief overview of some use cases and applications.

 

Chapter nine finally gives a conclusion and indicates possible ways for further research.

 

 


 

Chapter 2. The building blocks

 

2.1. Introduction

 

In this chapter a number of techniques are briefly explained. These techniques are necessary building blocks for the Semantic Web. They are also a necessary basis for inferencing on the web.

Readers who are familiar with some topics can skip the relevant explanations.

 

2.2.XML and namespaces

2.2.1. Definition

 

XML (Extensible Markup Language)  is a subset of SGML (Standard General Markup Language). In its original signification a markup language is a language which is intended for adding information (“markup” information) to an existing document.  This information must stay separate from the original hence the presence of separation characters. In SGML and XML ‘tags’ are used.

 

2.2.2. Features

 

There are two kinds of tags: opening and closing tags. The opening tags are keywords enclosed between the signs “<” and “>”. An example: <author>.  A closing tag is practically the same, only the sign “/” is added e.g. </author>. With these elements alone very interesting datastructures can be built. An example of a book description:

 

<book>

   <title>

         The Semantic Web

   </title>

   <author>

          Tim Berners-Lee

   </author>

</book>

 

As can be seen it is quite easy to build hierarchical datastructures with these elements alone. A tag can have content too: in th