Explicit knowledge representation and reasoning with Knowledge Graphs: implementation deep dive



 

In this blog post we will explore how SAP HANA Cloud is further extending its multi-model capabilities by also supporting RDF-based knowledge graphs and SPARQL querying, and we will see a real-world business application of this. We will see how to organize and represent enterprise data in a graph structure with the HANA Cloud Knowledge Graph Engine, discover how this structure facilitates knowledge representation and reasoning, and learn how to leverage knowledge graphs to retrieve structured data and achieve greater accuracy and explainability in RAG scenarios.

This is a summary of the business use case I presented together with my colleague @merza in this webinar: Explicit knowledge representation and reasoning with Knowledge Graphs. Please note that the session is part of a series of webinars on the topic of “Talk to Your Business with Generative AI”. Check the full calendar here to watch the recordings of past sessions and register for the upcoming ones!

The session summary is structured in two parts: the first blog post introduced the business use case, and in this blog post I dive into its implementation details.

This is not a product, it is just a prototype that we use to illustrate how to implement the services and concepts we talk about. We hope it will not only inspire you but also be of practical help, since you have access to the working code through the SAP-Samples repository here.

To follow the content of this blog post, it helps if you are already familiar with some SAP products, especially SAP Business Technology Platform (BTP) and SAP HANA Cloud, and if you have some knowledge of generative AI and data science in general, as well as basic knowledge of the JavaScript and Python programming languages.

 

Implementation Deep Dive

In the next sections I will describe:

- how we developed our custom knowledge graph starting from our relational data model and how we loaded it into the Knowledge Graph Engine; this is a prerequisite for everything in this POC, in particular for the Knowledge Graph Discovery module that we implemented to explore the RDF knowledge graph;
- how to implement a SPARQL endpoint; this is behind the SPARQL Explorer functionality and allows submitting SPARQL queries against the Knowledge Graph Engine;
- how we enhanced the Advisory Buddy so that it can query the custom knowledge graph in natural language and leverage the Knowledge Graph Engine and the Vector Engine together;
- finally, how it is possible to infer new knowledge with the Knowledge Graph Engine.

Please refer to the previous blog post for the use case description and the related architecture.

 

From relational DB to KG

Let’s start with the custom knowledge graph development. We need to move from a representation based on traditional relational data structures to a graph representation obtained by leveraging the RDF standard.

There are many ways to build a knowledge graph, but if the starting point is, as in our case, a relational data model in a relational database, this is already a great advantage: the information is already organised in structures and relationships, and the semantics are implicitly available.

So let’s see the different steps we followed:

- First of all, we collected as much information as possible about the relational data model.
- Then we designed and developed an ontology from the relational data model. As you will see, this is crucial if you want to perform text2SPARQL (a prerequisite to implement a GraphRAG pipeline) or if you want to infer new knowledge.
- Another important step is the mapping of the relational database content to RDF triples.
- The final step, of course, is loading the triples into the triple store, the HANA Cloud Knowledge Graph Engine.

Let me clarify that, in general, building an RDF knowledge graph is a tough task and it is still an open problem: there is no one-size-fits-all approach. In this section we only give some advice and inputs; you then have to analyse your own situation carefully and identify the best way to proceed.

1. Project documentation

Project documentation is very important, and if it is not available, a preliminary analysis is required, because we need to list and select all the existing tables and columns. We need to know the different primary keys and foreign keys, and we need to check the different data types to understand whether a conversion is possible and how. We also need to know the semantics, the meaning associated with each table and column. Finally, we need to identify those concepts that cannot be expressed in SQL but that we want to describe in the RDF knowledge graph.

2. Some basic rules for an RDB->RDF direct mapping:

- each table generates a class;
- each column generates a property;
- foreign keys are used to construct further properties and relate the different resources;
- primary keys can be used to build the URIs needed to uniquely identify the different resources in the knowledge graph.

These are just the basic rules we used in the following; if you are interested in knowing more about the rules of the direct mapping, check the W3C recommendation.

3. Ontology development

To understand how to use the previous rules to develop an ontology for our custom knowledge graph, let’s analyze just a small piece of the ontology we developed.

In our RDB we have a main fact table, “service request”, and another important dimension describing the partners. In our ontology, these two tables became two classes with the same names and the same meanings.

To model the concept that a service request can be requested by a partner, we also introduced a specific property named “requestedBy”.

To develop our ontology, we leveraged a specific language, OWL, that already offers a lot of expressive power. Everything I have described is expressed formally with a few lines of code (see the example below).

The first thing we do here is to define the “SAP Service Request” class and provide a description for it. We do the same for the “SAP Partner” class. Then we provide a formal expression for the property “requestedBy” that links the two classes.

Similar lines of code have to be written to define every class, property, rule and axiom that you need in the ontology. In general, a domain expert is needed in this phase, and it is difficult to automate the process. You can do it manually if you know OWL, or you can take advantage of Protégé, an open-source visual tool.
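To give a concrete flavour of what such definitions look like, here is a minimal sketch in Turtle, wrapped in a Python string and parsed with rdflib. The namespace, labels and descriptions are illustrative assumptions, not the exact ontology used in the POC.

```python
# Minimal sketch of the kind of OWL/Turtle definitions described above.
# The namespace and labels are illustrative assumptions, not the POC's actual ontology.
from rdflib import Graph

ONTOLOGY_TTL = """
@prefix :     <http://example.org/advisory#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:SAPServiceRequest a owl:Class ;
    rdfs:label   "SAP Service Request" ;
    rdfs:comment "A request for an advisory service submitted by a partner." .

:SAPPartner a owl:Class ;
    rdfs:label   "SAP Partner" ;
    rdfs:comment "A partner organization working with SAP." .

:requestedBy a owl:ObjectProperty ;
    rdfs:label   "requested by" ;
    rdfs:comment "Links a service request to the partner that requested it." ;
    rdfs:domain  :SAPServiceRequest ;
    rdfs:range   :SAPPartner .
"""

g = Graph()
g.parse(data=ONTOLOGY_TTL, format="turtle")  # parses without errors if the Turtle is well formed
print(f"Ontology sketch contains {len(g)} triples")
```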

4. Map DB content to RDF triples

Now let’s see how we can map the content of the different tables we have in the relational database. To understand it, we can consider again the example of the “service request” table. Let’s try to convert just the first row of this simplified table into triples by following the rules mentioned before.

Refer to the animation above. The column named “ID” in this table is the primary key. It can be used to build a unique identifier for the resource, which will be the subject of all the triples we are going to build.

The “request state” column becomes a property, and the content of the field becomes the object of the corresponding triple. The “Partner ID” column is a foreign key, and it can be used to generate a new property, the “requestedBy” property that we defined conceptually in the ontology.

Also in this case, the content of the field, the ID identifying the partner, is used to build another URI that points to an object that is an instance of another class, the “SAP Partner” class.

This is just an example; in practice this procedure needs to be repeated for all the columns of our “service request” table and for all the tables in the relational data model. This is the basis of a direct mapping, and in the end it is an ETL process that you can try to automate with an ETL pipeline.
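To illustrate this step, here is a minimal Python sketch that applies the direct-mapping rules to one row of a simplified “service request” table using rdflib. The base URI, column names and sample values are assumptions made for the example, not the actual POC schema.

```python
# Minimal sketch of a direct mapping for a single row, using rdflib.
# The base URI, column names and sample values are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/advisory#")
BASE = "http://example.org/advisory/"

def row_to_triples(g: Graph, row: dict) -> None:
    # Rule: the primary key builds the URI identifying the resource (the subject of all triples).
    subject = URIRef(BASE + f"ServiceRequest/{row['ID']}")
    g.add((subject, RDF.type, EX.SAPServiceRequest))

    # Rule: a plain column becomes a property, with the field content as the object.
    g.add((subject, EX.requestState, Literal(row["REQUEST_STATE"])))

    # Rule: a foreign key becomes an object property pointing to a resource of another class.
    partner = URIRef(BASE + f"Partner/{row['PARTNER_ID']}")
    g.add((subject, EX.requestedBy, partner))
    g.add((partner, RDF.type, EX.SAPPartner))

g = Graph()
row_to_triples(g, {"ID": 123, "REQUEST_STATE": "Completed", "PARTNER_ID": "P042"})
print(g.serialize(format="turtle"))
```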

There is a simple open-source tool named OpenRefine with which you can apply this procedure to process your relational data and convert it to RDF triples. You can try it for developing POCs or for small projects, but it does not allow any automation. Another possibility for implementing a direct mapping is R2RML, a specific language for converting a relational DB to RDF.

5. Loading the triples to HANA Cloud Knowledge Graph

All the processes described so far generate triples that need to be loaded into the HANA Cloud triple store. The triples can come as a single file or be distributed over many files, as in our case. The Knowledge Graph Engine provides a Python interface that we can use to access the HANA Cloud DB and load the files. Check this Jupyter notebook to see the code.
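The actual loading code is in the linked notebook. As a rough sketch of one possible approach, the snippet below pushes a small Turtle file into a named graph through a SPARQL Update statement, assuming that SPARQL_EXECUTE accepts updates and the (query, request headers, response, response metadata) parameter shape; connection details and the graph IRI are placeholders.

```python
# Rough sketch only: loading a small Turtle file via SPARQL Update.
# Assumes SPARQL_EXECUTE accepts SPARQL Update and the parameter shape shown here;
# refer to the SAP-Samples notebook and the product documentation for the supported API.
from hdbcli import dbapi
from rdflib import Graph

conn = dbapi.connect(address="<hana-host>", port=443, user="<user>", password="<password>")
cursor = conn.cursor()

g = Graph()
g.parse("service_requests.ttl", format="turtle")   # local triples produced by the mapping step
ntriples = g.serialize(format="nt")                # one triple per line, easy to inline

update = f"INSERT DATA {{ GRAPH <http://example.org/advisory/data> {{ {ntriples} }} }}"
# Second parameter is the request-headers string; left empty here, see the documentation
# for the supported header options. Third and fourth parameters are output parameters.
cursor.callproc("SPARQL_EXECUTE", (update, "", None, None))

cursor.close()
conn.close()
```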

 

SPARQL Explorer

Now let’s move to the SPARQL Explorer and the SPARQL endpoint implementation. This is one of the endpoints we developed for our POC, and like all the others it is deployed in Cloud Foundry. The code is very simple; you can see it in the image below and also directly in our GitHub repository. It is a Python microservice and it is a wrapper around the SPARQL_EXECUTE procedure that is available in the Knowledge Graph Engine.

 

If we look at the code, we see that we get the SPARQL query from the body of the request sent to the endpoint. Then we execute the SPARQL query by calling the SPARQL_EXECUTE procedure and we return the data in CSV or JSON format. We can choose the output format; both are supported by the Knowledge Graph Engine.
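The real endpoint is in the repository; the snippet below is only a minimal sketch of such a wrapper, assuming a Flask application, an hdbcli connection and the SPARQL_EXECUTE parameter shape described in the documentation.

```python
# Minimal sketch of a SPARQL endpoint wrapping SPARQL_EXECUTE.
# Flask, the connection details and the Accept-header handling are assumptions;
# see the GitHub repository for the actual microservice code.
from flask import Flask, Response, request
from hdbcli import dbapi

app = Flask(__name__)
conn = dbapi.connect(address="<hana-host>", port=443, user="<user>", password="<password>")

@app.route("/sparql", methods=["POST"])
def sparql():
    query = request.get_data(as_text=True)  # SPARQL query taken from the request body
    accept = request.headers.get("Accept", "application/sparql-results+json")
    cursor = conn.cursor()
    try:
        # Third and fourth parameters are the response body and response metadata (output).
        out = cursor.callproc("SPARQL_EXECUTE", (query, f"Accept: {accept}", None, None))
        return Response(out[2], mimetype=accept)
    finally:
        cursor.close()

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```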

 

Enhanced Advisory Buddy: querying in natural language


Let’s see how we implemented the functionality for querying the custom knowledge graph in natural language. In the image below you can see the process flow we need to implement to achieve this. Basically, the query is submitted by Mary to our application, the Smart Advisory Companion, and the question is then converted in the backend into a SPARQL query. This is done with the help of a large language model that we access through Generative AI Hub.

Once we have the SPARQL query, it can be executed by means of the SPARQL_EXECUTE procedure in our Knowledge Graph Engine, and the result is returned to the user.

The key step here is the text-to-SPARQL conversion, which is a prerequisite for implementing a complete GraphRAG pipeline. We are only missing the last step, where the large language model is also used to generate the final answer in natural language, but we did not need it for our POC.

In this scenario the large language model is used to perform the text-to-SPARQL conversion. The large language model probably knows the rules and syntax of SPARQL better than we do, but it knows nothing about the custom knowledge graph that we have in the Knowledge Graph Engine.

So, in order to perform an accurate text2SPARQL conversion, we need to guide the large language model and we also need to provide the KG conceptual model, that is, the ontology we built on top of our KG. In our implementation, all the needed information is read at runtime from a specific SQL table stored in the HANA Cloud SQL DB. From this table we take a prompt template and also an example of a good SPARQL query to guide the model in the SPARQL query generation.

Let’s look at the most important point: how we provide information about the ontology as additional context to the user prompt to achieve the text2SPARQL conversion. In the SQL table we have stored some SPARQL queries designed to be executed on the KG stored in the Knowledge Graph Engine to retrieve the ontology description. Below you can see one of these SPARQL queries:

The one shown is used to retrieve information about all the classes we have in our ontology, along with their descriptions. These descriptions are crucial for guiding the large language model. The other query is similar, but it retrieves the information about all the properties.
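Since the query itself is only visible in the image, here is a minimal sketch of what such a class-retrieval query might look like; it is not the exact query stored in our config table, and the ontology graph IRI is a placeholder.

```python
# Sketch of a SPARQL query retrieving all ontology classes with their descriptions.
# The graph IRI is a placeholder; the POC's actual query may differ.
CLASSES_QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?class ?label ?comment
FROM <http://example.org/advisory/ontology>
WHERE {
    ?class a owl:Class .
    OPTIONAL { ?class rdfs:label   ?label . }
    OPTIONAL { ?class rdfs:comment ?comment . }
}
"""
```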

Now let’s have a look at the Python microservice that performs text2SPARQL. Let’s refer to the two images below and go through the different steps:

- First of all, we retrieve the configurations from the SQL table mentioned before, stored in the HANA Cloud SQL DB.
- Then we execute the different SPARQL queries to retrieve the information about the ontology.
- We initialize the model thanks to the Generative AI Hub SDK.
- We prepare the prompt template, replacing all the parameters that need to be replaced.
- Then we create a chain between this prompt template and the large language model.
- Finally, we invoke the model to generate the SPARQL query, which is returned and executed against the Knowledge Graph Engine via the SPARQL_EXECUTE procedure (a condensed sketch of these steps follows below).
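The sketch below condenses these steps under a few assumptions: a generate_sparql() helper, a run_sparql() wrapper like the one shown in the SPARQL Explorer section, the init_llm() helper of the Generative AI Hub SDK, and a simplified prompt template. The real microservice in the repository reads the template and example query from the config table.

```python
# Condensed sketch of the text2SPARQL flow (not the actual microservice code).
# Assumptions: run_sparql() wraps SPARQL_EXECUTE as sketched earlier, init_llm() is
# available from the Generative AI Hub SDK, and the prompt template is simplified.
from gen_ai_hub.proxy.langchain.init_models import init_llm
from langchain_core.prompts import PromptTemplate

PROMPT_TEMPLATE = """You are an assistant that writes SPARQL queries.
Ontology classes:
{classes}

Ontology properties:
{properties}

Example of a good query:
{example_query}

Write a single SPARQL query answering the user question. Return only the query.
Question: {question}
"""

def generate_sparql(question: str, run_sparql, config: dict) -> str:
    # 1-2. Retrieve the ontology description from the Knowledge Graph Engine.
    classes = run_sparql(config["classes_query"])
    properties = run_sparql(config["properties_query"])

    # 3. Initialize the LLM through Generative AI Hub (model name is an assumption).
    llm = init_llm("gpt-4o", temperature=0.0)

    # 4-5. Fill the prompt template and chain it with the model.
    prompt = PromptTemplate.from_template(PROMPT_TEMPLATE)
    chain = prompt | llm

    # 6. Generate the SPARQL query; the caller executes it with run_sparql().
    return chain.invoke({
        "classes": classes,
        "properties": properties,
        "example_query": config["example_query"],
        "question": question,
    }).content

# Usage sketch:
# result = run_sparql(generate_sparql("Which partners requested technical advisory?", run_sparql, config))
```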

 

 

Enhanced Advisory Buddy: KG Engine and Vector Engine


Now let’s see how we can further improve our Advisory Buddy to leverage both the Knowledge Graph Engine and the Vector Engine capabilities.

As explained in the previous demo, we also need to introduce the Vector Engine because we want to answer questions like this: “Tell me the SAP employees who delivered a service of type ‘SAP BTP Technical Advisory’ regarding ‘Multi-tenancy’”.

To do this we need both structured and unstructured data. Let’s refer to the diagram below. We need to retrieve information from structured data stored in the HANA Cloud KG Engine and SQL DB, but we also need to retrieve the information stored as vectors in the Vector Engine, and for that we need to execute some similarity searches.

This time the microservice has to perform two different tasks: identify the topic in the natural language query submitted by Mary and convert part of her request into a SPARQL query. These two tasks are performed by the large language model, accessed with the help of Generative AI Hub.

With this information, the microservice is able to generate a hybrid query. Since we are not interested in letting the LLM generate the entire query, we impose a constraint by providing the final query template, so that the large language model only needs to generate parts of it.

Let’s have a look at the query template (see animation below). We can recognize an outer part that is SQL-like and an inner part that is SPARQL-like.

The large language model needs to generate the internal SPARQL query and identify the topic. When the final hybrid query is executed, the internal SPARQL query is executed against the Knowledge Graph Engine using the SPARQL_TABLE function. This is because SPARQL_TABLE returns a SQL table that can be joined with all the other SQL tables in HANA Cloud. And that is exactly what we do here: via a join we retrieve the use case short descriptions and their embeddings from another table in the HANA Cloud DB. Then we perform a similarity search comparing the topic identified by the LLM in Mary’s query with the different short descriptions to find the 5 best matches.
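Because the template is only shown in the animation, here is a minimal sketch of what such a hybrid query could look like, assuming a SPARQL_TABLE call in the FROM clause, a COSINE_SIMILARITY comparison against a topic embedding passed as a bind parameter, and placeholder table, column and property names.

```python
# Sketch of a hybrid SQL + SPARQL query.
# The SPARQL_TABLE column names, the table/column names and the ontology properties
# are assumptions; the actual template is stored in the config table.
HYBRID_QUERY_SKETCH = """
SELECT kg."employee", kg."service", uc."SHORT_DESCRIPTION",
       COSINE_SIMILARITY(uc."EMBEDDING", TO_REAL_VECTOR(?)) AS "SCORE"
FROM SPARQL_TABLE('
    PREFIX : <http://example.org/advisory#>
    SELECT ?employee ?service ?useCaseId
    WHERE {
        ?request :consistsOf ?service .
        ?service :deliveredBy ?employee ;
                 :hasUseCase  ?useCaseId .
    }
') AS kg
JOIN "USE_CASES" AS uc
  ON uc."USE_CASE_ID" = kg."useCaseId"
ORDER BY "SCORE" DESC
LIMIT 5
"""
# The '?' bind parameter carries the embedding of the topic extracted by the LLM.
```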

As mentioned, we have a new task compared to the previous implementation that only involved the KG Engine: we need to select the delivered services based on the topic they covered. To do that, we need to guide our large language model in this task as well, providing a specific prompt template to extract the topic from Mary’s natural language request as well as the hybrid query template we saw earlier. This information is again taken from the same config SQL table that we created in HANA Cloud DB.

Now let’s have a look at the Python microservice that converts text into a hybrid query like the one shown before. Let’s refer to the code below and go through the different steps:

- We retrieve the needed configurations from the config SQL table.
- We prepare the prompt template to identify the topic.
- We create the LLM chain to identify the topic thanks to LangChain and the Generative AI Hub SDK.
- Then we invoke the large language model to identify the topic.
- We generate the SPARQL query in the very same way described in the previous section.
- Once we have these two ingredients, we prepare the final hybrid query.
- We execute the final query on HANA Cloud and return the result of the execution (see the condensed sketch below).
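A condensed sketch of these steps is shown below. It reuses the hypothetical generate_sparql() and run_sparql() helpers from the previous sketches, assumes the stored hybrid template contains a {sparql_query} placeholder, and delegates the topic embedding to a placeholder embed_topic() helper (in practice this could be the Vector Engine or an embedding model accessed through Generative AI Hub).

```python
# Condensed sketch of the hybrid-query flow (not the actual microservice code).
# generate_sparql(), run_sparql() and embed_topic() are the hypothetical helpers
# introduced in the previous sketches; the hybrid template is assumed to contain
# a {sparql_query} placeholder for the LLM-generated inner query.
from gen_ai_hub.proxy.langchain.init_models import init_llm
from langchain_core.prompts import PromptTemplate

TOPIC_PROMPT = PromptTemplate.from_template(
    "Extract the main topic of the following request as a short phrase.\n"
    "Request: {question}\nTopic:"
)

def answer_hybrid(question: str, cursor, run_sparql, config: dict) -> list:
    # 1-4. Identify the topic with the LLM.
    llm = init_llm("gpt-4o", temperature=0.0)
    topic = (TOPIC_PROMPT | llm).invoke({"question": question}).content.strip()

    # 5. Generate the inner SPARQL query exactly as in the previous section.
    sparql_query = generate_sparql(question, run_sparql, config)

    # 6. Prepare the final hybrid query from the stored template.
    hybrid_sql = config["hybrid_template"].replace("{sparql_query}", sparql_query)

    # 7. Execute it on HANA Cloud, binding the embedding of the extracted topic.
    cursor.execute(hybrid_sql, (embed_topic(topic),))
    return cursor.fetchall()
```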


 

Inference with Knowledge Graph Engine (expected in QRC2)

Now let me cover another important capability of RDF-based knowledge graphs, probably the real differentiator with respect to traditional data structures: the possibility to infer new knowledge. This is a feature provided by the RDF standard, and it will be supported by the HANA Cloud Knowledge Graph Engine in Q2.

Let’s try to understand what inference is in the context of RDF knowledge graphs. The key feature of an RDF knowledge graph is to make explicit all the available knowledge about a given domain so that it is immediately usable. This usability is closely related to the presence of triples that assert something in the knowledge graph.

It may happen that in the knowledge graph there is still some implicit knowledge that is not immediately usable. This can happen when the ontology describes more entities, relationships or logical rules than the facts that are available in the form of triples. In this kind of situation, we can try to make this implicit knowledge explicit and usable. To do that, we can use the inference procedure. We can therefore say that inference is the procedure for making implicit knowledge explicit and usable for business action.

But what does this mean in practice? Let’s try to understand it in the context of our POC, using the custom knowledge graph we developed. Let’s start with some known data from our custom knowledge graph. For simplicity, in the following example we will not use the full URIs of the objects and we will replace real names with fictitious ones.

If we look at the animation below, what we know from the facts explicitly asserted in our knowledge graph is that:

- the service request 123 was requested by Hogwarts Solutions;
- for this service request we have a partner contact, who is Emily;
- the service request consists of a service named “technical advisory”;
- Hogwarts Solutions is an SAP partner and Emily is its contact;
- the technical advisory was delivered by John, and John is an SAP employee.

These are the known facts that are explicitly given in the KG. Now let’s see the questions we can answer with this information. For example, we could ask: “Who was the contact person for service request 123?”; the answer is simple because it is stated: it is Emily. Or “Who delivered the technical advisory service?”; this is also very simple: it was John.

But if we ask “Who are the persons involved in service request 123?”, we cannot answer, because we are missing some concepts and we are also missing the triples, the facts, needed to answer this question. So what can we do?

First of all, we can work on the ontology and introduce the missing concepts. For this purpose, in our ontology we introduced some hierarchies of classes and properties. We introduced the generic concepts of Organization and Person. Then we declared SAP Organization and SAP Partner as subclasses of the class Organization. Similarly, SAP Employee and SAP Partner Contact are subclasses of the class Person. Additionally, we introduced the concept of role-independent involvement. We did this by introducing a property called “Involves” (and its inverse property) and by declaring the existing properties “deliveredBy” and “hasPartnerContact” as special cases of the “Involves” property.
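A minimal sketch of these declarations in Turtle is shown below (wrapped in a Python string so it can be checked with rdflib); the namespace and the exact names are assumptions based on the description above, not the POC's actual ontology.

```python
# Sketch of the class/property hierarchy enabling the inference described above.
# Namespace and names are illustrative assumptions, not the POC's exact ontology.
from rdflib import Graph

HIERARCHY_TTL = """
@prefix :     <http://example.org/advisory#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Organization a owl:Class .
:Person       a owl:Class .

:SAPOrganization   rdfs:subClassOf :Organization .
:SAPPartner        rdfs:subClassOf :Organization .
:SAPEmployee       rdfs:subClassOf :Person .
:SAPPartnerContact rdfs:subClassOf :Person .

# Role-independent involvement and its inverse property.
:Involves   a owl:ObjectProperty .
:involvedIn a owl:ObjectProperty ; owl:inverseOf :Involves .

# Existing role-specific properties become special cases of :Involves.
:deliveredBy       rdfs:subPropertyOf :Involves .
:hasPartnerContact rdfs:subPropertyOf :Involves .
"""

Graph().parse(data=HIERARCHY_TTL, format="turtle")  # well-formed Turtle check
```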

With these additional concepts, the ontology contains more information than the facts asserted in the KG. As mentioned, in such a situation we can make use of the reasoner available in the Knowledge Graph Engine to make explicit some of the implicit knowledge contained in the ontology. The triples of the ontology and the triples corresponding to the known facts become inputs to the reasoner, which infers new facts.

What are these new facts?

- John and Emily are persons;
- the technical advisory involved, in some way, John and Emily;
- Hogwarts Solutions is a partner, but also an organization.

We now have everything we need for our app to know that John and Emily were the people involved in request 123, regardless of their roles, and to answer the previous question correctly.

What I have described is just an example of inference related to the knowledge about a single service request, but in reality we inferred similar new triples for all the service requests tracked in the system. With this inference procedure we can then answer more complex queries such as the following: “count the number of people involved in services and service requests, grouping them by service and by the organization to which they belong” (refer to the image below).

We can drop this query, in natural language, into our Enhanced Advisory Buddy, where it is converted into a SPARQL query. By inspecting the SPARQL query, we can see that we are performing a specific pattern matching: we are looking for triples describing persons involved in a service request or a service, and persons that are generically employed by an organization. This is possible because we declared the concepts explicitly in the ontology. But the answer to this question (see image below) is possible only because we inferred the missing triples. In the answer we can see that every service request and service involves at least one person from the partner and one person from SAP.

So this is a very important capability of RDF knowledge graphs, especially when you have implemented very complex logic in the ontology and you want to deduce new, unknown facts from what you already know.

Now let’s have a look at how we can run this inference in the Knowledge Graph Engine. Let’s refer to the following animation.

There is a specific command for batch inferencing where we need to provide the graphs in which the ontology and the facts are stored. Then we can decide whether we want to store the newly inferred facts in one of the existing graphs or in a new one. Finally, the command is executed against the Knowledge Graph Engine. In particular, when it is executed, the Knowledge Graph Engine scans the specified graphs for the RDFS or supported OWL ontology statements and then generates new triples according to the W3C OWL 2 RL rules or to rules that you can specify in the WHERE clause.

 

Wrap-up and roadmap

Let’s summarise what we have learnt in this blog post. First of all, in SAP HANA Cloud we are introducing a new functionality: a triple store, the Knowledge Graph Engine, to store and consume RDF-based knowledge graphs. So we now support the RDF standard and SPARQL querying.

We are providing tools to ensure interoperability between SQL and SPARQL with specific procedures and functions. We are also going to support inference on RDF KGs and validation with SHACL in Q2.

The POC we have developed helps us recognize some of the advantages that can be obtained with RDF KGs and the Knowledge Graph Engine:

- model (even abstract) concepts from custom business domains;
- create a centralised semantic view of the business data (especially when the data sources already leverage the same RDF standard);
- implement text2SPARQL and GraphRAG pipelines to ground generative AI on structured data;
- benefit from the synergy with other HANA Cloud multi-model capabilities and implement multi-model RAG pipelines.

Concerning the roadmap for the Knowledge Graph Engine, as mentioned, the inference and validation capabilities will arrive in Q2. We are also waiting for the LangChain integration that, for instance, will simplify the implementation of Q&A chains with RDF knowledge graphs; you can check the GitHub repo to be informed about the release. Finally, to always stay informed about future HANA Cloud features, you can check the official HANA Cloud roadmap.

As a final note, don’t miss the opportunity to register now for the next webinars:

Amplify Joule’s power for your enterprise needs

Session 1: 1st July  2025, 09:00 AM CET – 11:00 AM CET

Session 2: 2nd July 2025, 10:00 AM EST – 12:00 PM EST

 

Additional resources

•       Become an Early Adopter for the Knowledge Graph Engine in SAP HANA Cloud

•       HANA Cloud KG Engine documentation

•       Blog Post: Connecting the Facts: SAP HANA Cloud’s Knowledge Graph Engine for Business Context

•       Learning: openHPI Knowledge Graphs – Foundations and Applications
