Large Language Model = Knowledge Graph Store? Yes, by Fine-Tuning LLM With KG

Learn how to get GPT to answer complex questions asked in any language

I know, how dare I, but I disagree with ChatGPT. I asked if a GPT can be used as a Knowledge Graph store, and this is part of its answer:

… While GPT can be used to generate text related to a particular topic, it is not designed to be a knowledge graph store. …

This article shows how a knowledge graph can be used to fine-tune an LLM, enabling users to ask complex questions in their own language. To illustrate this, we use two RDF knowledge graphs to fine-tune OpenAI’s GPT LLM. One is of a process plant, the core of a Digital-Twin. The other is a genealogical model of the Kennedy family.

Fine-Tuning an LLM With a Knowledge Graph

Although prompting works well, it is limited by the scale of any serialized knowledge graph that can be included within a prompt.

Prompts are limited to about 1,500 words. Although this limit is increasing, it has a long way to go to absorb a serialized graph.

An LLM is not going to be tuned to your domain’s information. To scale, one will need to fine-tune the LLM on the knowledge graph. An LLM can be fine-tuned with an unlimited amount of training data… but at a cost. Training data needs to be gathered, cleaned, and prepared for training. Preparing the data for fine-tuning involves supplying a series of prompts and their corresponding completions.

{"prompt": "The cat sat on?", "completion": " The cat sat on the mat"}

This process involves providing the LLM with example prompts and their corresponding completion (aka answer).

In its simplest form, a prompt:completion pair seems to correspond to an edge of an RDF graph, which is represented by the triple:

{:subject :predicate :object}

Since there are potentially three unknowns, we could formulate three prompt:completion statements from each edge within the graph, as shown below:

{"prompt": "What is <:subject> <predicate>? ###",
"completion": " <:subject> is <:predicate> <:object>. @@@"}
{"prompt": "What is <predicate> <:object>? ###",
"completion": " <:subject> is <:predicate> <:object>. @@@"}
{"prompt": "How is <:subject> related to <:object>? ###",
"completion": " <:subject> is <:predicate> <:object>. @@@"}

There could be other prompt:completion statements, such as Concise Bounded Descriptions, but in this article, we only discuss this form.
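
To make this concrete, the sketch below shows one way of generating these three prompt:completion pairs for every object-property edge of a graph. It is only illustrative: the rdflib library is assumed, and the file names (flowsheet.ttl, fine_tune.jsonl) are placeholders, not the actual files used in this article.

import json
from rdflib import Graph, URIRef

g = Graph()
g.parse("flowsheet.ttl", format="turtle")  # placeholder input file

def local_name(term):
    # Strip the namespace so :Reboiler-601 becomes Reboiler-601.
    text = str(term)
    return text.split("#")[-1].split("/")[-1]

with open("fine_tune.jsonl", "w") as out:
    for s, p, o in g:
        if not isinstance(o, URIRef):
            continue  # literals (attributes) are handled separately
        subj, pred, obj = local_name(s), local_name(p), local_name(o)
        completion = f" {subj} is {pred} {obj}. @@@"
        prompts = [
            f"What is {subj} {pred}? ###",
            f"What is {pred} {obj}? ###",
            f"How is {subj} related to {obj}? ###",
        ]
        for prompt in prompts:
            out.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")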

… of the Flowsheet Model

The prior article used a Flowsheet model of a process plant as its example. A fragment of the flowsheet model written in Turtle is as follows:

:Reboiler-601
  connectivity:connectedTo :Pipe-005 ;
  connectivity:connectedTo :Pipe-006 ;

This can be used as fine-tuning prompts as follows:

{"prompt":"What is Reboiler-601 connectedTo? ###",
"completion": " Reboiler-601 is connectedTo Pipe-005. @@@"}
{"prompt":"What is connectedTo Pipe-005? ###",
"completion": " Reboiler-601 is connectedTo Pipe-005. @@@"}
{"prompt":"How is Reboiler-601 related to Pipe-005? ###",
"completion": " Reboiler-601 is connectedTo Pipe-005. @@@"}
{"prompt":"What is Reboiler-601 connectedTo? ###",
"completion": " Reboiler-601 is connectedTo Pipe-006. @@@"}
{"prompt":"What is connectedTo Pipe-006? ###",
"completion": " Reboiler-601 is connectedTo Pipe-006. @@@"}
{"prompt":"How is Reboiler-601 related to Pipe-006? ###",
"completion": " Reboiler-601 is connectedTo Pipe-006. @@@"}

With a fine-tuned LLM, we can now ask simple questions, right? Let’s ask a fine-tuned OpenAI Davinci model a low-ball question. Note that the SPARQL is what we would have formulated if we were asking the KG directly.

SPARQL

select * where { :Distill-501 ?p ?o}

Davinci-Flowsheet-4 GPT question

What is connected to Distill-501?

Answer

Pipe-009 is connected to Distill-501
Pipe-009 has connected SignalLine-206 to Distill-501 _
… and more rubbish ❌

We have not been convincing enough for the LLM to recognize the knowledge graph as factual statements.

… Convincing the LLM With More Epochs

When fine-tuning an LLM, it is easy to overfit the model. Overfitting describes the tendency of a model to adapt itself too strongly to the fine-tuning data set. This would be a problem if I were asserting a prompt:completion statement such as:

{"prompt": "Peter is?", "completion": " Peter is handsome😀"}

However, it is reasonable to assume that any knowledge graph with which we want to fine-tune the model is well curated. Therefore, we want our prompt:completion statements to be treated as near-certainties.

With OpenAI, the n_epochs parameter controls the number of passes through the fine-tuning data. This parameter defaults to 4. Increasing n_epochs encourages the LLM to adapt more closely to the fine-tuning data (and costs more!), as discussed here.
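
For reference, raising the number of passes looks roughly like the sketch below, using the legacy (pre-v1) openai Python SDK and fine-tunes endpoint that were current when these models were trained; the API key and file names are placeholders.

import openai

openai.api_key = "sk-..."  # placeholder

# Upload the prompt:completion JSONL produced from the knowledge graph.
training_file = openai.File.create(
    file=open("fine_tune.jsonl", "rb"),
    purpose="fine-tune",
)

# n_epochs defaults to 4; raising it to 16 pushes the model to treat the
# graph statements as near-certainties (at roughly four times the cost).
fine_tune = openai.FineTune.create(
    training_file=training_file["id"],
    model="davinci",
    n_epochs=16,
)
print(fine_tune["id"])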

Therefore the Davinci LLM was fine-tuned using 16 passes. Again, we ask a simple question with much better results:

SPARQL

select * 
where {
:Distill-501 ?p ?o
}

Davinci-Flowsheet-16 GPT question

What is connected to Distill-501?

Answer

Pipe-005 is connected to Distill-501
Pipe-014 is connected to Distill-501
Pipe-002 is connected to Distill-501
… ✔

Other questions are also consistently answered correctly, as shown below:

How is SignalLine-209 related to Distill-501? ###
SignalLine-209 is connectedTo Distill-501 ✔
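
For reference, the questions above are posed to the fine-tuned model roughly as sketched below, again assuming the legacy (pre-v1) openai SDK; the fine-tuned model identifier is a placeholder for whatever the fine-tune job returned.

import openai

openai.api_key = "sk-..."  # placeholder

response = openai.Completion.create(
    model="davinci:ft-your-org-2023-01-01-00-00-00",  # placeholder fine-tuned model id
    prompt="How is SignalLine-209 related to Distill-501? ###",
    max_tokens=64,
    temperature=0,       # we want the stored facts, not creativity
    stop=["@@@"],        # the end-of-completion marker used in training
)
print(response["choices"][0]["text"].strip())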

… of a Kennedys Model

TopQuadrant uses a Kennedys genealogical graph in many of their examples. That graph is available here, and we will use it to fine-tune the Davinci LLM. Whereas the Flowsheet model only dealt with a single predicate, connectedTo, the Kennedys model has a much more interesting ontology with both datatype and object properties. A fragment associated with Arnold Schwarzenegger is, in Turtle format, as follows:

kennedys:ArnoldSchwarzenegger
rdf:type kennedys:Person ;
kennedys:almaMater kennedys:UniversityOfWisconsin ;
kennedys:birthYear 1947 ;
… and so on
.

The corresponding fine-tuning prompts are:

{"prompt":"What is ArnoldSchwarzenegger almaMater? ###",
"completion": " ArnoldSchwarzenegger has almaMater UniversityOfWisconsin. @@@"}
{"prompt":"What has almaMater UniversityOfWisconsin? ###",
"completion": " ArnoldSchwarzenegger has almaMater UniversityOfWisconsin. @@@"}
{"prompt":"How are ArnoldSchwarzenegger and UniversityOfWisconsin related? ###",
"completion": " ArnoldSchwarzenegger has almaMater UniversityOfWisconsin. @@@"}
{"prompt":"What is ArnoldSchwarzenegger birthYear? : " ArnoldSchwarzenegger has birthYear 1947. @@@"}
{"prompt":"What is ArnoldSchwarzenegger firstName? ###","completion": " ArnoldSchwarzenegger has firstName Arnold. @@@"}
… and so on

Note that for attributes (where the object is a literal rather than an IRI), we have, at present, only one prompt:completion pair, as sketched below. That is a subject for further experimentation.
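
For completeness, that single attribute pair could be produced by a small, hypothetical helper extending the earlier generation sketch:

from rdflib import Literal

def literal_pair(subj, pred, value):
    # Build the one pair used for attribute edges such as birthYear 1947.
    return {
        "prompt": f"What is {subj} {pred}? ###",
        "completion": f" {subj} has {pred} {value}. @@@",
    }

# e.g. literal_pair("ArnoldSchwarzenegger", "birthYear", Literal(1947))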

Using these prompt:completions, the Davinci LLM was fine-tuned using 16 passes, given the lessons learned from the Flowsheet model fine-tuning. We are then able to start asking questions.

SPARQL

select *
where {
?s :spouse ?spouse ;
:firstName 'Arnold'.
}

Davinci-Kennedys-16 GPT question

Who is Arnold married to?

Answer

ArnoldSchwarzenegger has spouse MariaShriver ✔

Note that ‘married to’ does not even appear in the serialized graph. However, its equivalent ‘spouse’ does. Therefore, we are already taking advantage of the pre-tuning within the LLM. Another more involved question is:

SPARQL

select *
where {
?s :spouse ?spouse ;
:firstName 'Arnold'.
?spouse :almaMater ?almaMater .
}

Davinci-Kennedys-16 GPT question

Who is the person married to Arnold, 
and what is that person's alma mater?

Answer

MariaShriver has almaMater GeorgetownUniversity ✔

Yet another more involved question is:

SPARQL

select *
where {
?s :spouse ?spouse ;
:firstName 'Arnold'.
?spouse :parent ?parent .
?parent :birthYear ?birthYear .
}

Davinci-Kennedys-16 GPT question

Who is the person married to Arnold, 
what is that person's parent,
and what is that parent's birth year?

Answer

ArnoldSchwarzenegger has spouse MariaShriver. 
She has parent EuniceKennedy.
EuniceKennedy was born in 1921 ✔

“By George, I think SHE’s got it,” to misappropriate Prof. Henry Higgins.

Questions That Span the LLM’s and the Graph’s Knowledge

Given that the LLM has been pre-tuned with significant knowledge, what about queries that span both knowledge bases? Note that solving queries that span multiple data sources is the primary motivation of this investigation.

SPARQL

select *
where {
?s :spouse ?spouse ;
:firstName 'Arnold'.
?spouse :parent ?parent .
?parent :almaMater ?almaMater .
?almaMater ????? ?location .
}

Davinci-Kennedys-16 GPT question

Who is the person married to Arnold, 
who is that person's parent,
what is that parent's alma mater,
and where is that alma mater?

Answer

The person married to Arnold is Maria Shriver. 
Her parent is Sargent Shriver, who attended Yale University.
Yale University is located in New Haven, Connecticut. ✔✔

Within the graph that was used to fine-tune the LLM, there are no details regarding the location of any alma mater, hence the impossibility of the SPARQL query. Yet, the LLM was able to construct the correct answer. Interestingly, the answer also uses Robert Shriver’s middle name, by which he was known, although the graph only contains the middle initial. Again, this demonstrates the benefit of merging the LLM and the domain’s knowledge graph.

Asking a Fine-Tuned LLM … With Chain-of-Thought Prompts

Admittedly, we have only demonstrated the ability to answer fairly simple questions. A typical user’s question about the Flowsheet model might be to explore the path between two locations:

Davinci-Flowsheet-16 GPT question

How is Valve-103 connected to Reboiler-601?

Answer

Valve-103 is connectedTo Reboiler-601 ❌✔

Probably correct, but we want to know the details of the path that connects these two locations. After all, that is one of the benefits of a knowledge graph representation.

In a previous article, we demonstrated CoT prompting to answer this question successfully. However, we were using the most capable model: gpt-3.5-turbo. Is it possible to include CoT prompts to guide the fine-tuned LLM to provide more explanatory answers? The reason to ask this question is that fine-tuning is only available for less capable base models: ada, babbage, curie, and davinci.

Testing Chain-of-Thought Prompting

The classic Chain-of-Thought prompt example is repeated below:

Textual prompting

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. 
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls.
2 cans of 3 tennis balls each is 6 tennis balls. 5+6=11. The answer is 11.
Q: The cafeteria had 23 apples.
If they used 20 to make lunch and bought 6 more, how many apples do they have?

If we apply this to the models at hand, we get the following responses:

text-davinci-003 answer

A: The cafeteria had 23 apples. 20 were used for lunch, so they have 3 left. 
Then they bought 6 more, giving them 9 apples in total.
The answer is 9 apples. ✔

Davinci answer

A: 23-20=3 apples. 6 apples. 3+6=9 apples. ✔

If we apply the CoT prompting, as well as the serialized flowsheet context, to the following models, we get these answers:

Serialized Knowledge Graph Context

Let's introduce the concept of Node.
BL-051, BL-052, Pipe-006, Reboiler-601,…
are all individuals of type Node in the ontology.

Reboiler-601 is connected to Pipe-005.
Reboiler-601 is connected to Pipe-006.
Reboiler-601 is connected to Pipe-012.

CoT prompt

Q: How is Distill-501 connected to Reflux-401?
A: If A is connectedTo B and B is connected to C, then we can say A is connectedTo C via B.
Distill-501 is connected to Pipe-016, which is connected to Condenser-303, which is connected to Pipe-022, which then connects to Reflux-401.
Q: So how is Distill-501 connected to Valve-104?
… and so on
Q: How is Valve-101 connected to Reflux-401?
A: Valve-101 is connected to Pipe-017, which is connected to Condenser-303, which is connected to Pipe-022, which is connected to Reflux-401.

Question

Q: How is Valve-103 connected to Reboiler-601?

text-davinci-003 answer

A: Valve-103 is connected to Pipe-018, 
which is connected to Distill-501,
which is connected to Pipe-013,
which is connected to Reboiler-601. ✔

text-davinci-002 answer

A: Valve-103 is connected to Pipe-018, 
which is connected to Distill-501,
which is connected to Pipe-013,
which is connected to Reboiler-601. ✔

text-davinci-001 answer

(no answer) ❌

Davinci answer

(no answer) ❌
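
For reference, each of those answers is obtained by concatenating the serialized context, the CoT examples, and the question into a single prompt, roughly as sketched below; the legacy (pre-v1) openai SDK is assumed, and the context and example strings are abbreviated placeholders.

import openai

openai.api_key = "sk-..."  # placeholder

context = (
    "Let's introduce the concept of Node.\n"
    "BL-051, BL-052, Pipe-006, Reboiler-601, ... are all individuals of type Node in the ontology.\n"
    "Reboiler-601 is connected to Pipe-005.\n"
    "Reboiler-601 is connected to Pipe-006.\n"
    "Reboiler-601 is connected to Pipe-012.\n"
    # ... remainder of the serialized flowsheet omitted ...
)

cot_examples = (
    "Q: How is Distill-501 connected to Reflux-401?\n"
    "A: If A is connectedTo B and B is connectedTo C, then we can say A is connectedTo C via B. "
    "Distill-501 is connected to Pipe-016, which is connected to Condenser-303, "
    "which is connected to Pipe-022, which then connects to Reflux-401.\n"
    # ... further worked Q/A examples omitted ...
)

question = "Q: How is Valve-103 connected to Reboiler-601?\nA:"

response = openai.Completion.create(
    model="text-davinci-003",  # also tried: text-davinci-002, text-davinci-001, davinci
    prompt=context + "\n" + cot_examples + "\n" + question,
    max_tokens=128,
    temperature=0,
)
print(response["choices"][0]["text"].strip())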

If we apply the same CoT context to the fine-tuned LLM, but without the serialized flowsheet context, we get the following:

Davinci-Flowsheet-16 GPT answer:

Valve-103 is connected to Pipe-013, which is connected to Reboiler-601 ❌

Disappointing but not surprising, as davinci, with the flowsheet context, failed to deliver any answer. It is also documented that the text-davinci-003 model has been designed for instruction-following tasks.

So, what is the answer to this conundrum? The next experiment will be to fine-tune a model designed for instruction-following tasks, when and if one becomes available for fine-tuning.

What Have We Learnt?

1. Can an LLM be used as a knowledge graph store?

  • Yes, it can, and it works well.
  • The graph edge statements are serialized as prompt:completion statements.

2. Overfitting suits Knowledge Graph fine-tuning.

  • Better results are obtained with strong (n_epochs=16) adaptation to the knowledge graph.
  • Overfitting costs more!

3. A Knowledge Graph fine-tuned LLM consistently and correctly answers queries.

  • Answers were available with zero-shot prompting for most queries.

4. Path-search queries were less successful.

  • Path search works well with an instruction-trained LLM, such as text-davinci-003.

5. Unfortunately, only the less capable base models can currently be fine-tuned with Knowledge Graph data.

  • Instruction-following models will, we hope, become available for fine-tuning.

6. A serialized Knowledge Graph is an excellent source of fine-tuning data.

  • Preparing training data is one of the most expensive phases in training an LLM.
  • A previously created and curated Knowledge Graph can be used, in serialized form, to train an LLM directly.
  • This greatly reduces the cost of training.

What Have We To Learn?

1. Reduce tuning costs

  • The serialization of a knowledge graph greatly reduces the time and cost of preparing the tuning data.
  • However, the LLM still needs to be tuned, and this tuning can be costly.
  • We need to investigate other LLMs, ensuring they can be prompted and fine-tuned with a KG but at a lower cost.

2. Optimizing KG serialization

  • Other serializations need to be investigated, such as adding concise bounded descriptions or random graph walks.
  • How should the graph and the subject-predicate-object statement, {g s p o}, be serialized into prompt:completions?
  • How could the graph name be carried into the answer, providing provenance?

3. Tokenization of IRIs

  • A key feature of RDF graphs is that entity nodes (graph, subject, and object) and predicates are named with unique IRIs.
  • An RDF IRI is commonly a URI but can be abbreviated to a qname.
  • In an ideal world, each of these equivalent identifiers would be tokenized to the same token.
  • Unfortunately, the default tokenizer for GPT creates different and verbose tokens, as the tokenizing sketch below illustrates.
  • Perhaps pre-encoding the identifiers would work.
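
As a minimal illustration of that verbosity, the open-source tiktoken library can count the tokens produced for a qname and for a full IRI; the full IRI below uses a hypothetical namespace, and r50k_base is the encoding tiktoken associates with the GPT-3 davinci base models.

import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # encoding used by the GPT-3 base models

for identifier in [
    "kennedys:ArnoldSchwarzenegger",
    "http://example.org/kennedys#ArnoldSchwarzenegger",  # hypothetical full-IRI form
]:
    tokens = enc.encode(identifier)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{identifier} -> {len(tokens)} tokens: {pieces}")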

Where to Next?

Knowledge Graphs look, IMHO, like the perfect precursor for preparing LLM fine-tuning data from any structured data source.

The next step of this quest is to demonstrate path query Chain-of-Thought prompting in conjunction with a Knowledge Graph fine-tuned LLM.
