Kuzu
Kùzu is an in-process property graph database management system.
This notebook shows how to use LLMs to provide a natural language interface to a Kùzu database with the
Cypher graph query language. Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
Setting up
Install the Python package:
pip install kuzu
Create a database on the local machine and connect to it:
import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
First, we create the schema for a simple movie database:
conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))")
conn.execute(
"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))"
)
conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")
<kuzu.query_result.QueryResult at 0x1066ff410>
Then we can insert some data.
conn.execute("CREATE (:Person {name: 'Al Pacino', birthDate: '1940-04-25'})")
conn.execute("CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})")
conn.execute("CREATE (:Movie {name: 'The Godfather'})")
conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
conn.execute(
"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
<kuzu.query_result.QueryResult at 0x107016210>
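The four relationship inserts above differ only in the actor and movie names, so they can be generated from a list of pairs. The following is a minimal sketch, not part of the original notebook: `ACTED_IN` and `acted_in_statement` are hypothetical names, and the string formatting assumes the names contain no single quotes (otherwise the values would need escaping or query parameters).

```python
# Hypothetical helper: builds the same MATCH ... CREATE statements used above.
ACTED_IN = [
    ("Al Pacino", "The Godfather"),
    ("Al Pacino", "The Godfather: Part II"),
    ("Al Pacino", "The Godfather Coda: The Death of Michael Corleone"),
    ("Robert De Niro", "The Godfather: Part II"),
]


def acted_in_statement(person: str, movie: str) -> str:
    """Return a Cypher statement linking an existing Person to an existing Movie."""
    # Assumption: neither name contains a single quote.
    if "'" in person or "'" in movie:
        raise ValueError("names with single quotes need escaping")
    return (
        f"MATCH (p:Person), (m:Movie) WHERE p.name = '{person}' "
        f"AND m.name = '{movie}' CREATE (p)-[:ActedIn]->(m)"
    )


# One statement per (person, movie) pair, in the same order as above.
statements = [acted_in_statement(person, movie) for person, movie in ACTED_IN]
```

Each resulting string can then be passed to conn.execute in a loop, which produces the same four relationships as the explicit calls.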
Creating KuzuQAChain
We can now create the KuzuGraph and KuzuQAChain. To create the KuzuGraph we simply need to pass the database object to the KuzuGraph constructor.
from langchain.chains import KuzuQAChain
from langchain_community.graphs import KuzuGraph
from langchain_openai import ChatOpenAI
graph = KuzuGraph(db)
chain = KuzuQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
Refresh graph schema information
If the schema of the database changes, you can refresh the schema information needed to generate Cypher statements.
# graph.refresh_schema()
print(graph.get_schema)
Node properties: [{'properties': [('name', 'STRING')], 'label': 'Movie'}, {'properties': [('name', 'STRING'), ('birthDate', 'STRING')], 'label': 'Person'}]
Relationships properties: [{'properties': [], 'label': 'ActedIn'}]
Relationships: ['(:Person)-[:ActedIn]->(:Movie)']
Querying the graph
We can now use the KuzuQAChain to ask questions of the graph.
chain.run("Who played in The Godfather: Part II?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'}) RETURN p.name
Full Context:
[{'p.name': 'Al Pacino'}, {'p.name': 'Robert De Niro'}]
> Finished chain.
'Al Pacino and Robert De Niro both played in The Godfather: Part II.'
chain.run("Robert De Niro played in which movies?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person {name: 'Robert De Niro'})-[:ActedIn]->(m:Movie)
RETURN m.name
Full Context:
[{'m.name': 'The Godfather: Part II'}]
> Finished chain.
'Robert De Niro played in The Godfather: Part II.'
chain.run("Robert De Niro is born in which year?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person {name: 'Robert De Niro'})-[:ActedIn]->(m:Movie)
RETURN p.birthDate
Full Context:
[{'p.birthDate': '1943-08-17'}]
> Finished chain.
'Robert De Niro was born on August 17, 1943.'
chain.run("Who is the oldest actor who played in The Godfather: Part II?")
> Entering new chain...
Generated Cypher:
MATCH (p:Person)-[:ActedIn]->(m:Movie{name:'The Godfather: Part II'})
WITH p, m, p.birthDate AS birthDate
ORDER BY birthDate ASC
LIMIT 1
RETURN p.name
Full Context:
[{'p.name': 'Al Pacino'}]
> Finished chain.
'The oldest actor who played in The Godfather: Part II is Al Pacino.'
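The last query orders by birthDate, which the schema stores as a STRING. That still yields the right answer here because dates written as YYYY-MM-DD sort lexicographically in chronological order. A small sketch in plain Python illustrates the point; it is independent of Kùzu, and the `birth_dates` dictionary simply repeats the values inserted earlier.

```python
from datetime import date

# Same birth dates as inserted into the Person table above.
birth_dates = {
    "Al Pacino": "1940-04-25",
    "Robert De Niro": "1943-08-17",
}

# Sort names by the raw date strings (what ORDER BY does on a STRING column).
by_string = sorted(birth_dates, key=lambda name: birth_dates[name])

# Sort names by the parsed calendar dates for comparison.
by_date = sorted(birth_dates, key=lambda name: date.fromisoformat(birth_dates[name]))

# The earliest birth date belongs to the oldest person.
oldest = by_string[0]
print(oldest)  # Al Pacino
```

The equivalence relies on zero-padded, fixed-width ISO dates; a format such as 25/04/1940 would not sort chronologically as a string.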