#4194 Cypher to Pyvis; +RDF/SPARQL
#4194 Provide a python application to convert a Cypher script to Pyvis script
Converting a Cypher script (used in Neo4j) into a Pyvis visualization script is a great way to move from raw data to an interactive web-based graph.
Since Cypher is just a query language, we first need to parse the script to extract the entities (Nodes) and the relationships (Edges). The following Python application uses re (regular expressions) to parse standard CREATE or MERGE statements and generates a standalone Python file that uses Pyvis.
The Cypher-to-Pyvis Converter
This script reads a .cypher file, identifies the patterns, and writes a .py file that, when run, generates an interactive HTML graph.
import re

def cypher_to_pyvis(input_file, output_file):
    with open(input_file, 'r') as f:
        cypher_text = f.read()

    # Regex patterns for nodes and relationships
    # Matches (name:Label {attr: 'val'})
    node_pattern = r'\((\w+):(\w+)(?:\s*\{.*?\})?\)'
    # Matches (a)-[:REL]->(b)
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

    nodes = set(re.findall(node_pattern, cypher_text))
    edges = re.findall(edge_pattern, cypher_text)

    # Build the generated Python script line by line
    script_content = [
        "from pyvis.network import Network",
        "",
        "net = Network(height='750px', width='100%', bgcolor='#222222', font_color='white', directed=True)",
        "net.barnes_hut()",
        ""
    ]

    # Add nodes
    script_content.append("# Adding Nodes")
    for name, label in nodes:
        script_content.append(f"net.add_node('{name}', label='{name}', title='Type: {label}')")

    # Add edges
    script_content.append("\n# Adding Edges")
    for source, rel_type, target in edges:
        script_content.append(f"net.add_edge('{source}', '{target}', title='{rel_type}', label='{rel_type}')")

    script_content.append("\n# Generate and open the graph")
    script_content.append("net.show('graph.html', notebook=False)")

    with open(output_file, 'w') as f:
        f.write("\n".join(script_content))
    print(f"Successfully converted {input_file} to {output_file}")

# Example usage:
# cypher_to_pyvis('data.cypher', 'visualize_graph.py')
How to use it
1. Prepare your Cypher script (data.cypher):

CREATE (Alice:Person)
CREATE (Bob:Person)
CREATE (Alice)-[:FRIEND]->(Bob)
CREATE (Python:Language)
CREATE (Alice)-[:WORKS_WITH]->(Python)

2. Run the converter: call cypher_to_pyvis('data.cypher', 'visualize_graph.py').

3. Run the generated script:

pip install pyvis
python visualize_graph.py
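As a sanity check before generating anything, the converter's two regexes can be exercised directly on the sample script. This is a stdlib-only sketch; the patterns are copied verbatim from cypher_to_pyvis:

```python
import re

# Sample input (same shape as data.cypher above)
cypher_text = """
CREATE (Alice:Person)
CREATE (Bob:Person)
CREATE (Alice)-[:FRIEND]->(Bob)
"""

# Patterns copied from the converter
node_pattern = r'\((\w+):(\w+)(?:\s*\{.*?\})?\)'
edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

nodes = set(re.findall(node_pattern, cypher_text))
edges = re.findall(edge_pattern, cypher_text)

print(sorted(nodes))  # [('Alice', 'Person'), ('Bob', 'Person')]
print(edges)          # [('Alice', 'FRIEND', 'Bob')]
```

Note that the node pattern requires a colon, so a bare variable reference like (Alice) in a relationship line is not re-counted as a node definition.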
Key Considerations
Regex Limitations: This parser handles standard CREATE statements. If your Cypher script is highly complex (e.g., using UNWIND, complex CASE statements, or deeply nested properties), a simple regex might miss some nuances.
Attributes: The current script extracts labels but doesn't map every property (like age: 30) to the Pyvis tooltip. You can extend the regex to capture the {} block if you need those details.
Physics: Pyvis uses a physics engine by default. In the generated script, I’ve included net.barnes_hut(), which is usually the best choice for medium-sized datasets.
Would you like me to modify the script to handle specific node properties or to use different colors based on the node labels?
Repeat for Cypher to RDX
To convert Cypher (Neo4j's query language) to RDX, it's important to clarify the target. In the context of graph data and Python, RDX usually refers to Redux-style state objects or, more commonly in data science, RDF (Resource Description Framework) data formats like Turtle or XML.
Since you are working with graph structures, converting Cypher to RDF (Turtle) is the standard way to bridge the gap between labeled property graphs and the semantic web.
Cypher-to-RDF (Turtle) Converter
This script parses a Cypher CREATE statement and converts it into a valid Turtle (.ttl) file, which is the standard "RDX" format for linked data.
import re

def cypher_to_rdf(input_file, output_file):
    with open(input_file, 'r') as f:
        cypher_text = f.read()

    # Regex to find (name:Label) and (a)-[:REL]->(b)
    node_pattern = r'\((\w+):(\w+)\)'
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

    nodes = re.findall(node_pattern, cypher_text)
    edges = re.findall(edge_pattern, cypher_text)

    rdf_content = [
        "@prefix ex: <http://example.org/> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        ""
    ]

    # Convert nodes to RDF types
    rdf_content.append("# Node Definitions")
    for name, label in nodes:
        rdf_content.append(f"ex:{name} rdf:type ex:{label} .")
        rdf_content.append(f"ex:{name} rdfs:label \"{name}\" .")

    # Convert edges to RDF predicates
    rdf_content.append("\n# Relationship Definitions")
    for source, rel_type, target in edges:
        rdf_content.append(f"ex:{source} ex:{rel_type} ex:{target} .")

    with open(output_file, 'w') as f:
        f.write("\n".join(rdf_content))
    print(f"Successfully converted {input_file} to {output_file} (RDF Turtle format)")

# Example usage:
# cypher_to_rdf('data.cypher', 'data.ttl')
Key Differences in Logic
When moving from Cypher to RDF/RDX, the data model shifts slightly:
| Feature | Cypher (Property Graph) | RDF/RDX (Triples) |
| --- | --- | --- |
| Structure | Nodes and Relationships | Subject - Predicate - Object |
| Labels | :Person is a label on a node | rdf:type links a subject to a class |
| Properties | Key-value pairs inside nodes | Additional triples (Subject - HasProperty - Value) |
| Namespaces | Local to the database | Uses URIs (like http://example.org/) |
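The first two rows of this mapping can be demonstrated end to end with a stdlib-only sketch that reuses the converter's regexes on a one-line input:

```python
import re

# Apply the property-graph -> triples mapping to a tiny Cypher snippet
cypher_text = "CREATE (Alice:Person) CREATE (Bob:Person) CREATE (Alice)-[:FRIEND]->(Bob)"

nodes = re.findall(r'\((\w+):(\w+)\)', cypher_text)
edges = re.findall(r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)', cypher_text)

# Labels become rdf:type triples; relationships become predicates
triples = [f"ex:{n} rdf:type ex:{lbl} ." for n, lbl in nodes]
triples += [f"ex:{s} ex:{p} ex:{o} ." for s, p, o in edges]

for t in triples:
    print(t)
```

Running this prints the three triples ex:Alice rdf:type ex:Person, ex:Bob rdf:type ex:Person, and ex:Alice ex:FRIEND ex:Bob, each terminated with a period as Turtle requires.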
How to use the output
The generated .ttl file can be used in several ways:
Graph Databases: Import it into GraphDB, Apache Jena, or Stardog.
Python Processing: Use the rdflib library to query the data using SPARQL.
Visualization: Many RDF browsers can render this as a network.
Note: If by "RDX" you specifically meant a proprietary format (like a specific Redux state schema or a different data engine), please let me know the specific structure you need, and I can adjust the mapping logic!
Would you like me to add support for converting Cypher properties (like {age: 30}) into RDF data literals?
Yes
To handle properties like {age: 30, city: 'London'} in Cypher, we need to add a parser for the bracketed metadata. In RDF, these become Data Properties (literals), whereas relationships are Object Properties.
I've updated the script to extract those curly-bracketed attributes and map them into the Turtle (.ttl) format.
Enhanced Cypher-to-RDF Converter (with Properties)
import re

def cypher_to_rdf_pro(input_file, output_file):
    with open(input_file, 'r') as f:
        cypher_text = f.read()

    # Pattern for nodes with optional properties: (id:Label {key: 'val', key2: 123})
    node_pattern = r'\((\w+):(\w+)(?:\s*\{(.*?)\})?\)'
    # Pattern for relationships: (a)-[:REL]->(b)
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

    nodes = re.finditer(node_pattern, cypher_text)
    edges = re.findall(edge_pattern, cypher_text)

    rdf_content = [
        "@prefix ex: <http://example.org/resource/> .",
        "@prefix prop: <http://example.org/property/> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        ""
    ]

    # Process nodes and their properties
    rdf_content.append("# --- Node Definitions & Properties ---")
    for match in nodes:
        node_id, label, props_raw = match.groups()
        # Define the node type
        rdf_content.append(f"ex:{node_id} rdf:type ex:{label} .")
        # Parse properties
        if props_raw:
            # Simple split by comma, then split by colon
            pairs = props_raw.split(',')
            for pair in pairs:
                if ':' in pair:
                    key, val = pair.split(':', 1)
                    key = key.strip()
                    val = val.strip().strip("'").strip('"')
                    # Basic type detection (integer vs. string)
                    if val.isdigit():
                        rdf_content.append(f"ex:{node_id} prop:{key} {val} .")  # bare token: xsd:integer in Turtle
                    else:
                        rdf_content.append(f"ex:{node_id} prop:{key} \"{val}\" .")  # quoted: string literal

    # Process relationships
    rdf_content.append("\n# --- Relationship Definitions ---")
    for source, rel_type, target in edges:
        rdf_content.append(f"ex:{source} prop:{rel_type} ex:{target} .")

    with open(output_file, 'w') as f:
        f.write("\n".join(rdf_content))
    print(f"Success! Converted with properties to {output_file}")

# Example run:
# cypher_to_rdf_pro('input.cypher', 'output.ttl')
How the Mapping Works
If your Cypher input looks like this:
CREATE (u1:User {name: 'Alice', age: 30})
CREATE (u1)-[:FOLLOWS]->(u2)
The resulting RDF (Turtle) output will look like this:
ex:u1 rdf:type ex:User .
ex:u1 prop:name "Alice" .
ex:u1 prop:age 30 .
ex:u1 prop:FOLLOWS ex:u2 .
Key Improvements in this Version:
Literal Handling: It distinguishes between numbers (no quotes) and strings (quotes) so RDF triplestores can index them correctly.
Namespace Separation: I’ve added a prop: prefix to distinguish between the entity itself and its attributes/relationships.
Regex Iteration: Used re.finditer to ensure it captures every node definition even if several appear on the same line.
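The literal-handling rule (bare token for integers, quotes for strings) can be pulled out into a small standalone helper for testing. Note that to_turtle_literal is a hypothetical name for this sketch, not a function in the converter above:

```python
def to_turtle_literal(raw: str) -> str:
    """Mirror the converter's type detection: integers stay bare
    (parsed as xsd:integer by Turtle), everything else is quoted."""
    val = raw.strip().strip("'").strip('"')
    return val if val.isdigit() else f'"{val}"'

# Same comma/colon splitting as the converter's property parser
props = "name: 'Alice', age: 30"
for pair in props.split(','):
    key, val = pair.split(':', 1)
    print(f"prop:{key.strip()} {to_turtle_literal(val)}")
```

This prints prop:name "Alice" followed by prop:age 30, matching the worked example below. The isdigit check is deliberately crude: negative numbers and floats would fall through to the string branch.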
Would you like me to add a step to automatically upload this generated file to a local RDF triplestore using a library like rdflib?
Yes
To integrate the conversion process with a local RDF triplestore, we will use rdflib, the de facto standard RDF library for Python. It lets you create an in-memory store, query it with SPARQL, or persist it to disk (for example via its BerkeleyDB-backed store).
Python Application: Cypher to rdflib Triplestore
This script converts the Cypher script, loads it into an rdflib Graph object, and then demonstrates how to run a SPARQL query against your newly created data.
import re
from rdflib import Graph, Literal, RDF, Namespace
from rdflib.namespace import XSD

def cypher_to_rdflib(cypher_file):
    # 1. Initialize the graph and namespaces
    g = Graph()
    EX = Namespace("http://example.org/resource/")
    PROP = Namespace("http://example.org/property/")
    g.bind("ex", EX)
    g.bind("prop", PROP)

    with open(cypher_file, 'r') as f:
        cypher_text = f.read()

    # 2. Extract nodes and properties
    node_pattern = r'\((\w+):(\w+)(?:\s*\{(.*?)\})?\)'
    for match in re.finditer(node_pattern, cypher_text):
        node_id, label, props_raw = match.groups()
        subject = EX[node_id]
        # Set the RDF class (label)
        g.add((subject, RDF.type, EX[label]))
        # Parse and add properties
        if props_raw:
            for pair in props_raw.split(','):
                if ':' in pair:
                    key, val = [item.strip() for item in pair.split(':', 1)]
                    val_clean = val.strip("'").strip('"')
                    # Datatype handling
                    if val_clean.isdigit():
                        g.add((subject, PROP[key], Literal(int(val_clean), datatype=XSD.integer)))
                    else:
                        g.add((subject, PROP[key], Literal(val_clean)))

    # 3. Extract and add relationships
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'
    for source, rel_type, target in re.findall(edge_pattern, cypher_text):
        g.add((EX[source], PROP[rel_type], EX[target]))

    return g

# --- Main execution ---
if __name__ == "__main__":
    # Create a dummy cypher file for testing
    with open("test.cypher", "w") as f:
        f.write("CREATE (p1:Person {name: 'Alice', age: 30})\n")
        f.write("CREATE (p2:Person {name: 'Bob'})\n")
        f.write("CREATE (p1)-[:KNOWS]->(p2)")

    # Convert and load
    my_graph = cypher_to_rdflib("test.cypher")

    # 4. Run a SPARQL query to verify (rdflib supplies the graph's bound
    # prefixes, so ex:/prop:/rdf: resolve without PREFIX lines)
    print("--- Querying the Triplestore ---")
    query = """
        SELECT ?name ?age WHERE {
            ?person rdf:type ex:Person .
            ?person prop:name ?name .
            OPTIONAL { ?person prop:age ?age . }
        }
    """
    for row in my_graph.query(query):
        print(f"Found Person: {row.name}, Age: {row.age if row.age else 'N/A'}")

    # 5. Save/export the store
    my_graph.serialize(destination="output_store.ttl", format="turtle")
    print("\nTriplestore persisted to 'output_store.ttl'")
Why use rdflib for this?
Strict Validation: Unlike a text file, rdflib ensures that your URIs and Literals are correctly formatted for the Semantic Web.
SPARQL Ready: You can immediately perform complex graph traversals using standard SPARQL syntax.
Interoperability: You can export the resulting graph to multiple formats including xml, json-ld, or n-triples simply by changing the format argument of the .serialize() method.
Setup
You will need to install the library first:
pip install rdflib
Would you like me to write a SPARQL query that mimics a specific Cypher MATCH pattern for your new triplestore?
Yes
To bridge the gap between Cypher and SPARQL, it's helpful to see how "Path Finding" and "Pattern Matching" translate. Cypher is very concise with its arrow syntax ()-[]->(), while SPARQL is based on triples (Subject-Predicate-Object).
Below is a comparison and a Python implementation to execute these patterns against your rdflib triplestore.
Pattern Comparison
| Feature | Cypher Pattern | SPARQL Equivalent |
| --- | --- | --- |
| Simple Match | (p:Person {name: 'Alice'}) | ?p rdf:type ex:Person . ?p prop:name "Alice" . |
| Relationship | (a)-[:KNOWS]->(b) | ?a prop:KNOWS ?b . |
| Variable Path | (a)-[:FRIEND*1..2]->(b) | ?a (prop:FRIEND/prop:FRIEND?) ?b . |
| Filtering | WHERE p.age > 25 | FILTER (?age > 25) |
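The "Relationship" row of this table is mechanical enough to automate. The sketch below uses the same regex as the converters to rewrite one Cypher arrow pattern as a SPARQL triple pattern; cypher_edge_to_sparql is a hypothetical helper that handles only the (a)-[:REL]->(b) shape:

```python
import re

def cypher_edge_to_sparql(pattern: str) -> str:
    """Rewrite a simple Cypher relationship pattern as a SPARQL triple
    pattern, following the prop: namespace convention used above."""
    m = re.fullmatch(r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)', pattern)
    if not m:
        raise ValueError(f"unsupported pattern: {pattern}")
    a, rel, b = m.groups()
    # Cypher variables become SPARQL variables; the relationship type
    # becomes a prop:-prefixed predicate
    return f"?{a} prop:{rel} ?{b} ."

print(cypher_edge_to_sparql("(a)-[:KNOWS]->(b)"))  # ?a prop:KNOWS ?b .
```

Anything fancier (variable-length paths, inline property maps on the relationship) falls outside this one regex and would need the dedicated rows of the table.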
The SPARQL Execution Script
This script uses the rdflib graph created in the previous step and runs a query that mimics a standard Cypher "Friends of Friends" or "Related Entities" search.
from rdflib import Graph

# 1. Load the graph (assuming it was saved as 'output_store.ttl')
g = Graph()
g.parse("output_store.ttl", format="turtle")

# 2. Define namespaces for the queries
PREFIXES = """
PREFIX ex: <http://example.org/resource/>
PREFIX prop: <http://example.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
"""

# 3. Mimic Cypher: MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)-[:WORKS_AT]->(co)
#                  RETURN friend.name, co.name
complex_match_query = PREFIXES + """
SELECT ?friendName ?companyName
WHERE {
    ?alice prop:name "Alice" .
    ?alice prop:KNOWS ?friend .
    ?friend prop:name ?friendName .
    # Nested relationship
    OPTIONAL {
        ?friend prop:WORKS_AT ?company .
        ?company prop:name ?companyName .
    }
}
"""

print("--- Mimicking Cypher MATCH Result ---")
results = g.query(complex_match_query)
if len(results) == 0:
    print("No matches found for that specific pattern.")
else:
    for row in results:
        company = row.companyName if row.companyName else "Unemployed/Unknown"
        print(f"Alice knows {row.friendName}, who works at {company}.")

# 4. Mimic Cypher: MATCH (p:Person) WHERE p.age > 20 RETURN p.name, p.age
filter_query = PREFIXES + """
SELECT ?name ?age
WHERE {
    ?p rdf:type ex:Person .
    ?p prop:name ?name .
    ?p prop:age ?age .
    FILTER (?age > 20)
}
ORDER BY DESC(?age)
"""

print("\n--- Filtered Results (Age > 20) ---")
for row in g.query(filter_query):
    print(f"Name: {row.name}, Age: {row.age}")
Pro-Tip: Property Paths
One of the coolest things about SPARQL that mimics Cypher's variable-length paths is the Property Path syntax.
If you want to find everyone Alice is connected to by any relationship, regardless of depth (like (a)-[*]->(b)), you can use:
SELECT ?connectedTo
WHERE {
ex:Alice (prop:|!prop:)* ?connectedTo .
}
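Without a SPARQL engine at hand, the same "any predicate, any depth" reachability can be sketched over a plain edge list with a breadth-first search; connected_to is a hypothetical helper, and the triples are toy data:

```python
from collections import deque

def connected_to(start, triples):
    """All nodes reachable from `start` following any predicate at any
    depth -- a BFS equivalent of the wildcard property path above."""
    adj = {}
    for s, _p, o in triples:
        adj.setdefault(s, []).append(o)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

triples = [("Alice", "KNOWS", "Bob"), ("Bob", "WORKS_AT", "Acme")]
print(connected_to("Alice", triples))  # {'Bob', 'Acme'}
```

Acme is reached even though Alice has no direct edge to it, which is exactly what the * in the property path buys you.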
Would you like me to create a "Recursive Search" script that finds all nodes connected to a specific starting point within N-steps?
From <https://gemini.google.com/app/> Google Gemini 3.0 (Fast)
