#4194 Cypher to Pyvis; +RDF/SPARQL

#4194 Provide a python application to convert a Cypher script to Pyvis script

Converting a Cypher script (used in Neo4j) into a Pyvis visualization script is a great way to move from raw data to an interactive web-based graph.

Since Cypher is just a query language, we first need to parse the script to extract the entities (Nodes) and the relationships (Edges). The following Python application uses re (regular expressions) to parse standard CREATE or MERGE statements and generates a standalone Python file that uses Pyvis.
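
As a quick standalone illustration, the two regexes used below behave like this on a one-line script:

```python
import re

line = "CREATE (Alice:Person) CREATE (Alice)-[:FRIEND]->(Bob)"

# Node pattern: captures (identifier, Label) pairs such as (Alice:Person)
print(re.findall(r'\((\w+):(\w+)\)', line))
# → [('Alice', 'Person')]

# Edge pattern: captures (source, RELATIONSHIP, target) triples
print(re.findall(r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)', line))
# → [('Alice', 'FRIEND', 'Bob')]
```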


The Cypher-to-Pyvis Converter

This script reads a .cypher file, identifies the patterns, and writes a .py file that, when run, generates an interactive HTML graph.

import re

def cypher_to_pyvis(input_file, output_file):
    with open(input_file, 'r') as f:
        cypher_text = f.read()

    # Regex patterns for nodes and relationships
    # Matches (name:Label {attr: 'val'})
    node_pattern = r'\((\w+):(\w+)(?:\s*\{.*?\})?\)'
    # Matches (a)-[:REL]->(b)
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

    nodes = set(re.findall(node_pattern, cypher_text))
    edges = re.findall(edge_pattern, cypher_text)

    # Building the Python script content
    script_content = [
        "from pyvis.network import Network",
        "",
        "net = Network(height='750px', width='100%', bgcolor='#222222', font_color='white', directed=True)",
        "net.barnes_hut()",
        ""
    ]

    # Add Nodes
    script_content.append("# Adding Nodes")
    for name, label in nodes:
        script_content.append(f"net.add_node('{name}', label='{name}', title='Type: {label}')")

    # Add Edges
    script_content.append("\n# Adding Edges")
    for source, rel_type, target in edges:
        script_content.append(f"net.add_edge('{source}', '{target}', title='{rel_type}', label='{rel_type}')")

    script_content.append("\n# Generate and open the graph")
    script_content.append("net.show('graph.html', notebook=False)")

    with open(output_file, 'w') as f:
        f.write("\n".join(script_content))

    print(f"Successfully converted {input_file} to {output_file}")

# Example Usage:
# cypher_to_pyvis('data.cypher', 'visualize_graph.py')

How to use it

  1. Prepare your Cypher script (data.cypher):

    CREATE (Alice:Person)
    CREATE (Bob:Person)
    CREATE (Alice)-[:FRIEND]->(Bob)
    CREATE (Python:Language)
    CREATE (Alice)-[:WORKS_WITH]->(Python)
    
  2. Run the Converter: Call the function cypher_to_pyvis('data.cypher', 'visualize_graph.py').

  3. Run the Generated Script:

    pip install pyvis
    python visualize_graph.py
    

Key Considerations

  • Regex Limitations: This parser handles standard CREATE statements. If your Cypher script is highly complex (e.g., using UNWIND, complex CASE statements, or deep nested properties), a simple regex might miss some nuances.

  • Attributes: The current script extracts labels but doesn't map every property (like age: 30) to the Pyvis tooltip. You can extend the regex to capture the {} block if you need those details.

  • Physics: Pyvis uses a physics engine by default. In the generated script, I’ve included net.barnes_hut(), which usually works well for medium-sized datasets.
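
To illustrate the second point, here is a minimal sketch of how the node regex could be extended to capture the {} block. It handles only flat key-value pairs; nested maps or commas inside strings would need a real parser:

```python
import re

# Node pattern extended with an optional third group for the {...} property block
node_pattern = r"\((\w+):(\w+)(?:\s*\{(.*?)\})?\)"

def parse_props(raw):
    """Parse a flat 'key: val, key2: 123' block into a dict (no nested braces)."""
    props = {}
    if not raw:
        return props
    for pair in raw.split(','):
        if ':' in pair:
            key, val = pair.split(':', 1)
            props[key.strip()] = val.strip().strip("'").strip('"')
    return props

m = re.search(node_pattern, "CREATE (Alice:Person {age: 30, city: 'London'})")
name, label, raw = m.groups()
print(name, label, parse_props(raw))
# → Alice Person {'age': '30', 'city': 'London'}
```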

Would you like me to modify the script to handle specific node properties or to use different colors based on the node labels?


You said

Repeat for Cypher to RDX

To convert Cypher (Neo4j's query language) to RDX, it's important to clarify the target. In the context of graph data and Python, RDX usually refers to Redux-style state objects or, more commonly in data science, RDF (Resource Description Framework) data formats like Turtle or XML.

Since you are working with graph structures, converting Cypher to RDF (Turtle) is the standard way to bridge the gap between labeled property graphs and the semantic web.


Cypher-to-RDF (Turtle) Converter

This script parses Cypher CREATE statements and converts them into a valid Turtle (.ttl) file, the standard serialization format for linked data.

import re

def cypher_to_rdf(input_file, output_file):
    with open(input_file, 'r') as f:
        cypher_text = f.read()

    # Regex to find (name:Label) and (a)-[:REL]->(b)
    node_pattern = r'\((\w+):(\w+)\)'
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

    nodes = re.findall(node_pattern, cypher_text)
    edges = re.findall(edge_pattern, cypher_text)

    rdf_content = [
        "@prefix ex: <http://example.org/> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        ""
    ]

    # Convert Nodes to RDF Types
    rdf_content.append("# Node Definitions")
    for name, label in nodes:
        rdf_content.append(f"ex:{name} rdf:type ex:{label} .")
        rdf_content.append(f"ex:{name} rdfs:label \"{name}\" .")

    # Convert Edges to RDF Predicates
    rdf_content.append("\n# Relationship Definitions")
    for source, rel_type, target in edges:
        rdf_content.append(f"ex:{source} ex:{rel_type} ex:{target} .")

    with open(output_file, 'w') as f:
        f.write("\n".join(rdf_content))

    print(f"Successfully converted {input_file} to {output_file} (RDF Turtle format)")

# Example Usage
# cypher_to_rdf('data.cypher', 'data.ttl')

Key Differences in Logic

When moving from Cypher to RDF/RDX, the data model shifts slightly:

  • Structure: Cypher uses Nodes and Relationships; RDF uses Subject - Predicate - Object triples.

  • Labels: In Cypher, :Person is a label on a node; in RDF, rdf:type links a subject to a class.

  • Properties: Cypher stores key-value pairs inside a node; RDF adds further triples (Subject - HasProperty - Value).

  • Namespaces: Cypher identifiers are local to the database; RDF uses URIs (like http://example.org/).

How to use the output

The generated .ttl file can be used in several ways:

  1. Graph Databases: Import it into GraphDB, Apache Jena, or Stardog.

  2. Python Processing: Use the rdflib library to query the data using SPARQL.

  3. Visualization: Many RDF browsers can render this as a network.

Note: If by "RDX" you specifically meant a proprietary format (like a specific Redux state schema or a different data engine), please let me know the specific structure you need, and I can adjust the mapping logic!

Would you like me to add support for converting Cypher properties (like {age: 30}) into RDF data literals?


You said

Yes

To handle properties like {age: 30, city: 'London'} in Cypher, we need to add a parser for the bracketed metadata. In RDF, these become Data Properties (literals), whereas relationships are Object Properties.

I've updated the script to extract those curly-bracketed attributes and map them into the Turtle (.ttl) format.

Enhanced Cypher-to-RDF Converter (with Properties)

import re

def cypher_to_rdf_pro(input_file, output_file):
    with open(input_file, 'r') as f:
        cypher_text = f.read()

    # Pattern for Nodes with optional properties: (id:Label {key: 'val', key2: 123})
    node_pattern = r'\((\w+):(\w+)(?:\s*\{(.*?)\})?\)'
    # Pattern for Relationships: (a)-[:REL]->(b)
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'

    nodes = re.finditer(node_pattern, cypher_text)
    edges = re.findall(edge_pattern, cypher_text)

    rdf_content = [
        "@prefix ex: <http://example.org/resource/> .",
        "@prefix prop: <http://example.org/property/> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        ""
    ]

    # Process Nodes & Properties
    rdf_content.append("# --- Node Definitions & Properties ---")
    for match in nodes:
        node_id, label, props_raw = match.groups()
        
        # Define the Node Type
        rdf_content.append(f"ex:{node_id} rdf:type ex:{label} .")
        
        # Parse Properties
        if props_raw:
            # Simple split by comma, then split by colon
            pairs = props_raw.split(',')
            for pair in pairs:
                if ':' in pair:
                    key, val = pair.split(':', 1)
                    key = key.strip()
                    val = val.strip().strip("'").strip('"')
                    
                    # Basic Type Detection (Integer vs String)
                    if val.isdigit():
                        rdf_content.append(f"ex:{node_id} prop:{key} {val} .") # Treated as xsd:integer
                    else:
                        rdf_content.append(f"ex:{node_id} prop:{key} \"{val}\" .") # Treated as string

    # Process Relationships
    rdf_content.append("\n# --- Relationship Definitions ---")
    for source, rel_type, target in edges:
        rdf_content.append(f"ex:{source} prop:{rel_type} ex:{target} .")

    with open(output_file, 'w') as f:
        f.write("\n".join(rdf_content))

    print(f"Success! Converted with properties to {output_file}")

# Example Run:
# cypher_to_rdf_pro('input.cypher', 'output.ttl')

How the Mapping Works

If your Cypher input looks like this:

CREATE (u1:User {name: 'Alice', age: 30})
CREATE (u1)-[:FOLLOWS]->(u2)

The resulting RDF (Turtle) output will look like this:

ex:u1 rdf:type ex:User .
ex:u1 prop:name "Alice" .
ex:u1 prop:age 30 .
ex:u1 prop:FOLLOWS ex:u2 .

Key Improvements in this Version:

  1. Literal Handling: It distinguishes between numbers (no quotes) and strings (quotes) so RDF triplestores can index them correctly.

  2. Namespace Separation: I’ve added a prop: prefix to distinguish between the entity itself and its attributes/relationships.

  3. Regex Iteration: Switched to re.finditer, which yields match objects, so the optional property group can be unpacked cleanly even when no {} block is present.

Would you like me to add a step to automatically upload this generated file to a local RDF triplestore using a library like rdflib?


Yes

To integrate the conversion process with a local RDF triplestore, we will use rdflib, the de facto standard RDF library for Python. It lets you build an in-memory store, query it with SPARQL, and persist it to disk (for example via rdflib's BerkeleyDB-backed store, or simply by serializing to a file).

Python Application: Cypher to rdflib Triplestore

This script converts the Cypher script, loads it into an rdflib Graph object, and then demonstrates how to run a SPARQL query against your newly created data.

import re
from rdflib import Graph, Literal, RDF, URIRef, Namespace
from rdflib.namespace import XSD

def cypher_to_rdflib(cypher_file):
    # 1. Initialize the Graph and Namespaces
    g = Graph()
    EX = Namespace("http://example.org/resource/")
    PROP = Namespace("http://example.org/property/")
    g.bind("ex", EX)
    g.bind("prop", PROP)

    with open(cypher_file, 'r') as f:
        cypher_text = f.read()

    # 2. Extract Nodes and Properties
    node_pattern = r'\((\w+):(\w+)(?:\s*\{(.*?)\})?\)'
    for match in re.finditer(node_pattern, cypher_text):
        node_id, label, props_raw = match.groups()
        
        subject = EX[node_id]
        # Set the RDF Class (Label)
        g.add((subject, RDF.type, EX[label]))
        
        # Parse and Add Properties
        if props_raw:
            pairs = props_raw.split(',')
            for pair in pairs:
                if ':' in pair:
                    key, val = [item.strip() for item in pair.split(':', 1)]
                    val_clean = val.strip("'").strip('"')
                    
                    # DataType Handling
                    if val_clean.isdigit():
                        g.add((subject, PROP[key], Literal(int(val_clean), datatype=XSD.integer)))
                    else:
                        g.add((subject, PROP[key], Literal(val_clean)))

    # 3. Extract and Add Relationships
    edge_pattern = r'\((\w+)\)-\[:(\w+)\]->\((\w+)\)'
    for source, rel_type, target in re.findall(edge_pattern, cypher_text):
        g.add((EX[source], PROP[rel_type], EX[target]))

    return g

# --- Main Execution ---
if __name__ == "__main__":
    # Create a dummy cypher file for testing
    with open("test.cypher", "w") as f:
        f.write("CREATE (p1:Person {name: 'Alice', age: 30})\n")
        f.write("CREATE (p2:Person {name: 'Bob'})\n")
        f.write("CREATE (p1)-[:KNOWS]->(p2)")

    # Convert and Load
    my_graph = cypher_to_rdflib("test.cypher")

    # 4. Run a SPARQL query to verify the data
    #    (rdflib automatically injects the graph's bound prefixes: ex:, prop:, rdf:)
    print("--- Querying the Triplestore ---")
    query = """
    SELECT ?name ?age WHERE {
        ?person rdf:type ex:Person .
        ?person prop:name ?name .
        OPTIONAL { ?person prop:age ?age . }
    }
    """
    
    for row in my_graph.query(query):
        print(f"Found Person: {row.name}, Age: {row.age if row.age else 'N/A'}")

    # 5. Save/Export the store
    my_graph.serialize(destination="output_store.ttl", format="turtle")
    print("\nTriplestore persisted to 'output_store.ttl'")

Why use rdflib for this?

  1. Strict Validation: Unlike a text file, rdflib ensures that your URIs and Literals are correctly formatted for the Semantic Web.

  2. SPARQL Ready: You can immediately perform complex graph traversals using standard SPARQL syntax.

  3. Interoperability: You can export the resulting graph to multiple formats including xml, json-ld, or n-triples simply by changing the format argument in the .serialize() method.

Setup

You will need to install the library first:

pip install rdflib

Would you like me to write a SPARQL query that mimics a specific Cypher MATCH pattern for your new triplestore?


Yes

To bridge the gap between Cypher and SPARQL, it's helpful to see how "Path Finding" and "Pattern Matching" translate. Cypher is very concise with its arrow syntax ()-[]->(), while SPARQL is based on Triples (Subject-Predicate-Object).

Below is a comparison and a Python implementation to execute these patterns against your rdflib triplestore.


Pattern Comparison

  • Simple match: (p:Person {name: 'Alice'}) becomes ?p rdf:type ex:Person . ?p prop:name "Alice" .

  • Relationship: (a)-[:KNOWS]->(b) becomes ?a prop:KNOWS ?b .

  • Variable-length path: (a)-[:FRIEND*1..2]->(b) becomes ?a (prop:FRIEND/prop:FRIEND?) ?b .

  • Filtering: WHERE p.age > 25 becomes FILTER (?age > 25)

The SPARQL Execution Script

This script uses the rdflib graph created in the previous step and runs a query that mimics a standard Cypher "Friends of Friends" or "Related Entities" search.

from rdflib import Graph, Namespace

# 1. Load the graph (assuming it was saved as 'output_store.ttl')
g = Graph()
g.parse("output_store.ttl", format="turtle")

# 2. Define Namespaces for the query
PREFIXES = """
PREFIX ex: <http://example.org/resource/>
PREFIX prop: <http://example.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
"""

# 3. Mimic Cypher: MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)-[:WORKS_AT]->(co)
#    RETURN a.name, friend.name, co.name
complex_match_query = PREFIXES + """
SELECT ?aliceName ?friendName ?companyName
WHERE {
    ?alice prop:name "Alice" .
    ?alice prop:KNOWS ?friend .
    ?friend prop:name ?friendName .
    
    # Nested relationship
    OPTIONAL {
        ?friend prop:WORKS_AT ?company .
        ?company prop:name ?companyName .
    }
}
"""

print("--- Mimicking Cypher MATCH Result ---")
results = g.query(complex_match_query)

if len(results) == 0:
    print("No matches found for that specific pattern.")
else:
    for row in results:
        company = row.companyName if row.companyName else "Unemployed/Unknown"
        print(f"Alice knows {row.friendName}, who works at {company}.")

# 4. Mimic Cypher: MATCH (p:Person) WHERE p.age > 20 RETURN p.name, p.age
filter_query = PREFIXES + """
SELECT ?name ?age
WHERE {
    ?p rdf:type ex:Person .
    ?p prop:name ?name .
    ?p prop:age ?age .
    FILTER (?age > 20)
}
ORDER BY DESC(?age)
"""

print("\n--- Filtered Results (Age > 20) ---")
for row in g.query(filter_query):
    print(f"Name: {row.name}, Age: {row.age}")

Pro-Tip: Property Paths

One of the coolest things about SPARQL that mimics Cypher's variable-length paths is the Property Path syntax.

If you want to find everyone ex:Alice is connected to by any relationship, regardless of depth (like (a)-[*]->(b)), you can use the "any predicate" idiom: every predicate either is prop: or is not, so the union below matches all of them:

SELECT ?connectedTo
WHERE {
    ex:Alice (prop:|!prop:)* ?connectedTo .
}

Would you like me to create a "Recursive Search" script that finds all nodes connected to a specific starting point within N-steps?


From <https://gemini.google.com/app/>  Google Gemini 3.0 (Fast)

