#4250 Python Workflow for Knowledge Graph in RDF/SPARQL/Pyvis
I want to develop the following Python workflow:
- CSV -> RDF,
- RDF Schema -> SPARQL Query Builder,
- Tkinter GUI to query RDF -> Pyvis to visualize graph from query results
This is a classic Knowledge Graph engineering stack. To make this work seamlessly, you’ll want to use rdflib as your backbone—it handles the parsing, schema management, and SPARQL execution all in one go.
Here is a conceptual breakdown and a starter implementation to get your workflow running.
1. The Core Engine: CSV to RDF
To convert CSV to RDF, you map columns to URIs and Literals. Using a schema (RDFS) allows you to define classes and properties, ensuring your graph isn't just "flat" data.
import pandas as pd
from rdflib import Graph, Literal, RDF, URIRef, Namespace
from rdflib.namespace import RDFS, FOAF

# Initialize Graph and Namespace
g = Graph()
EX = Namespace("http://example.org/")
g.bind("ex", EX)

def convert_csv_to_rdf(csv_path):
    df = pd.read_csv(csv_path)
    for _, row in df.iterrows():
        # Example: Creating a 'Person' from a CSV row
        subject = EX[str(row['ID'])]
        g.add((subject, RDF.type, EX.Person))
        g.add((subject, RDFS.label, Literal(row['Name'])))
        g.add((subject, EX.worksAt, Literal(row['Company'])))
    return g
2. Tkinter GUI & SPARQL Builder
The GUI acts as the bridge. You can use a Text widget for the query and a Button to trigger the execution.
import tkinter as tk
from tkinter import scrolledtext

def run_query():
    # Read the whole text area, run it against the graph g, and draw the result
    query = query_input.get("1.0", tk.END)
    results = g.query(query)
    visualize_results(results)

root = tk.Tk()
root.title("SPARQL Graph Explorer")
query_input = scrolledtext.ScrolledText(root, width=60, height=10)
query_input.insert(tk.INSERT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
query_input.pack()
btn = tk.Button(root, text="Execute & Visualize", command=run_query)
btn.pack()
# Call root.mainloop() once visualize_results (section 3) has been defined
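The "SPARQL Query Builder" step from the original request is not covered by the GUI code above, which only accepts hand-written queries. One possible shape for it, as a minimal sketch: a helper that assembles the query text from a class name picked in the GUI. The function name `build_class_query` and the `ns:` prefix are illustrative assumptions, not part of any library.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def build_class_query(class_name, prefix_iri, limit=50):
    """Return a SELECT query for all triples whose subject has the given class.

    class_name is a local name such as "Person"; prefix_iri is the
    namespace it lives in. Both are validated because they are spliced
    directly into the query text.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", class_name):
        raise ValueError(f"not a simple local name: {class_name!r}")
    if any(ch in prefix_iri for ch in '<>" \n'):
        raise ValueError("prefix IRI contains characters not allowed in <...>")
    return (
        f"PREFIX ns: <{prefix_iri}>\n"
        f"PREFIX rdf: <{RDF_NS}>\n"
        "SELECT ?s ?p ?o\n"
        "WHERE {\n"
        f"  ?s rdf:type ns:{class_name} .\n"
        "  ?s ?p ?o .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

query = build_class_query("Person", "http://example.org/")
print(query)
```

The result keeps the three-column (s, p, o) shape that the visualization step expects, so it could be inserted into the text widget in place of the default query.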
3. Visualization: Pyvis
Pyvis expects nodes and edges. You’ll need to iterate through your SPARQL result set and add them to a pyvis.network.Network object.
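from pyvis.network import Network

def visualize_results(results):
    net = Network(notebook=False, directed=True, height="750px", width="100%")
    for row in results:
        # Each row of a three-variable SELECT can be indexed as (s, p, o)
        s, p, o = str(row[0]), str(row[1]), str(row[2])
        net.add_node(s, label=s, color="#97c2fc")
        net.add_node(o, label=o, color="#ffff00")
        net.add_edge(s, o, label=p)
    # save_graph only writes the HTML file; open graph.html in a browser to view it.
    # (In some pyvis versions show() assumes a notebook unless told otherwise.)
    net.save_graph("graph.html")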
The Complete Workflow Loop
| Step | Tool | Responsibility |
| Ingestion | Pandas | Cleans and iterates through your raw CSV. |
| Modeling | RDFLib | Assigns URIs and applies your RDF Schema logic. |
| Interface | Tkinter | Provides a text area for custom SPARQL queries. |
| Rendering | Pyvis | Converts the $(s, p, o)$ triples into an interactive HTML graph. |
Technical Tips:
URIs vs Literals: Ensure your CSV IDs are converted to URIRef, while descriptive text should be Literal. Pyvis handles strings well, but RDFLib needs the distinction to query correctly.
Schema Enforcement: If you want to use RDFS for reasoning (e.g., if $A$ is a Manager and Manager is a subclass of Person, then $A$ is a Person), you might need to use a reasoner or perform simple expansion within RDFLib before querying.
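The "simple expansion" mentioned in the tip can be sketched without a reasoner. To stay self-contained, the example below works on plain string tuples rather than an rdflib graph; with rdflib the same loop would run over the graph's triples. The function name `expand_types` and the `ex:alice` individual are illustrative assumptions.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def expand_types(triples):
    """Add (x, rdf:type, D) whenever x has type C and C is a (transitive) subclass of D."""
    triples = set(triples)
    # Direct superclass lookup: class -> set of its declared superclasses
    parents = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            parents.setdefault(s, set()).add(o)

    def ancestors(cls):
        """All superclasses reachable through rdfs:subClassOf, cycle-safe."""
        seen, stack = set(), [cls]
        while stack:
            for sup in parents.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    inferred = {
        (s, TYPE, sup)
        for s, p, o in triples if p == TYPE
        for sup in ancestors(o)
    }
    return triples | inferred

data = [
    ("ex:Manager", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:Manager"),
]
expanded = expand_types(data)
print(("ex:alice", TYPE, "ex:Person") in expanded)  # True
```

Running the expansion once after loading means later queries for Person also return Managers without any change to the query text.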
Would you like me to write a sample RDF Schema (RDFS) file that defines the relationships for your CSV data?
The schema I need is for NASA Programs: Program, SubProgram, Mission, Vehicle, System, SubSystem, Component, Document
For a NASA technical hierarchy, your schema needs to reflect a strict containment model (e.g., a System is part of a Vehicle) while also allowing for cross-cutting relationships like Documents that might describe multiple levels of the stack.
Below is a Turtle (.ttl) schema designed specifically for your requested entities.
1. NASA Technical Hierarchy Schema (nasa_schema.ttl)
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nasa: <http://nasa.example.org/ontology#> .

### Classes (The Hierarchy) ###
nasa:Program a rdfs:Class ;
    rdfs:label "Program" ;
    rdfs:comment "Top-level NASA effort (e.g., Artemis)" .

nasa:SubProgram a rdfs:Class ;
    rdfs:label "Sub-Program" ;
    rdfs:comment "Specific initiative within a Program" .

nasa:Mission a rdfs:Class ;
    rdfs:label "Mission" ;
    rdfs:comment "A specific flight or deployment (e.g., Artemis I)" .

nasa:Vehicle a rdfs:Class ;
    rdfs:label "Vehicle" ;
    rdfs:comment "The spacecraft or carrier (e.g., Orion, SLS)" .

nasa:System a rdfs:Class ;
    rdfs:label "System" ;
    rdfs:comment "Major functional block (e.g., Propulsion, Life Support)" .

nasa:SubSystem a rdfs:Class ;
    rdfs:label "Sub-System" ;
    rdfs:comment "Dedicated functional unit within a system" .

nasa:Component a rdfs:Class ;
    rdfs:label "Component" ;
    rdfs:comment "Individual hardware or software part" .

nasa:Document a rdfs:Class ;
    rdfs:label "Technical Document" ;
    rdfs:comment "Manuals, specifications, or reports" .

### Properties (Relationships) ###
# Structural "Part Of" relationship
nasa:partOf a rdf:Property ;
    rdfs:label "is part of" ;
    rdfs:comment "General hierarchical relationship" .

# Specific Mapping Example
nasa:belongsToProgram rdfs:subPropertyOf nasa:partOf ;
    rdfs:domain nasa:SubProgram ;
    rdfs:range nasa:Program .

nasa:documentedBy a rdf:Property ;
    rdfs:label "is documented by" ;
    rdfs:domain rdfs:Resource ; # Any entity can be documented
    rdfs:range nasa:Document .

nasa:hasStatus a rdf:Property ;
    rdfs:label "current status" ;
    rdfs:range xsd:string .
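The schema declares the classes and a generic partOf property, but it does not say which class may be part of which. A minimal sketch of checking that in Python before triples are created follows. The containment order is an assumption taken from the order in which the classes were listed in the request, and `is_valid_parent` is a hypothetical helper.

```python
# Assumed containment order, taken from the order in which the classes are listed.
# The Turtle schema itself does not enforce this.
HIERARCHY = ["Program", "SubProgram", "Mission", "Vehicle",
             "System", "SubSystem", "Component"]

def is_valid_parent(child_type, parent_type):
    """True if parent_type sits strictly above child_type in HIERARCHY.

    Levels may be skipped (a Component directly under a System), and a
    Document may attach to anything.
    """
    if child_type == "Document":
        return True
    if child_type not in HIERARCHY or parent_type not in HIERARCHY:
        return False
    return HIERARCHY.index(parent_type) < HIERARCHY.index(child_type)

print(is_valid_parent("Component", "System"))    # True
print(is_valid_parent("Vehicle", "Component"))   # False
print(is_valid_parent("Document", "Component"))  # True
```

A converter could call this per row and report violations instead of silently adding an upside-down partOf link.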
2. Implementation: Mapping CSV to this Schema
When writing your Python converter, you will map your CSV columns to these URIs. If your CSV looks like this:
ComponentID, Name, ParentSystem
Your mapping logic in rdflib would look like:
from rdflib import Graph, Literal, RDF, RDFS, Namespace

NASA = Namespace("http://nasa.example.org/ontology#")

def add_csv_row(g, row):
    # Define the Component
    comp_uri = NASA[row['ComponentID']]
    g.add((comp_uri, RDF.type, NASA.Component))
    # RDFS (uppercase) is rdflib's namespace object for rdfs: terms
    g.add((comp_uri, RDFS.label, Literal(row['Name'])))
    # Define the Relationship to the System
    system_uri = NASA[row['ParentSystem']]
    g.add((comp_uri, NASA.partOf, system_uri))
3. SPARQL Query for your Tkinter GUI
Since you have a clear hierarchy, a powerful query for your users would be finding all components associated with a specific high-level Program.
Example Query:
PREFIX nasa: <http://nasa.example.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?componentLabel ?systemLabel ?documentTitle
WHERE {
  ?c a nasa:Component ;
     rdfs:label ?componentLabel ;
     nasa:partOf+ ?program . # '+' follows the 'partOf' chain up to the program
  ?program a nasa:Program ;
           rdfs:label "Artemis" .
  # Bind the System the component belongs to (directly or via a SubSystem), if any
  OPTIONAL { ?c nasa:partOf+ ?system . ?system a nasa:System ; rdfs:label ?systemLabel }
  OPTIONAL { ?c nasa:documentedBy ?doc . ?doc rdfs:label ?documentTitle }
}
Why this works:
Transitive Paths: Using nasa:partOf+ in SPARQL allows your users to find everything belonging to a Program without needing to know if there is a SubProgram or Mission in between.
Visual Logic: In Pyvis, your "Nodes" will be the Programs/Components and "Edges" will be the partOf relationships, creating a clean "tree" visualization.
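To see what the `nasa:partOf+` path computes, the same traversal can be written by hand. The sketch below assumes each item has at most one parent and stores the links in a plain dict. The IDs are illustrative, not taken from a real dataset.

```python
def ancestors(node, part_of):
    """Follow partOf links upward and return every ancestor, nearest first.

    This mirrors the one-or-more semantics of the SPARQL path `partOf+`:
    the starting node itself is not included.
    """
    chain = []
    seen = {node}
    while node in part_of:
        node = part_of[node]
        if node in seen:  # guard against cycles in bad data
            break
        seen.add(node)
        chain.append(node)
    return chain

# Illustrative child -> parent links
part_of = {
    "Valve": "Propulsion",
    "Propulsion": "Orion",
    "Orion": "ArtemisI",
    "ArtemisI": "Artemis",
}

print(ancestors("Valve", part_of))
# ['Propulsion', 'Orion', 'ArtemisI', 'Artemis']
print("Artemis" in ancestors("Valve", part_of))  # True
```

The query does the equivalent walk inside the SPARQL engine, which is why the user does not have to spell out the intermediate levels.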
Would you like me to refine the Pyvis code to color-code the nodes differently based on whether they are a "Program" vs. a "Component"?
Yes
To make the Pyvis graph intuitive, we can assign specific hex colors to each NASA class. This helps users instantly distinguish between a high-level Program and a granular Component in the visual output.
Here is the updated visualization function. It uses a dictionary to map RDF types to colors and handles the logic for extracting those types from your SPARQL results.
1. The Color-Coded Visualization Function
from pyvis.network import Network
from rdflib import Namespace, RDF

NASA = Namespace("http://nasa.example.org/ontology#")

# Mapping NASA Classes to specific colors
# Using a "Space" inspired palette
COLOR_MAP = {
    str(NASA.Program): "#1a237e",     # Deep Navy
    str(NASA.SubProgram): "#3949ab",  # Indigo
    str(NASA.Mission): "#00838f",     # Teal
    # ---
    str(NASA.Vehicle): "#d32f2f",     # NASA Red
    str(NASA.System): "#f57c00",      # Orange
    str(NASA.SubSystem): "#fbc02d",   # Yellow
    str(NASA.Component): "#7b1fa2",   # Purple
    # ---
    str(NASA.Document): "#455a64",    # Slate/Grey
}

def visualize_nasa_graph(results):
    """
    Expects SPARQL results in the format (subject, predicate, object, subjectType)
    """
    net = Network(notebook=False, directed=True, height="800px", width="100%", bgcolor="#ffffff")
    # Force-directed physics pulls connected nodes into clusters
    net.force_atlas_2based()
    # Materialize the rows as strings so they can be walked twice
    rows = [[str(x) for x in row] for row in results]
    # Pass 1: add typed subjects first. pyvis keeps the first definition of a
    # node id, so a subject first seen as an object would otherwise stay grey.
    for s, p, o, s_type in rows:
        # Get color based on type, default to light grey if unknown
        node_color = COLOR_MAP.get(s_type, "#eeeeee")
        net.add_node(s, label=s.split('#')[-1], title=f"Type: {s_type}", color=node_color)
    # Pass 2: add object nodes (assumed to be strings or URIs) and the edges
    for s, p, o, s_type in rows:
        net.add_node(o, label=o.split('#')[-1], color="#90a4ae")
        net.add_edge(s, o, label=p.split('#')[-1])
    # save_graph only writes the file; open nasa_hierarchy.html in a browser to view it
    net.save_graph("nasa_hierarchy.html")
2. The Updated SPARQL Query for the GUI
To make the color-coding work, your SPARQL query must specifically SELECT the type of the subject so the Python script knows which color to apply.
Update your Tkinter default query to this:
PREFIX nasa: <http://nasa.example.org/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?subject ?predicate ?object ?type
WHERE {
?subject ?predicate ?object .
?subject rdf:type ?type .
FILTER(?type != <http://www.w3.org/2000/01/rdf-schema#Class>)
}
LIMIT 50
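Because the query selects only the subject's type, the visualization function above gives every object node one fallback color, even when that object is itself a typed subject in another row. One way around this is to build an id-to-type lookup first and color both ends of each edge from it. The sketch below uses plain tuples in place of SPARQL rows. The `hasStatus` row and the trimmed `COLOR_MAP` are illustrative.

```python
NS = "http://nasa.example.org/ontology#"

COLOR_MAP = {
    NS + "Program": "#1a237e",
    NS + "Component": "#7b1fa2",
}
DEFAULT_COLOR = "#eeeeee"

def node_colors(rows):
    """Map every node id in (s, p, o, s_type) rows to a display color."""
    # First pass: remember the type of every subject
    type_of = {s: s_type for s, _p, _o, s_type in rows}
    # Second pass: color subjects and objects from the same lookup
    colors = {}
    for s, _p, o, _t in rows:
        for node in (s, o):
            colors[node] = COLOR_MAP.get(type_of.get(node), DEFAULT_COLOR)
    return colors

rows = [
    (NS + "Valve", NS + "partOf", NS + "Artemis", NS + "Component"),
    (NS + "Artemis", NS + "hasStatus", "active", NS + "Program"),
]
colors = node_colors(rows)
print(colors[NS + "Artemis"])  # #1a237e (typed in the second row, though first seen as an object)
print(colors["active"])        # #eeeeee (a literal with no type)
```

The resulting dict can be consulted when adding nodes, so a Program keeps its navy color no matter which side of an edge it appears on.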
3. Key Enhancements Added:
force_atlas_2based(): This Pyvis physics layout is force-directed rather than strictly hierarchical. It pushes nodes apart to reduce overlap and pulls connected nodes together, so heavily linked "Programs" tend to end up at the center of their clusters.
label=s.split('#')[-1]: This cleans up the URIs in the visualization. Instead of seeing http://nasa.example.org/ontology#Orion, the user just sees Orion.
title attribute: In Pyvis, the title shows up as a tooltip when you hover over a node, which suits showing the full RDF type.
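Note that `split('#')` only shortens URIs from hash namespaces. The first example in this conversation binds `http://example.org/`, which ends in a slash, and such URIs would be displayed in full. A small helper that handles both styles might look like this. The name `local_name` is an illustrative choice.

```python
def local_name(uri):
    """Return the part of a URI after the last '#' or '/', or the input if neither splits it."""
    text = str(uri)
    for sep in ("#", "/"):
        head, found, tail = text.rpartition(sep)
        if found and tail:
            return tail
    return text

print(local_name("http://nasa.example.org/ontology#Orion"))  # Orion
print(local_name("http://example.org/42"))                   # 42
print(local_name("Fuel Intake Valve"))                       # Fuel Intake Valve
```

Literal text that happens to contain a slash would be shortened too, so the helper is best applied to URI nodes only.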
Legend for your Users
Since you are building a GUI, you might want to display a small legend in the Tkinter sidebar:
| Level | Color |
| Program / SubProgram | Navy / Indigo |
| Mission / Vehicle | Teal / Red |
| System / SubSystem / Component | Orange / Yellow / Purple |
| Document | Grey |
Would you like me to integrate this visualization logic into a single, runnable Python script that combines the Tkinter window and the Pyvis output?
Yes
This script integrates your entire workflow: it defines the NASA RDFS, creates a sample CSV in memory, builds the Tkinter GUI for your SPARQL queries, and triggers Pyvis to render the results in your browser.
The Complete NASA RDF Explorer
import pandas as pd
import tkinter as tk
from tkinter import scrolledtext, messagebox
from rdflib import Graph, Literal, RDF, RDFS, Namespace
from pyvis.network import Network
import webbrowser
import os
import io

# 1. SETUP NAMESPACES & SCHEMA
NASA = Namespace("http://nasa.example.org/ontology#")

# Color Map for Visualization
COLOR_MAP = {
    str(NASA.Program): "#1a237e",     # Deep Navy
    str(NASA.SubProgram): "#3949ab",  # Indigo
    str(NASA.Mission): "#00838f",     # Teal
    str(NASA.Vehicle): "#d32f2f",     # NASA Red
    str(NASA.System): "#f57c00",      # Orange
    str(NASA.SubSystem): "#fbc02d",   # Yellow
    str(NASA.Component): "#7b1fa2",   # Purple
    str(NASA.Document): "#455a64",    # Slate
}

class NasaRdfApp:
    def __init__(self, root):
        self.root = root
        self.root.title("NASA Knowledge Graph Explorer")
        self.g = Graph()
        self.g.bind("nasa", NASA)
        # Initialize Data
        self.setup_graph()
        self.create_widgets()

    def setup_graph(self):
        """Simulates CSV -> RDF conversion with RDFS logic"""
        # Sample CSV Data (rows stay flush left so no indentation leaks into the ID column)
        csv_data = """ID,Type,Name,Parent
Prog_Artemis,Program,Artemis,None
Sub_HLS,SubProgram,Human Landing System,Prog_Artemis
Miss_Art3,Mission,Artemis III,Sub_HLS
Veh_Starship,Vehicle,Starship HLS,Miss_Art3
Sys_Propulsion,System,Raptor Engines,Veh_Starship
Comp_Valve,Component,Fuel Intake Valve,Sys_Propulsion
Doc_Spec42,Document,Safety Spec 42,Comp_Valve"""
        df = pd.read_csv(io.StringIO(csv_data))
        for _, row in df.iterrows():
            subj = NASA[row['ID']]
            obj_type = NASA[row['Type']]
            # Add basic triples
            self.g.add((subj, RDF.type, obj_type))
            self.g.add((subj, RDFS.label, Literal(row['Name'])))
            # Hierarchy logic. Depending on the pandas version, the text "None"
            # may be read as a missing value (NaN), so check for both.
            if pd.notna(row['Parent']) and str(row['Parent']) != "None":
                parent = NASA[row['Parent']]
                self.g.add((subj, NASA.partOf, parent))

    def create_widgets(self):
        """GUI Layout"""
        lbl = tk.Label(self.root, text="Enter SPARQL Query:", font=('Arial', 10, 'bold'))
        lbl.pack(pady=5)
        default_query = (
            "PREFIX nasa: <http://nasa.example.org/ontology#>\n"
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n\n"
            "SELECT ?s ?p ?o ?type\n"
            "WHERE {\n"
            " ?s ?p ?o .\n"
            " ?s rdf:type ?type .\n"
            " FILTER(?type != rdfs:Class)\n"
            "}"
        )
        self.query_input = scrolledtext.ScrolledText(self.root, width=70, height=12)
        self.query_input.insert(tk.INSERT, default_query)
        self.query_input.pack(padx=10, pady=5)
        btn_frame = tk.Frame(self.root)
        btn_frame.pack(pady=10)
        run_btn = tk.Button(btn_frame, text="Execute & Visualize", command=self.run_query,
                            bg="#2ecc71", fg="white", padx=10)
        run_btn.pack(side=tk.LEFT, padx=5)

    def run_query(self):
        try:
            query_text = self.query_input.get("1.0", tk.END)
            results = self.g.query(query_text)
            if len(results) == 0:
                messagebox.showinfo("Results", "No triples found for this query.")
                return
            self.visualize(results)
        except Exception as e:
            messagebox.showerror("Query Error", str(e))

    def visualize(self, results):
        """Pyvis Integration"""
        net = Network(notebook=False, directed=True, height="700px", width="100%", bgcolor="#ffffff")
        net.force_atlas_2based()
        # Collect rows as strings; the type column is optional (s, p, o[, type])
        rows = []
        for row in results:
            s, p, o = str(row[0]), str(row[1]), str(row[2])
            s_type = str(row[3]) if len(row) > 3 else "Unknown"
            rows.append((s, p, o, s_type))
        # Pass 1: add typed subjects first. pyvis keeps the first definition of a
        # node id, so a subject first seen as an object would otherwise stay grey.
        for s, p, o, s_type in rows:
            s_color = COLOR_MAP.get(s_type, "#90a4ae")
            net.add_node(s, label=s.split('#')[-1], title=f"Type: {s_type}", color=s_color)
        # Pass 2: add object nodes and edges, with labels cleaned for display
        for s, p, o, s_type in rows:
            net.add_node(o, label=o.split('#')[-1], color="#cfd8dc")
            net.add_edge(s, o, label=p.split('#')[-1])
        # Output to file and open
        output_file = "nasa_graph.html"
        net.save_graph(output_file)
        webbrowser.open('file://' + os.path.realpath(output_file))

if __name__ == "__main__":
    root = tk.Tk()
    app = NasaRdfApp(root)
    root.mainloop()
How to use this:
Dependencies: Ensure you have rdflib, pandas, and pyvis installed (pip install rdflib pandas pyvis).
Run: Execute the script. A window appears with a pre-loaded NASA query.
Graph: Click "Execute & Visualize". It will generate nasa_graph.html and automatically open it in your browser.
Explore: In the browser, you can drag the nodes. Notice how Artemis (Blue) acts as the anchor for the Sub-Programs and Components.
Customizing the CSV
In the setup_graph function, you can replace the pd.read_csv(io.StringIO(csv_data)) call with pd.read_csv("your_actual_file.csv") and drop the csv_data string. Just ensure your column names (ID, Type, Name, Parent) match the mapping logic.
Would you like me to add a feature to the GUI that allows you to browse and select a CSV file from your computer?
Yes
To make this a complete tool, I have added the filedialog module from Tkinter. This allows you to select any CSV from your computer and maps it to the NASA RDFS hierarchy dynamically.
Updated Python Workflow with CSV File Picker
import pandas as pd
import tkinter as tk
from tkinter import scrolledtext, messagebox, filedialog
from rdflib import Graph, Literal, RDF, RDFS, Namespace
from pyvis.network import Network
import webbrowser
import os

# 1. NAMESPACES & VISUAL SETTINGS
NASA = Namespace("http://nasa.example.org/ontology#")
COLOR_MAP = {
    str(NASA.Program): "#1a237e",     # Deep Navy
    str(NASA.SubProgram): "#3949ab",  # Indigo
    str(NASA.Mission): "#00838f",     # Teal
    str(NASA.Vehicle): "#d32f2f",     # NASA Red
    str(NASA.System): "#f57c00",      # Orange
    str(NASA.SubSystem): "#fbc02d",   # Yellow
    str(NASA.Component): "#7b1fa2",   # Purple
    str(NASA.Document): "#455a64",    # Slate
}

class NasaRdfApp:
    def __init__(self, root):
        self.root = root
        self.root.title("NASA Knowledge Graph Builder")
        self.g = Graph()
        self.g.bind("nasa", NASA)
        self.create_widgets()

    def create_widgets(self):
        """GUI Layout with File Selection and Query Area"""
        # --- File Selection Section ---
        file_frame = tk.LabelFrame(self.root, text="Step 1: Ingest Data", padx=10, pady=10)
        file_frame.pack(fill="x", padx=10, pady=5)
        self.file_label = tk.Label(file_frame, text="No CSV selected", fg="grey")
        self.file_label.pack(side=tk.LEFT, padx=5)
        browse_btn = tk.Button(file_frame, text="Browse CSV", command=self.load_csv)
        browse_btn.pack(side=tk.RIGHT)
        # --- Query Section ---
        query_frame = tk.LabelFrame(self.root, text="Step 2: SPARQL Query Builder", padx=10, pady=10)
        query_frame.pack(fill="both", expand=True, padx=10, pady=5)
        default_query = (
            "PREFIX nasa: <http://nasa.example.org/ontology#>\n"
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n\n"
            "SELECT ?subject ?predicate ?object ?type\n"
            "WHERE {\n"
            " ?subject ?predicate ?object .\n"
            " ?subject rdf:type ?type .\n"
            "}"
        )
        self.query_input = scrolledtext.ScrolledText(query_frame, width=70, height=12)
        self.query_input.insert(tk.INSERT, default_query)
        self.query_input.pack(fill="both", expand=True)
        # --- Action Section ---
        self.run_btn = tk.Button(self.root, text="Generate Interactive Graph",
                                 command=self.run_query, bg="#0052a5", fg="white",
                                 font=('Arial', 10, 'bold'), pady=10)
        self.run_btn.pack(fill="x", padx=10, pady=10)

    def load_csv(self):
        """Opens file dialog and converts CSV to RDF triples"""
        file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
        if not file_path:
            return
        try:
            df = pd.read_csv(file_path)
            # Basic validation: check for required columns
            required = {'ID', 'Type', 'Name', 'Parent'}
            if not required.issubset(df.columns):
                messagebox.showerror("Error", f"CSV must contain columns: {required}")
                return
            self.g = Graph()  # Reset graph on new load
            self.g.bind("nasa", NASA)
            for _, row in df.iterrows():
                subj = NASA[str(row['ID']).replace(" ", "_")]
                obj_type = NASA[str(row['Type'])]
                self.g.add((subj, RDF.type, obj_type))
                self.g.add((subj, RDFS.label, Literal(str(row['Name']))))
                # Skip empty cells (NaN) and the literal text "None"
                if pd.notna(row['Parent']) and str(row['Parent']).lower() != "none":
                    parent = NASA[str(row['Parent']).replace(" ", "_")]
                    self.g.add((subj, NASA.partOf, parent))
            self.file_label.config(text=f"Loaded: {os.path.basename(file_path)}", fg="green")
            messagebox.showinfo("Success", f"Converted {len(df)} rows to RDF.")
        except Exception as e:
            messagebox.showerror("Loading Error", str(e))

    def run_query(self):
        """Executes SPARQL and launches Pyvis"""
        if len(self.g) == 0:
            messagebox.showwarning("Warning", "Please load a CSV file first.")
            return
        try:
            query_text = self.query_input.get("1.0", tk.END)
            results = self.g.query(query_text)
            self.visualize(results)
        except Exception as e:
            messagebox.showerror("Query Error", str(e))

    def visualize(self, results):
        net = Network(notebook=False, directed=True, height="750px", width="100%", bgcolor="#ffffff")
        net.force_atlas_2based()
        # Expected columns: (subject, predicate, object, type); type is optional
        rows = []
        for row in results:
            s, p, o = str(row[0]), str(row[1]), str(row[2])
            s_type = str(row[3]) if len(row) > 3 else "Unknown"
            rows.append((s, p, o, s_type))
        # Pass 1: add typed subjects first. pyvis keeps the first definition of a
        # node id, so a subject first seen as an object would otherwise stay grey.
        for s, p, o, s_type in rows:
            # Use the NASA Color Map
            color = COLOR_MAP.get(s_type, "#97c2fc")
            net.add_node(s, label=s.split('#')[-1], title=f"Type: {s_type}", color=color)
        # Pass 2: add object nodes and edges
        for s, p, o, s_type in rows:
            net.add_node(o, label=o.split('#')[-1], color="#dddddd")
            net.add_edge(s, o, label=p.split('#')[-1])
        path = os.path.abspath("nasa_output.html")
        net.save_graph(path)
        webbrowser.open("file://" + path)

if __name__ == "__main__":
    root = tk.Tk()
    app = NasaRdfApp(root)
    root.mainloop()
Expected CSV Format
To use the "Browse" feature, ensure your CSV file has these exact headers:
| ID | Type | Name | Parent |
| Art_1 | Program | Artemis Program | None |
| Orion_V | Vehicle | Orion Capsule | Art_1 |
| Propulsion_S | System | Main Engine | Orion_V |
| Doc_01 | Document | Safety Manual | Propulsion_S |
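One thing the loader does not check is whether each Parent value actually matches an ID in the file. A typo there still produces a partOf triple, but it points at a node that has no type or label. A minimal pre-check using only the standard library could look like the sketch below. The function name `find_dangling_parents` is hypothetical, and the `Orion_X` typo in the sample is deliberate.

```python
import csv
import io

def find_dangling_parents(csv_text):
    """Return (ID, Parent) pairs whose Parent does not match any ID in the file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ids = {row["ID"].strip() for row in rows}
    dangling = []
    for row in rows:
        parent = row["Parent"].strip()
        # Empty cells and the literal text "None" both mean "no parent"
        if parent and parent.lower() != "none" and parent not in ids:
            dangling.append((row["ID"].strip(), parent))
    return dangling

sample = """ID,Type,Name,Parent
Art_1,Program,Artemis Program,None
Orion_V,Vehicle,Orion Capsule,Art_1
Propulsion_S,System,Main Engine,Orion_X
"""

print(find_dangling_parents(sample))  # [('Propulsion_S', 'Orion_X')]
```

Showing this list in a message box before conversion would let the user fix the CSV instead of hunting for a disconnected node in the graph.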
Key Improvements:
Dynamic ID Handling: It automatically replaces spaces with underscores, which removes the most common cause of invalid RDF URIs. Other reserved characters in an ID are not handled.
Clean Separation: The GUI now separates the "Ingestion" step from the "Querying" step.
Error Handling: It alerts you if the CSV columns are missing or if the SPARQL syntax is incorrect.
Automatic Browser Launch: Once you click generate, the browser opens immediately with the physics-enabled graph.
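Replacing spaces covers the common case, but characters such as `<`, `>`, `/`, or `#` in an ID cell would still produce a malformed or misleading URI. A slightly more defensive sketch uses the standard library's percent-encoding. The helper name `mint_uri` is an illustrative assumption.

```python
from urllib.parse import quote

BASE = "http://nasa.example.org/ontology#"

def mint_uri(raw_id, base=BASE):
    """Build a URI string from a CSV cell, percent-encoding anything outside the unreserved set."""
    cleaned = str(raw_id).strip().replace(" ", "_")
    # safe="" also encodes "/" and "#", which would otherwise change the URI structure
    return base + quote(cleaned, safe="")

print(mint_uri("Fuel Valve"))  # http://nasa.example.org/ontology#Fuel_Valve
print(mint_uri("Tank<A>/#2"))  # http://nasa.example.org/ontology#Tank%3CA%3E%2F%232
```

The returned string could be wrapped in rdflib's URIRef in place of the `NASA[...]` lookups in load_csv.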
From <https://gemini.google.com/app/cd645441b4645112> Google Gemini 3.0 (Fast)
