#4253 CSV (ID, Attribute, Value) -> RDF Triples
#4253 Provide a Python application to read a CSV file with ID, Attribute, Value and create RDF triples, then display the RDF data
To build this application, we'll use rdflib, the most widely used Python library for working with RDF. Each CSV row maps to one triple: the ID becomes the subject, the Attribute the predicate, and the Value the object.
1. Prerequisites
You’ll need to install rdflib if you haven't already:
pip install rdflib
2. The Python Application
This script handles the CSV parsing, creates a namespace for your data, and serializes the output into the human-readable Turtle format.
import csv
from rdflib import Graph, Literal, Namespace

def csv_to_rdf(file_path):
    # 1. Initialize the Graph and a custom Namespace
    g = Graph()
    EX = Namespace("http://example.org/data/")
    g.bind("ex", EX)
    try:
        with open(file_path, mode='r', encoding='utf-8', newline='') as f:
            reader = csv.DictReader(f)
            for row in reader:
                # 2. Define Subject, Predicate, and Object
                # Spaces are replaced with underscores so the names form usable URIs
                subject = EX[row['ID'].strip().replace(" ", "_")]
                predicate = EX[row['Attribute'].strip().replace(" ", "_")]
                value = row['Value'].strip()
                # 3. Add the triple to the graph
                # Every value is stored as a plain string literal; rdflib does not
                # infer a numeric datatype from a string such as "30"
                g.add((subject, predicate, Literal(value)))
        # 4. Display the RDF data
        print("--- Generated RDF Data (Turtle Format) ---")
        print(g.serialize(format="turtle"))
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
    except KeyError as e:
        print(f"Error: Missing column in CSV: {e}")

# Example Usage
if __name__ == "__main__":
    # Write a small demonstration CSV (this overwrites any existing data.csv)
    csv_filename = 'data.csv'
    with open(csv_filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Attribute", "Value"])
        writer.writerow(["User01", "name", "Alice"])
        writer.writerow(["User01", "age", "30"])
        writer.writerow(["User02", "name", "Bob"])
    csv_to_rdf(csv_filename)
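The script above passes each Value to Literal as a string, so "30" is stored as text rather than as a number. If typed values matter, one option is to decide on a datatype before building the literal. The following is a minimal plain-Python sketch of that decision, not part of the original script: the helper name guess_xsd_type is hypothetical, and the returned URIs are the standard XSD datatype identifiers.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?[0-9]+\.[0-9]+")

def guess_xsd_type(raw):
    """Return (python_value, xsd_datatype_uri) for a raw CSV string.

    Hypothetical helper: anything that is not clearly an integer,
    decimal, or boolean is kept as a string.
    """
    text = raw.strip()
    if _INTEGER.fullmatch(text):
        return int(text), XSD + "integer"
    if _DECIMAL.fullmatch(text):
        return float(text), XSD + "decimal"
    if text.lower() in ("true", "false"):
        return text.lower() == "true", XSD + "boolean"
    return text, XSD + "string"

print(guess_xsd_type("30"))
print(guess_xsd_type("5kg"))
```

With rdflib, the resulting pair could then be handed to Literal(value, datatype=URIRef(uri)) instead of Literal(value).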
How it Works
Namespaces: Since RDF requires URIs, we use http://example.org/data/ as a prefix. User01 becomes <http://example.org/data/User01>.
The Triple: Each row represents a statement: (Subject) --> (Predicate) --> (Object).
Serialization: RDF can also be written as RDF/XML or JSON-LD, but Turtle (.ttl) is the easiest for humans to read because it is concise and groups attributes by subject.
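To make the mapping concrete, here is a small plain-Python sketch (no rdflib) that turns one row into an N-Triples line by concatenating the namespace with the ID and the Attribute. The helper name row_to_ntriple is hypothetical, and the escaping covers only backslashes and double quotes, so treat it as an illustration rather than a serializer.

```python
BASE = "http://example.org/data/"

def row_to_ntriple(row, base=BASE):
    """Build one N-Triples statement from an ID/Attribute/Value row."""
    subject = base + row["ID"].strip().replace(" ", "_")
    predicate = base + row["Attribute"].strip().replace(" ", "_")
    # Escape the two characters that would break a quoted literal
    value = row["Value"].strip().replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{value}" .'

line = row_to_ntriple({"ID": "User01", "Attribute": "name", "Value": "Alice"})
print(line)
# <http://example.org/data/User01> <http://example.org/data/name> "Alice" .
```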
Suggested CSV Structure
Ensure your CSV looks like this for the script to run smoothly:
| ID | Attribute | Value |
| Item_1 | color | Red |
| Item_1 | weight | 5kg |
| Item_2 | color | Blue |
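A quick way to catch a malformed file before any triples are built is to compare the header row with the expected column names. This is a minimal sketch, assuming exactly the three column names shown above; check_header is a hypothetical helper and is not part of the script.

```python
import csv
import io

REQUIRED = ("ID", "Attribute", "Value")

def check_header(handle):
    """Return the required columns that are missing from the CSV header."""
    reader = csv.DictReader(handle)
    # fieldnames is None for an empty file
    present = {name.strip() for name in (reader.fieldnames or [])}
    return [col for col in REQUIRED if col not in present]

good = io.StringIO("ID,Attribute,Value\nItem_1,color,Red\n")
bad = io.StringIO("ID,Property,Value\nItem_1,color,Red\n")
print(check_header(good))  # []
print(check_header(bad))   # ['Attribute']
```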
Would you like me to modify the script to save the output to a .ttl file or perhaps use a specific ontology like Schema.org?
Yes
Since you’re working with NASA technical hierarchies and traceability, using a structured vocabulary like Schema.org—combined with a custom namespace for your specific programs and missions—is a great way to ensure your knowledge graph is both standardized and extensible.
The following script updates the previous logic to:
Map attributes to Schema.org (e.g., mapping "name" to schema:name).
Handle technical hierarchy IDs as unique URI resources.
Save the output directly to a .ttl (Turtle) file for use in SPARQL queries or Pyvis visualizations.
Python Application: CSV to Schema.org RDF
import csv
from rdflib import Graph, Literal, Namespace

# rdflib exposes Schema.org as SDO, bound to https://schema.org/. The http:// form is
# defined explicitly here so it matches "PREFIX schema: <http://schema.org/>" in the
# SPARQL queries used later.
SCHEMA = Namespace("http://schema.org/")

def csv_to_technical_rdf(csv_input, output_file):
    g = Graph()
    # Define custom namespace for NASA Programs/Missions
    # Replace this with your actual project URI
    NASA = Namespace("https://nasa.example.gov/ontology/")
    g.bind("nasa", NASA)
    g.bind("schema", SCHEMA, replace=True)
    # Simple mapping dictionary for common attributes to Schema.org
    # Keys are lowercase because the lookup below lowercases the CSV 'Attribute' value
    attr_mapping = {
        "name": SCHEMA.name,
        "description": SCHEMA.description,
        "identifier": SCHEMA.identifier,
        "version": SCHEMA.version,
        "status": SCHEMA.creativeWorkStatus,
        "partof": SCHEMA.isPartOf,  # Useful for SubProgram -> Program hierarchy
    }
    try:
        with open(csv_input, mode='r', encoding='utf-8', newline='') as f:
            reader = csv.DictReader(f)
            for row in reader:
                # 1. Subject: The specific ID (e.g., 'Artemis_I')
                subj_id = row['ID'].strip().replace(" ", "_")
                subject = NASA[subj_id]
                # 2. Predicate: Check mapping, else create a custom NASA property
                attr_raw = row['Attribute'].strip()
                predicate = attr_mapping.get(attr_raw.lower())
                if predicate is None:
                    # Fallback to custom property if not in Schema.org mapping
                    predicate = NASA[attr_raw.replace(" ", "_")]
                # 3. Object: The value (e.g., 'In Progress' or 'Orion Capsule')
                obj_val = row['Value'].strip()
                # A leading "ID_" marks the value as a reference to another entity
                if obj_val.startswith("ID_"):
                    # Drop only the leading marker, not later occurrences of "ID_"
                    object_ref = NASA[obj_val[len("ID_"):]]
                    g.add((subject, predicate, object_ref))
                else:
                    g.add((subject, predicate, Literal(obj_val)))
        # Save to file
        g.serialize(destination=output_file, format="turtle")
        print(f"Successfully saved RDF data to: {output_file}")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    # Example CSV generation (overwrites any existing technical_data.csv)
    input_csv = 'technical_data.csv'
    with open(input_csv, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Attribute", "Value"])
        writer.writerow(["Artemis_Program", "name", "Artemis Lunar Exploration"])
        writer.writerow(["Orion_MPC", "partOf", "ID_Artemis_Program"])  # Links two entities
        writer.writerow(["Orion_MPC", "status", "Active"])
    csv_to_technical_rdf(input_csv, 'knowledge_graph.ttl')
Key Enhancements for Your Use Case
Object Linking: In the logic above, I added a check (obj_val.startswith("ID_")). This allows you to link entities (e.g., a Vehicle to a Mission) rather than just treating everything as text. This is critical for SPARQL traceability.
Schema.org Integration: By using SCHEMA.name or SCHEMA.isPartOf, your data reuses a widely adopted vocabulary, which is helpful if you ever integrate with external datasets.
Turtle (.ttl): This format is compact, and rdflib can parse it back into a graph, which is the step that feeds your Pyvis visualizations.
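The linking rule can be isolated into a small helper so it is easy to test on its own. The sketch below is plain Python and makes one deliberate choice: only the leading "ID_" marker is removed, whereas str.replace would also remove the same text in the middle of an ID. The name classify_value is hypothetical.

```python
REF_PREFIX = "ID_"

def classify_value(raw, prefix=REF_PREFIX):
    """Return ("ref", target_id) for a link, or ("literal", text) otherwise."""
    text = raw.strip()
    if text.startswith(prefix) and len(text) > len(prefix):
        # Strip only the leading marker
        return "ref", text[len(prefix):]
    return "literal", text

print(classify_value("ID_Artemis_Program"))  # ('ref', 'Artemis_Program')
print(classify_value("Active"))              # ('literal', 'Active')
print(classify_value("ID_Sensor_ID_7"))      # ('ref', 'Sensor_ID_7')
```

A "ref" result would become a URI in the graph, while a "literal" result would become a Literal.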
Next Steps
Would you like me to show you how to load this .ttl file into a Pyvis network to visualize the hierarchy, or would you prefer a SPARQL query to test the traceability between these IDs?
Both, .ttl to Pyvis network, and SPARQL query
To bridge the gap between your raw data and a visual knowledge graph, we’ll use rdflib to run the SPARQL query and pyvis to generate an interactive HTML map.
1. The SPARQL Query
This query returns every subject (?subject), predicate (?predicate), and object (?object) in your graph. It’s the standard "get everything" query used to feed a visualization.
from rdflib import Graph, URIRef

def query_knowledge_graph(ttl_file):
    g = Graph()
    g.parse(ttl_file, format="turtle")
    # This query returns every triple in the graph without filtering.
    # Add a FILTER clause if you want to exclude, for example, rdf:type declarations.
    query = """
    SELECT ?subject ?predicate ?object
    WHERE {
        ?subject ?predicate ?object .
    }
    """
    print(f"--- Querying Results from {ttl_file} ---")
    results = g.query(query)
    for row in results:
        # Simplify the URIs for cleaner console printing
        s = row.subject.split('/')[-1]
        p = row.predicate.split('/')[-1]
        # Shorten only URIs; literal values are printed unchanged
        o = row.object.split('/')[-1] if isinstance(row.object, URIRef) else row.object
        print(f"ID: {s} --[{p}]--> {o}")
    return results

# Run query
# query_knowledge_graph('knowledge_graph.ttl')
2. The Pyvis Visualization
This script reads your .ttl file, runs the SPARQL query, and maps the results into a Physics-enabled interactive network.
from pyvis.network import Network
from rdflib import Graph

def visualize_rdf_to_pyvis(ttl_file, html_output="hierarchy_map.html"):
    # 1. Load the RDF Graph
    g = Graph()
    g.parse(ttl_file, format="turtle")
    # 2. Initialize Pyvis Network
    # 750px height, 100% width, dark background
    net = Network(height="750px", width="100%", bgcolor="#222222", font_color="white", directed=True)
    # 3. Extract triples using SPARQL for the visualization
    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
    for row in g.query(query):
        # Extract the short name from the URI for the Label
        subj = str(row.s).split('/')[-1]
        pred = str(row.p).split('/')[-1]
        obj = str(row.o).split('/')[-1]
        # Add nodes with NASA-themed colors
        net.add_node(subj, label=subj, color="#005288", title=f"Subject: {subj}")  # NASA Blue
        net.add_node(obj, label=obj, color="#FF3D00", title=f"Value: {obj}")  # Safety Orange
        # Add the edge (the relationship)
        net.add_edge(subj, obj, label=pred, color="#FFFFFF")
    # 4. Use the force-directed ForceAtlas2 physics layout to spread the nodes out
    net.force_atlas_2based()
    # 5. Save and open
    net.show(html_output, notebook=False)
    print(f"Visualization generated: {html_output}")

# Execute
if __name__ == "__main__":
    # Ensure you have the .ttl file from the previous step
    visualize_rdf_to_pyvis('knowledge_graph.ttl')
3. Requirements
Make sure you have pyvis installed:
pip install pyvis rdflib
How to use this for Technical Traceability
SPARQL for Logic: Use the SPARQL block when you need to answer specific questions, like "What sub-systems are part of the Orion Mission?"
Pyvis for Context: Open the generated hierarchy_map.html in your browser. You can click and drag nodes (like Artemis_I) to see how they are connected to various Documents or Vehicles.
Would you like me to write a more specific SPARQL query that filters for "traceability"—for example, finding all items connected to a specific Program ID?
Yes
To handle technical traceability, we need a SPARQL query that can "walk" the hierarchy. For example, if you want to find every mission, vehicle, or document that eventually traces back to a specific Program ID, we use a Property Path.
In SPARQL, the + operator (e.g., schema:isPartOf+) tells the engine to follow that relationship as many levels deep as it goes.
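To see what the + operator computes, the same "one or more hops" walk can be written by hand. The following plain-Python sketch is an illustration only (the SPARQL engine does this for you); the function name descendants and the sample edges, including Heat_Shield_Spec, are made up for the example.

```python
from collections import defaultdict

def descendants(part_of_edges, root):
    """Return every node that reaches `root` by one or more isPartOf hops."""
    children = defaultdict(set)
    for child, parent in part_of_edges:
        children[parent].add(child)
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# (child, parent) pairs, i.e. "child isPartOf parent"
edges = [
    ("Orion_MPC", "Artemis_I"),
    ("Artemis_I", "Artemis_Program"),
    ("Heat_Shield_Spec", "Orion_MPC"),
]
print(sorted(descendants(edges, "Artemis_Program")))
# ['Artemis_I', 'Heat_Shield_Spec', 'Orion_MPC']
```

The document three levels down is found even though it has no direct link to the program, which is the behavior the property path provides.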
1. The Traceability SPARQL Query
This script identifies everything "downstream" from a specific top-level Program.
from rdflib import Graph

def trace_program_hierarchy(ttl_file, program_id):
    g = Graph()
    g.parse(ttl_file, format="turtle")
    # The schema: and nasa: prefixes are declared inside the query text,
    # so they must match the namespaces used when the .ttl file was written.
    # This query uses a Property Path (isPartOf+) to find all descendants
    # It identifies any item 'x' that is part of the program, or part of something that is part of the program.
    query = f"""
    PREFIX schema: <http://schema.org/>
    PREFIX nasa: <https://nasa.example.gov/ontology/>
    SELECT ?descendant ?relationship ?parent
    WHERE {{
        ?descendant schema:isPartOf+ nasa:{program_id} .
        ?descendant ?relationship ?parent .
        FILTER(?parent = nasa:{program_id} || EXISTS {{ ?parent schema:isPartOf+ nasa:{program_id} }})
    }}
    """
    print(f"--- Traceability Report for: {program_id} ---")
    results = g.query(query)
    if not results:
        print("No descendants found for this ID.")
        return
    for row in results:
        child = row.descendant.split('/')[-1]
        rel = row.relationship.split('/')[-1]
        parent = row.parent.split('/')[-1]
        print(f"[{child}] --({rel})--> [{parent}]")

# Example Usage:
# trace_program_hierarchy('knowledge_graph.ttl', 'Artemis_Program')
2. Integrated Visualizer with Filtering
We can combine the SPARQL query with Pyvis so that you only visualize the specific hierarchy you are interested in. This prevents the graph from getting "cluttered" with unrelated data.
from pyvis.network import Network
from rdflib import Graph

def visualize_traceability(ttl_file, root_id):
    g = Graph()
    g.parse(ttl_file, format="turtle")
    net = Network(height="800px", width="100%", bgcolor="#0b0d0f", font_color="white", directed=True)
    # SPARQL: Find the root and all its recursive descendants
    query = f"""
    PREFIX schema: <http://schema.org/>
    PREFIX nasa: <https://nasa.example.gov/ontology/>
    SELECT DISTINCT ?s ?p ?o
    WHERE {{
        {{
            # Direct and indirect descendants
            ?s schema:isPartOf+ nasa:{root_id} .
            ?s ?p ?o .
        }}
        UNION
        {{
            # Include the root itself and its immediate properties
            BIND(nasa:{root_id} AS ?s)
            ?s ?p ?o .
        }}
    }}
    """
    for row in g.query(query):
        s_label = str(row.s).split('/')[-1]
        p_label = str(row.p).split('/')[-1]
        o_label = str(row.o).split('/')[-1]
        # Coloring Logic: Root is Gold, Sub-elements are NASA Blue
        s_color = "#FFD700" if s_label == root_id else "#005288"
        net.add_node(s_label, label=s_label, color=s_color, size=25 if s_label == root_id else 15)
        net.add_node(o_label, label=o_label, color="#A5A5A5")
        net.add_edge(s_label, o_label, label=p_label)
    # Use Hierarchical layout for technical traceability (Tree view)
    net.set_options("""
    var options = {
      "layout": {
        "hierarchical": {
          "enabled": true,
          "direction": "UD",
          "sortMethod": "directed"
        }
      }
    }
    """)
    net.show(f"trace_{root_id}.html", notebook=False)
    print(f"Traceability graph for {root_id} generated.")

# Run for a specific hierarchy
# visualize_traceability('knowledge_graph.ttl', 'Artemis_Program')
Why this works for Technical Hierarchies:
Property Paths (+): This is the "magic" in SPARQL for traceability. It ensures that if a Document is part of a Vehicle, and a Vehicle is part of a Mission, and a Mission is part of a Program, the Document correctly traces all the way back to the Program.
Hierarchical Layout: In the Pyvis section, I enabled hierarchical: { "direction": "UD" }. This forces the graph into an Up-Down tree structure, which is much easier to read for NASA-style program breakdowns than a random "blob" of nodes.
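The layout options are plain JSON, so they can also be built from a Python dict, which avoids hand-editing a string when you want a different direction. This is a sketch under the assumption that set_options keeps accepting the "var options = {...}" string form used in the scripts above; hierarchical_options is a hypothetical helper. "UD", "DU", "LR", and "RL" are the direction values vis-network defines for hierarchical layouts.

```python
import json

def hierarchical_options(direction="UD", physics=False):
    """Return an options string in the same form the scripts pass to set_options."""
    options = {
        "layout": {
            "hierarchical": {
                "enabled": True,
                "direction": direction,
                "sortMethod": "directed",
            }
        },
        "physics": {"enabled": physics},
    }
    return "var options = " + json.dumps(options)

print(hierarchical_options("LR"))
```

It would be used as net.set_options(hierarchical_options("LR")) to draw the tree left to right.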
Would you like me to add a "Search" feature to the Python script so you can input an ID and instantly generate its specific traceability map?
Yes
This final version integrates the CSV processing, the SPARQL traceability logic, and the Pyvis visualization into a single, interactive CLI tool. You can type in any ID from your CSV (like a Program or Mission), and it will instantly generate a filtered, hierarchical map showing everything connected to it.
The Traceability Search Application
import os
import csv
import webbrowser
from pathlib import Path
from rdflib import Graph, Literal, Namespace
from pyvis.network import Network

# 1. Setup Namespaces
NASA = Namespace("https://nasa.example.gov/ontology/")
# Defined explicitly with http:// so it matches the PREFIX in the SPARQL query below
# (rdflib's built-in SDO namespace uses https://schema.org/ instead)
SCHEMA = Namespace("http://schema.org/")

def load_data_to_graph(csv_path):
    """Parses CSV into an RDF Graph with technical hierarchy logic."""
    g = Graph()
    g.bind("nasa", NASA)
    g.bind("schema", SCHEMA, replace=True)
    with open(csv_path, mode='r', encoding='utf-8', newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            subj = NASA[row['ID'].strip()]
            attr = row['Attribute'].strip().lower()
            val = row['Value'].strip()
            # Mapping 'partof' to Schema.org for traceability
            if attr == 'partof':
                # Link to another resource; an optional leading "ID_" marker is dropped
                target = val[len("ID_"):] if val.startswith("ID_") else val
                g.add((subj, SCHEMA.isPartOf, NASA[target]))
            else:
                # Standard attribute
                pred = NASA[attr.replace(" ", "_")]
                g.add((subj, pred, Literal(val)))
    return g

def generate_visual_trace(g, root_id):
    """Uses SPARQL to find descendants and creates a Hierarchical Pyvis map."""
    net = Network(height="800px", width="100%", bgcolor="#1a1a1a", font_color="white", directed=True)
    # SPARQL: Trace all connections related to the searched ID
    query = f"""
    PREFIX schema: <http://schema.org/>
    PREFIX nasa: <https://nasa.example.gov/ontology/>
    SELECT DISTINCT ?s ?p ?o WHERE {{
        {{
            # Downstream descendants
            ?s schema:isPartOf+ nasa:{root_id} .
            ?s ?p ?o .
        }} UNION {{
            # The search target itself
            BIND(nasa:{root_id} AS ?s)
            ?s ?p ?o .
        }}
    }}
    """
    results = list(g.query(query))
    if not results:
        return False
    for row in results:
        s_label = str(row.s).split('/')[-1]
        p_label = str(row.p).split('/')[-1]
        o_label = str(row.o).split('/')[-1]
        # Style: The searched Root is Gold, everything else is NASA Blue
        s_color = "#FFD700" if s_label == root_id else "#005288"
        net.add_node(s_label, label=s_label, color=s_color, size=30 if s_label == root_id else 15)
        net.add_node(o_label, label=o_label, color="#7fb3d5")
        net.add_edge(s_label, o_label, label=p_label, color="#ffffff")
    # Set Up-Down Hierarchical Layout
    net.set_options("""
    var options = {
      "layout": { "hierarchical": { "enabled": true, "direction": "UD", "sortMethod": "directed" } },
      "physics": { "enabled": false }
    }
    """)
    output_path = f"trace_{root_id}.html"
    net.save_graph(output_path)
    return output_path

def main():
    csv_file = 'technical_data.csv'
    if not os.path.exists(csv_file):
        print(f"File {csv_file} not found. Please ensure your CSV is ready.")
        return
    print("--- NASA Technical Hierarchy Loader ---")
    graph = load_data_to_graph(csv_file)
    print(f"Loaded {len(graph)} triples from {csv_file}.\n")
    while True:
        search_id = input("Enter an ID to trace (or 'exit' to quit): ").strip()
        if search_id.lower() == 'exit':
            break
        file_generated = generate_visual_trace(graph, search_id)
        if file_generated:
            print(f"Successfully generated traceability map: {file_generated}")
            print(f"Opening {file_generated} in your browser...")
            # webbrowser works on Windows, macOS, and Linux
            webbrowser.open(Path(file_generated).resolve().as_uri())
        else:
            print(f"ID '{search_id}' not found or has no hierarchical connections.")

if __name__ == "__main__":
    main()
Features of the Search Tool
Recursive Traceability: If you search for a Program, it shows the Missions, Vehicles, and Documents linked to it via isPartOf.
Automatic Visualization: It saves and automatically opens the .html file in your default browser.
Hierarchical View: Unlike a standard "ball and stick" graph, this uses a Top-Down tree layout, making it clear which systems belong to which parents.
Clean Labels: It strips away long URI prefixes so you only see the human-readable IDs in the diagram.
How to use it
Prepare your CSV: Use the "ID, Attribute, Value" format.
Link Items: To link a Vehicle to a Mission, set Attribute to partOf and Value to the Mission ID (e.g., Artemis_I).
Run: Run the script and type in the ID you want to explore.
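Because the ID you type is pasted into the SPARQL text as nasa:{root_id}, an ID containing spaces or punctuation produces a malformed query. One way to guard against that is to check the input first. The sketch below is an assumption-laden illustration: is_safe_local_name is a hypothetical helper, and its pattern is deliberately stricter than what SPARQL allows in a prefixed name.

```python
import re

# Letters, digits, underscores, and hyphens only; must not start with a hyphen
SAFE_ID = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")

def is_safe_local_name(candidate):
    """Return True if the ID can be placed after 'nasa:' without breaking the query."""
    return SAFE_ID.fullmatch(candidate) is not None

print(is_safe_local_name("Artemis_Program"))  # True
print(is_safe_local_name("Orion MPC"))        # False
```

In the search loop, an ID that fails the check could simply be rejected before the query runs. rdflib's Graph.query also accepts an initBindings argument, which supplies the ID as a value instead of as query text.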
Would you like me to add a "Cross-Check" feature that highlights any IDs in your CSV that are missing their required 'Document' or 'Requirement' links?
Provide Python code to convert rdflib Graph to Pyvis Network
To convert an rdflib.Graph to a Pyvis Network, we iterate through the triples in the graph and add the subjects and objects as nodes, and the predicates as edges.
Since you are working with NASA technical hierarchies, I've added logic to handle both URIs (entities like Missions/Programs) and Literals (text descriptions/values) so the visualization stays clean.
The Conversion Function
from pyvis.network import Network
from rdflib import Graph, URIRef

def rdflib_to_pyvis(rdf_graph, output_html="knowledge_graph.html"):
    """
    Converts an rdflib.Graph object into an interactive Pyvis Network.
    """
    # 1. Initialize Pyvis Network (Dark theme for technical data)
    net = Network(height="800px", width="100%", bgcolor="#1a1a1a", font_color="white", directed=True)
    # 2. Iterate through every triple in the rdflib Graph
    for s, p, o in rdf_graph:
        # Simplify Subject (Extract local name from URI)
        s_id = str(s).split('/')[-1] if isinstance(s, URIRef) else str(s)
        # Simplify Predicate (The relationship name)
        p_label = str(p).split('/')[-1] if isinstance(p, URIRef) else str(p)
        # Simplify Object (Check if it's another URI or just a text value)
        if isinstance(o, URIRef):
            o_id = str(o).split('/')[-1]
            o_color = "#005288"  # NASA Blue for entities
            o_size = 15
        else:
            o_id = str(o)
            o_color = "#7fb3d5"  # Lighter blue for literal values/text
            o_size = 10
        # 3. Add nodes and edges to the Network
        # Subject Node (Main entity)
        net.add_node(s_id, label=s_id, color="#FFD700", size=20)  # Gold for subjects
        # Object Node
        net.add_node(o_id, label=o_id, color=o_color, size=o_size)
        # Relationship Edge
        net.add_edge(s_id, o_id, label=p_label, color="#ffffff")
    # 4. Configure Physics
    # ForceAtlas2 is a force-directed layout: it spreads nodes apart to reduce overlap,
    # but it does not arrange them as a tree (see the hierarchical option below)
    net.force_atlas_2based()
    # 5. Save and return the file path
    net.save_graph(output_html)
    return output_html

# --- Example Usage ---
# g = Graph()
# g.parse("nasa_traceability.ttl", format="turtle")
# rdflib_to_pyvis(g, "nasa_viz.html")
Key Features of this Conversion
Type Differentiation: It checks isinstance(o, URIRef). If the object is a URI (another entity), it gets a stronger color. If it's a Literal (just a string like "Active" or "v1.0"), it gets a smaller, lighter node.
URI Shortening: It automatically strips the long https://... prefixes so your graph nodes show clean IDs like Artemis_I instead of the full web address.
Dynamic Sizing: Subjects (the "roots" of the data) are slightly larger and colored gold to help you navigate the hierarchy.
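The shortening is done with split('/')[-1], which works for the slash-terminated namespaces used here but leaves hash-style URIs (such as the one behind rdf:type) looking odd. A slightly more general sketch follows; local_name is a hypothetical helper, not something the function above calls.

```python
def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    trimmed = uri.rstrip("/#")
    cut = max(trimmed.rfind("#"), trimmed.rfind("/"))
    return trimmed[cut + 1:]

print(local_name("https://nasa.example.gov/ontology/Artemis_I"))
# Artemis_I
print(local_name("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
# type
```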
How to customize the layout
If you want the graph to look like a strict organizational chart (Top-Down), you can replace net.force_atlas_2based() with:
net.set_options("""
var options = {
  "layout": {
    "hierarchical": {
      "enabled": true,
      "direction": "UD",
      "sortMethod": "directed"
    }
  }
}
""")
Would you like me to add a "Click Listener" that displays more metadata about a NASA system when you select its node in the browser?
Yes
To display metadata in the browser, we use the title attribute in Pyvis. This attribute creates a tooltip that appears when you hover over a node. It is not a click handler in the strict sense, but it surfaces the same details without any custom JavaScript.
For your NASA technical hierarchy, the most effective way to do this is to pre-aggregate all attributes for each ID into a single string. This way, when you hover over a "Mission" node, you see its Status, Version, and associated Documents all in one popup.
1. The Metadata-Enhanced Converter
This script uses a dictionary to gather all properties for an ID before building the Pyvis network.
from pyvis.network import Network
from rdflib import Graph, URIRef

def visualize_with_metadata(rdf_graph, output_html="nasa_interactive_map.html"):
    net = Network(height="800px", width="100%", bgcolor="#0b0d0f", font_color="white", directed=True)
    # 1. Aggregate Metadata (Attributes) for each Subject
    # We want a dictionary of lists: { "Artemis_I": ["<b>status</b>: Active", "<b>version</b>: 1.2", ...] }
    metadata = {}
    for s, p, o in rdf_graph:
        s_id = str(s).split('/')[-1]
        p_label = str(p).split('/')[-1]
        # We only want to show Literals (attributes) in the tooltip, not the links
        if not isinstance(o, URIRef):
            if s_id not in metadata:
                metadata[s_id] = []
            metadata[s_id].append(f"<b>{p_label}</b>: {o}")
    # 2. Build the Network
    for s, p, o in rdf_graph:
        s_id = str(s).split('/')[-1]
        p_label = str(p).split('/')[-1]
        # Create a formatted HTML tooltip from our metadata
        s_tooltip = "<br>".join(metadata.get(s_id, ["No attributes found"]))
        # Add Subject Node with Tooltip
        net.add_node(s_id, label=s_id, title=s_tooltip, color="#FFD700", size=25)
        # Literal objects stay inside the tooltip; only entity links become nodes and edges
        if isinstance(o, URIRef):
            o_id = str(o).split('/')[-1]
            o_tooltip = "<br>".join(metadata.get(o_id, ["Entity Link"]))
            # Add Object Node (Entity)
            net.add_node(o_id, label=o_id, title=o_tooltip, color="#005288", size=20)
            # Add Traceability Edge
            net.add_edge(s_id, o_id, label=p_label, color="#ffffff")
    # 3. Enable Interaction Options
    net.set_options("""
    var options = {
      "interaction": {
        "hover": true,
        "navigationButtons": true,
        "tooltipDelay": 200
      },
      "physics": {
        "stabilization": { "iterations": 100 }
      }
    }
    """)
    net.save_graph(output_html)
    print(f"Interactive map generated: {output_html}")
    return output_html

# Example Run
# g = Graph()
# g.parse("knowledge_graph.ttl", format="turtle")
# visualize_with_metadata(g)
How this works for Technical Traceability
HTML Tooltips: By using <b> tags in the title attribute, Pyvis renders the metadata as a clean, formatted list when you hover over the node.
Clean Graph: Instead of cluttering the map with 100 nodes for "Status" or "Version" text, those details stay "hidden" inside the node. The map only shows the IDs and their Connections (Traceability).
Navigation Buttons: I added navigationButtons: true to the options so you can pan and zoom through complex NASA hierarchies more easily using on-screen controls.
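The aggregation step can be tried on its own, without a graph. The sketch below groups attribute rows into one tooltip string per ID and, as an extra precaution that the converter above does not take, escapes each value so that characters such as < or & in your data cannot break the tooltip markup. build_tooltips and the sample rows are hypothetical.

```python
import html
from collections import defaultdict

def build_tooltips(attribute_rows):
    """Group (subject, attribute, value) rows into one HTML string per subject."""
    lines = defaultdict(list)
    for subject, attribute, value in attribute_rows:
        lines[subject].append(f"<b>{html.escape(attribute)}</b>: {html.escape(value)}")
    return {subject: "<br>".join(parts) for subject, parts in lines.items()}

rows = [
    ("Orion_MPC", "status", "Active"),
    ("Orion_MPC", "version", "1.2"),
    ("Artemis_I", "note", "mass < 30t & rising"),
]
tips = build_tooltips(rows)
print(tips["Orion_MPC"])
# <b>status</b>: Active<br><b>version</b>: 1.2
```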
Using it with your Search Tool
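You can use this in place of the old generate_visual_trace function, with one adjustment: it takes a graph and an output filename rather than a graph and a root ID, and it draws every triple it is given, so pass it a graph containing only the triples selected for the searched ID. Then, when you search for a Program, you get its hierarchy and can hover over any Vehicle or Document to see its specific technical details immediately.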
Would you like me to add a "Status Filter" where nodes change color (e.g., Red for "Delayed", Green for "Completed") based on their metadata?