#3769 Useful Queries for a KG of NASA Software Hazard Reports, Causes, Controls & Verifications

#3769 For NASA Software Hazard Reports, given a system with hazard reports, causes, controls and verifications all imported into a knowledge graph for that system, what are useful queries?

Of course. For a knowledge graph built from NASA Software Hazard Reports, here are several categories of useful queries, moving from basic traceability to more complex system-wide analysis.

## Direct Traceability & Impact Analysis

These queries help you trace relationships between specific items to understand direct cause-and-effect chains and the scope of controls.

"Show the full chain for Cause X."
- Utility: This is a fundamental query. It traces a single cause to the hazard(s) it triggers, the control(s) meant to mitigate it, and how those controls are verified. It's essential for reviewing individual failure paths.
"What are all the potential hazards associated with Software Component Y?"
- Utility: This helps assess the overall risk profile of a specific part of the software. It's crucial for code reviews, testing focus, and understanding the impact of modifying that component.
"Which hazards are addressed by Control Z?"
- Utility: This is the reverse of a typical hazard query. It helps you understand if a single control is preventing multiple hazards, making it a critical control that requires extra scrutiny. It also helps in assessing the impact if that one control fails.

## Gap & Completeness Analysis

These queries are vital for ensuring the hazard analysis is thorough and that no safety-critical items have been overlooked.

"Are there any hazards with no associated controls?"
- Utility: This is arguably the most critical query. It immediately identifies high-risk areas where a known hazard has no mitigation in place, representing a major gap in the safety case.
"Show me all controls that lack a verification method."
- Utility: A control is only effective if you can prove it's implemented correctly. This query finds controls that are "on paper" but have no plan to ensure they are working, which is a key finding for any safety review board.
"List any causes that are not linked to a hazard."
- Utility: This can point to incomplete analysis. A potential cause for failure has been identified, but its ultimate effect on the system (the hazard) hasn't been documented.
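
Before any database is involved, the three gap checks above reduce to set differences over the graph's edges. The following is a minimal in-memory sketch, assuming the edges are available as plain ID pairs; the function name `find_gaps` and the sample IDs are illustrative and not part of any NASA tool.

```python
def find_gaps(hazards, causes, controls, leads_to, mitigates, verifies):
    """Return the three gap sets for a hazard graph held in memory.

    leads_to:  set of (cause_id, hazard_id) pairs
    mitigates: set of (control_id, cause_id) pairs
    verifies:  set of (verification_id, control_id) pairs
    """
    controlled_causes = {cause for _, cause in mitigates}
    # A hazard counts as controlled if at least one of its causes has a control.
    controlled_hazards = {h for c, h in leads_to if c in controlled_causes}
    verified_controls = {ctrl for _, ctrl in verifies}
    linked_causes = {c for c, _ in leads_to}
    return {
        "uncontrolled_hazards": set(hazards) - controlled_hazards,
        "unverified_controls": set(controls) - verified_controls,
        "orphan_causes": set(causes) - linked_causes,
    }


# Hypothetical sample data for illustration only.
gaps = find_gaps(
    hazards={"H-1", "H-2"},
    causes={"C-1", "C-2", "C-3"},
    controls={"CTL-1", "CTL-2"},
    leads_to={("C-1", "H-1"), ("C-2", "H-2")},
    mitigates={("CTL-1", "C-1"), ("CTL-2", "C-3")},
    verifies={("V-1", "CTL-1")},
)
print(gaps)
```

With this sample, H-2 is reported as uncontrolled, CTL-2 as unverified, and C-3 as a cause with no hazard. The hazard check here is deliberately lenient: a hazard passes as soon as any one of its causes has a control. A stricter review might require every cause to be controlled.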

## Risk Concentration & Criticality Analysis

These queries leverage the graph's structure to identify systemic risks and points of high leverage or vulnerability.

"Which cause contributes to the most hazards?"
- Utility: This identifies common-cause failures. A single root cause that can trigger multiple, seemingly unrelated hazards is a high-priority target for elimination or robust control because it represents a systemic vulnerability.
"Which control mitigates the most causes?"
- Utility: This finds your most critical or "linchpin" controls. The failure of such a control would have widespread consequences. These controls deserve the most rigorous design, implementation, and verification.
"Show me all single-point failures."
- Utility: This query would search for paths where a single cause leads to a catastrophic or critical hazard and is mitigated by only one control. This identifies the most fragile parts of your safety design, where there is no defense-in-depth.

## Verification & Validation (V&V) Status

These queries help manage the V&V process and provide a clear status of the system's safety readiness.

"What percentage of controls for System X have been successfully verified?"
- Utility: This provides a high-level, quantitative measure of progress and readiness for a system or subsystem. It's a perfect metric for status reports and milestone reviews.
"List all controls for the Artemis Gateway's HALO module whose verification method is 'Analysis' versus 'Test'."
- Utility: This helps in planning and resource allocation for the V&V team. It separates work that can be closed out through analysis and modeling from work that needs test execution, which may require a hardware-in-the-loop testbed.
"Show all failed verifications and their associated hazards."
- Utility: This is an immediate action list. It shows which parts of the safety system are confirmed not to be working as intended, allowing you to prioritize debugging and corrective actions based on the severity of the associated hazard.

Given a python application connected to a Neo4j knowledge graph, how would these queries be implemented?

Of course. Here’s how you can implement those types of queries in a Python application using the neo4j driver to connect to your knowledge graph.

First, you need a basic data model. Let's assume your graph uses nodes and relationships like this:

Nodes: (:Hazard), (:Cause), (:Control), (:Verification), (:Component)
Relationships: (:Cause)-[:LEADS_TO]->(:Hazard), (:Control)-[:MITIGATES]->(:Cause), (:Verification)-[:VERIFIES]->(:Control), (:Component)-[:CAN_FAIL]->(:Cause)

## Setting Up the Connection

It's good practice to wrap your Neo4j connection and queries in a class. This manages the database driver and makes your code cleaner. You'll need to install the driver first:

pip install neo4j

Here is a simple class structure to get started:

from neo4j import GraphDatabase

class HazardGraphAnalyzer:
    """A class to connect to a Neo4j database and run hazard analysis queries."""

    def __init__(self, uri, user, password):
        """Initializes the database driver."""
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        """Closes the database connection."""
        self.driver.close()

    def _execute_query(self, query, **params):
        """A helper function to run a query and return results."""
        with self.driver.session() as session:
            result = session.run(query, **params)
            # Convert the result to a list of dictionaries for easier use
            return [record.data() for record in result]

    # We will add query methods here...

## Implementing the Queries

Now, let's implement a few key queries from the categories we discussed. Each method will be added to the HazardGraphAnalyzer class.

### 1. Direct Traceability: Finding a Full Failure Chain

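This query traces the path from a specific cause to its hazard, controls, and verifications. Because every hop is a required MATCH, it returns rows only for causes that have at least one control with at least one verification; switching the control and verification hops to OPTIONAL MATCH would also surface incomplete chains.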

Question: "Show the full chain for Cause X."

Cypher Query:

MATCH path = (v:Verification)-[:VERIFIES]->(c:Control)-[:MITIGATES]->(cause:Cause)-[:LEADS_TO]->(h:Hazard)
WHERE cause.id = $cause_id
RETURN h.name AS hazard, cause.name AS cause, c.name AS control, v.status AS verification_status

Python Implementation:

def get_full_chain_for_cause(self, cause_id: str):
    """Finds the hazard, control, and verification for a specific cause."""
    query = """
    MATCH (v:Verification)-[:VERIFIES]->(c:Control)-[:MITIGATES]->(cause:Cause)-[:LEADS_TO]->(h:Hazard)
    WHERE cause.id = $cause_id
    RETURN h.name AS hazard,
           cause.name AS cause,
           c.name AS control,
           v.status AS verification_status
    """
    return self._execute_query(query, cause_id=cause_id)

### 2. Gap Analysis: Finding Uncontrolled Hazards

This query is critical for finding safety gaps by identifying hazards that have no controls in place.

Question: "Are there any hazards with no associated controls?"

Cypher Query:

MATCH (h:Hazard)
WHERE NOT EXISTS ((h)<-[:LEADS_TO]-(:Cause)<-[:MITIGATES]-(:Control))
RETURN h.name AS hazard, h.severity AS severity

Python Implementation:

def find_uncontrolled_hazards(self):
    """Finds all hazards that are not mitigated by any controls."""
    query = """
    MATCH (h:Hazard)
    WHERE NOT EXISTS ((h)<-[:LEADS_TO]-(:Cause)<-[:MITIGATES]-(:Control))
    RETURN h.name AS hazard, h.severity AS severity
    """
    return self._execute_query(query)
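
The other two gap queries from the earlier list ("controls that lack a verification method" and "causes that are not linked to a hazard") could be sketched the same way. This is a hedged example, not a definitive implementation: `GapQueriesMixin` is a hypothetical name, the methods assume the host class provides `_execute_query()` as `HazardGraphAnalyzer` does, and the `EXISTS { ... }` subquery form assumes Neo4j 4.x or later.

```python
class GapQueriesMixin:
    """Hypothetical mixin; the host class must provide _execute_query()."""

    def find_unverified_controls(self):
        """Lists controls that have no incoming VERIFIES relationship."""
        query = """
        MATCH (c:Control)
        WHERE NOT EXISTS { (c)<-[:VERIFIES]-(:Verification) }
        RETURN c.id AS control_id, c.name AS control
        ORDER BY control_id
        """
        return self._execute_query(query)

    def find_orphan_causes(self):
        """Lists causes that do not lead to any documented hazard."""
        query = """
        MATCH (cause:Cause)
        WHERE NOT EXISTS { (cause)-[:LEADS_TO]->(:Hazard) }
        RETURN cause.id AS cause_id, cause.name AS cause_name
        ORDER BY cause_id
        """
        return self._execute_query(query)
```

One way to combine it with the existing class would be `class Analyzer(GapQueriesMixin, HazardGraphAnalyzer): pass`; pasting the two methods directly into `HazardGraphAnalyzer` works equally well.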

### 3. Risk Concentration: Finding Critical Controls

This query helps identify "linchpin" controls that mitigate a large number of different causes.

Question: "Which control mitigates the most causes?"

Cypher Query:

MATCH (ctrl:Control)-[:MITIGATES]->(cause:Cause)
RETURN ctrl.name AS control, count(cause) AS mitigated_causes_count
ORDER BY mitigated_causes_count DESC
LIMIT 10

Python Implementation:

def find_critical_controls(self, limit: int = 10):
    """Finds the top N controls that mitigate the most causes."""
    query = """
    MATCH (ctrl:Control)-[:MITIGATES]->(cause:Cause)
    RETURN ctrl.name AS control, count(cause) AS mitigated_causes_count
    ORDER BY mitigated_causes_count DESC
    LIMIT $limit
    """
    return self._execute_query(query, limit=limit)
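
The two remaining risk-concentration questions ("Which cause contributes to the most hazards?" and "Show me all single-point failures.") might be sketched as follows. Several things here are assumptions rather than facts from the data model: the mixin name, the `severity` values `'Catastrophic'` and `'Critical'`, and the reading of "single-point failure" as a cause of a high-severity hazard that has exactly one control.

```python
class RiskQueriesMixin:
    """Hypothetical mixin; the host class must provide _execute_query()."""

    def find_common_causes(self, limit: int = 10):
        """Finds the top N causes that lead to the most distinct hazards."""
        query = """
        MATCH (cause:Cause)-[:LEADS_TO]->(h:Hazard)
        RETURN cause.name AS cause_name, count(DISTINCT h) AS hazard_count
        ORDER BY hazard_count DESC
        LIMIT $limit
        """
        return self._execute_query(query, limit=limit)

    def find_single_point_failures(self, severities=("Catastrophic", "Critical")):
        """Finds causes of high-severity hazards mitigated by exactly one control."""
        query = """
        MATCH (cause:Cause)-[:LEADS_TO]->(h:Hazard)
        WHERE h.severity IN $severities
        MATCH (ctrl:Control)-[:MITIGATES]->(cause)
        WITH cause, h, collect(DISTINCT ctrl) AS controls
        WHERE size(controls) = 1
        RETURN h.name AS hazard, h.severity AS severity,
               cause.name AS cause_name, controls[0].name AS only_control
        ORDER BY hazard
        """
        # The driver expects a list for a Cypher list parameter.
        return self._execute_query(query, severities=list(severities))
```

Causes with zero controls are intentionally excluded from the second query; those are gaps rather than single points of failure and are better caught by the gap analysis.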

### 4. V&V Status: Finding Failed Verifications

This query creates an immediate action list by finding controls that are not working as intended and showing the hazard they were meant to prevent.

Question: "Show all failed verifications and their associated hazards."

Cypher Query:

MATCH (v:Verification {status: 'Failed'})-[:VERIFIES]->(c:Control)-[:MITIGATES]->(:Cause)-[:LEADS_TO]->(h:Hazard)
RETURN DISTINCT v.name AS verification, c.name AS control, h.name AS hazard

Python Implementation:

def get_failed_verifications(self):
    """Lists failed verifications and the hazards they ultimately affect."""
    query = """
    MATCH (v:Verification {status: 'Failed'})-[:VERIFIES]->(c:Control)-[:MITIGATES]->(:Cause)-[:LEADS_TO]->(h:Hazard)
    RETURN DISTINCT v.name AS verification, c.name AS control, h.name AS hazard
    ORDER BY hazard
    """
    return self._execute_query(query)
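
The progress metric from the V&V list ("What percentage of controls have been successfully verified?") can be sketched in the same style. The assumptions are stated up front: the graph holds a single system, so no system filter is applied; a control counts as verified when at least one of its verifications has status 'Passed'; and the mixin and method names are hypothetical.

```python
class VerificationStatusMixin:
    """Hypothetical mixin; the host class must provide _execute_query()."""

    def get_verification_progress(self):
        """Returns total controls, verified controls, and the percentage verified."""
        query = """
        MATCH (c:Control)
        OPTIONAL MATCH (v:Verification {status: 'Passed'})-[:VERIFIES]->(c)
        WITH c, count(v) > 0 AS is_verified
        RETURN count(c) AS total_controls,
               sum(CASE WHEN is_verified THEN 1 ELSE 0 END) AS verified_controls
        """
        rows = self._execute_query(query)
        row = rows[0] if rows else {}
        total = row.get("total_controls") or 0
        verified = row.get("verified_controls") or 0
        # Guard against an empty graph to avoid dividing by zero.
        percent = round(100.0 * verified / total, 1) if total else 0.0
        return {
            "total_controls": total,
            "verified_controls": verified,
            "percent_verified": percent,
        }
```

A stricter definition, in which every verification attached to a control must have passed, would need a different aggregation; which definition applies is a project decision.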

By wrapping these powerful Cypher queries in simple Python methods, you can easily integrate complex graph analysis directly into your software safety applications.

How might cascading failures be identified?

You can identify cascading failures by querying for specific long paths in your knowledge graph where the consequence of one hazard acts as the cause for another. This creates a chain reaction of failures.

To do this effectively, your graph model needs one addition: a relationship that links a hazard's outcome to a subsequent cause. Let's call it TRIGGERS.

Model: (:Hazard)-[:TRIGGERS]->(:Cause)
- Example: A hazard like (h:Hazard {name: 'Main Power Bus Short Circuit'}) could have a relationship -[r:TRIGGERS]-> to a cause like (c:Cause {name: 'Loss of power to cooling pumps'}).

With this relationship in place, you can find cascades by searching for paths that traverse multiple Hazard nodes.

## The Cypher Query

The key is to use Cypher's variable-length path matching. This query looks for paths that start with an initial cause and follow a chain of at least one TRIGGERS relationship, indicating a cascade.

MATCH path = (cause1:Cause)-[:LEADS_TO]->(h1:Hazard)
             -[:TRIGGERS|LEADS_TO*2..]->
             (h2:Hazard)
// Optional: Filter to start from a specific component's failure modes
// MATCH (comp:Component {name: 'XPS'})-[:CAN_FAIL]->(cause1)
RETURN
  // Return the nodes in the path for analysis
  [node in nodes(path) | coalesce(node.name, node.id)] AS failure_chain,
  // Return the path length (number of relationships) so the longest chains sort first
  length(path) AS chain_length
ORDER BY chain_length DESC
LIMIT 25

What this query does:

MATCH path = ...: It defines the pattern of the path we're looking for and assigns it to the variable path.
-[:TRIGGERS|LEADS_TO*2..]->: This is the crucial part. In this model TRIGGERS only runs from a Hazard to a Cause and LEADS_TO only from a Cause to a Hazard, so a variable-length hop over both types must alternate between them. The lower bound of 2 requires at least one full TRIGGERS then LEADS_TO step, and the open upper bound lets the chain grow to any length. (A bare -[:TRIGGERS*]-> could never match more than one hop here, because no TRIGGERS relationship starts at a Cause.) On a large graph, consider an upper bound such as *2..10 to keep the search cheap.
RETURN [node in nodes(path) | ... ]: This collects the names of all the nodes along the discovered path into an ordered list, showing you the exact sequence of the cascade.
ORDER BY chain_length DESC: This sorts the results to show you the longest and potentially most dangerous cascading failures first.

## Python Implementation

You would add a new method to your HazardGraphAnalyzer class to execute this query.

class HazardGraphAnalyzer:
    # ... (previous methods like __init__, close, _execute_query) ...

    def find_cascading_failures(self, limit: int = 25):
        """
        Identifies potential cascading failure paths in the knowledge graph.
        A cascade is defined as a sequence where one hazard triggers a cause
        for another hazard.
        """
        query = """
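        // TRIGGERS (Hazard to Cause) and LEADS_TO (Cause to Hazard) alternate,
        // so *2.. matches cascades of one or more hazard-to-hazard steps.
        MATCH path = (cause1:Cause)-[:LEADS_TO]->(h1:Hazard)
                     -[:TRIGGERS|LEADS_TO*2..]->
                     (h2:Hazard)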
        RETURN
          [node in nodes(path) | coalesce(node.name, node.id)] AS failure_chain,
          length(path) AS chain_length
        ORDER BY chain_length DESC
        LIMIT $limit
        """
        return self._execute_query(query, limit=limit)

# --- How you would use it ---
# analyzer = HazardGraphAnalyzer("bolt://localhost:7687", "neo4j", "password")
# cascades = analyzer.find_cascading_failures()
# for cascade in cascades:
#     print(f"Chain Length: {cascade['chain_length']}")
#     print(" -> ".join(cascade['failure_chain']))
#     print("-" * 20)
# analyzer.close()

By identifying these chains, you can pinpoint the best places to introduce robust controls or "firebreaks" to stop a small initial failure from escalating into a system-wide catastrophic event. 🧑‍🚒

Now, let's put this all together. Provide software application requirements to read a user-specified CSV file for a given system with hazard reports, causes, controls and verifications, then create and submit Cypher statements for entities and relationships. Then add support for the queries discussed.

Of course. Here are the software application requirements to build the hazard analysis tool we've discussed.

This document outlines the requirements for a command-line application, the Hazard Analysis Graph Tool (HAGT), designed to transform hazard report data from a CSV file into a Neo4j knowledge graph and run advanced safety analysis queries.

## 1.0 System Overview

The HAGT will be a Python-based command-line interface (CLI) tool that performs two primary functions:

Ingestion: Parses a user-specified CSV file containing hazard, cause, control, and verification data to populate a Neo4j graph database. The process will be idempotent, meaning it can be run on the same file multiple times without creating duplicate data.
Querying: Provides a simple command structure to execute the predefined safety and risk analysis queries (e.g., finding uncontrolled hazards, critical controls, and cascading failures) against the populated graph.

## 2.0 Functional Requirements

### 2.1 Data Ingestion

FR-1.1 (File Input): The application shall accept a local file path to a CSV file as a command-line argument.
FR-1.2 (CSV Parsing): The application shall parse the CSV file based on the format specified in Section 4.0.
FR-1.3 (Graph Connection): The application shall connect to a Neo4j database instance using credentials specified in an external configuration file.
FR-1.4 (Node Creation): The application shall generate and execute Cypher MERGE statements to create or update nodes for each unique Hazard, Cause, Control, and Verification entity from the CSV. Using MERGE ensures idempotency.
FR-1.5 (Relationship Creation): The application shall generate and execute Cypher MERGE statements to create the following relationships between the nodes:
- (:Cause)-[:LEADS_TO]->(:Hazard)
- (:Control)-[:MITIGATES]->(:Cause)
- (:Verification)-[:VERIFIES]->(:Control)
- (:Hazard)-[:TRIGGERS]->(:Cause) (for cascading failures)

### 2.2 Analysis & Query Interface

The application shall provide a query subcommand to execute the following analyses. Each query should output results in a human-readable, formatted table in the console.

FR-2.1 (Uncontrolled Hazards): A command shall exist to execute the "uncontrolled hazards" query, listing all hazards that are not mitigated by any control.
FR-2.2 (Critical Controls): A command shall exist to execute the "critical controls" query, listing the controls that mitigate the most causes, ordered from most to least. It should accept an optional limit (e.g., --limit 10).
FR-2.3 (Failure Chain Trace): A command shall exist to trace the full failure chain for a given Cause ID, showing the Hazard, Control, and Verification status.
FR-2.4 (Failed Verifications): A command shall exist to list all verifications with a "Failed" status and the corresponding control and hazard they affect.
FR-2.5 (Cascading Failures): A command shall exist to find and display potential cascading failure paths, ordered by the length of the chain.

## 3.0 Non-Functional Requirements

NFR-1 (Usability): The CLI shall be intuitive, with a main help command (--help) that lists all available commands and their options.
NFR-2 (Configuration): Database connection details (URI, Username, Password) shall not be hardcoded. They shall be read from an external configuration file (e.g., config.ini).
NFR-3 (Performance): The data ingestion process should use Neo4j's UNWIND Cypher clause to batch rows from the CSV into a single transaction for efficient loading.
NFR-4 (Error Handling): The application shall provide clear, user-friendly error messages for common failures, such as:
- Inability to connect to the database.
- Specified CSV file not found.
- Missing required columns in the CSV file.
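
NFR-2 and the first error case in NFR-4 could be handled with the standard-library `configparser`. This is a sketch under assumptions: the `[neo4j]` section name and the `uri`, `user`, and `password` keys are a hypothetical layout for config.ini, not something the requirements fix.

```python
import configparser

REQUIRED_KEYS = ("uri", "user", "password")


def parse_neo4j_config(parser: configparser.ConfigParser) -> dict:
    """Extract connection settings from an already-loaded parser."""
    if not parser.has_section("neo4j"):
        raise ValueError("config file is missing the [neo4j] section")
    section = parser["neo4j"]
    missing = [key for key in REQUIRED_KEYS if not section.get(key)]
    if missing:
        raise ValueError(f"[neo4j] section is missing: {', '.join(missing)}")
    return {key: section[key] for key in REQUIRED_KEYS}


def load_neo4j_config(path: str = "config.ini") -> dict:
    """Read config.ini and return a dict with uri, user, and password."""
    # interpolation=None keeps '%' characters in passwords from being
    # treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    # read() returns the list of files it could parse; empty means not found.
    if not parser.read(path):
        raise FileNotFoundError(f"configuration file not found: {path}")
    return parse_neo4j_config(parser)
```

The returned dictionary can then be unpacked into the analyzer, for example `HazardGraphAnalyzer(cfg["uri"], cfg["user"], cfg["password"])`.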

## 4.0 CSV Data Format Specification 📋

The input CSV file must contain a header row with the following column names. Each row defines a single, complete path from verification to hazard.

| Column Name | Required | Description | Example |
|---|---|---|---|
| `Hazard_ID` | Yes | Unique identifier for the hazard. | `H-401` |
| `Hazard_Name` | Yes | Short name or description of the hazard. | `Spacecraft Uncontrolled Attitude Drift` |
| `Cause_ID` | Yes | Unique identifier for the cause. | `C-5532` |
| `Cause_Name` | Yes | Short name for the cause. | `Star Tracker Data Timeout` |
| `Control_ID` | Yes | Unique identifier for the control. | `CTL-90` |
| `Control_Name` | Yes | Short name for the control. | `Switch to IMU for Attitude Reference` |
| `Verification_ID` | Yes | Unique identifier for the verification procedure. | `V-IMU-SW-01` |
| `Verification_Status` | Yes | The status of the verification. Must be one of: `Passed`, `Failed`, `Pending`. | `Passed` |
| `Triggers_Cause_ID` | No | If this hazard's occurrence can trigger another cause, specify the target Cause ID here. Key for cascades. | `C-5598` |
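
The column specification above, together with FR-1.4, FR-1.5, and NFR-3, suggests an ingestion step along these lines. It is a sketch, not the tool's actual code: the function names, the batch size, and the choice of an `id` property as the MERGE key are assumptions, and each batch runs as its own auto-commit transaction rather than one transaction for the whole file.

```python
import csv

REQUIRED_COLUMNS = (
    "Hazard_ID", "Hazard_Name", "Cause_ID", "Cause_Name",
    "Control_ID", "Control_Name", "Verification_ID", "Verification_Status",
)
VALID_STATUSES = {"Passed", "Failed", "Pending"}

# MERGE keeps re-ingestion of the same file idempotent.
INGEST_QUERY = """
UNWIND $rows AS row
MERGE (h:Hazard {id: row.Hazard_ID})
  SET h.name = row.Hazard_Name
MERGE (cause:Cause {id: row.Cause_ID})
  SET cause.name = row.Cause_Name
MERGE (ctrl:Control {id: row.Control_ID})
  SET ctrl.name = row.Control_Name
MERGE (v:Verification {id: row.Verification_ID})
  SET v.status = row.Verification_Status
MERGE (cause)-[:LEADS_TO]->(h)
MERGE (ctrl)-[:MITIGATES]->(cause)
MERGE (v)-[:VERIFIES]->(ctrl)
WITH h, row
WHERE row.Triggers_Cause_ID IS NOT NULL
MERGE (triggered:Cause {id: row.Triggers_Cause_ID})
MERGE (h)-[:TRIGGERS]->(triggered)
"""


def read_hazard_rows(csv_file):
    """Validate and normalize rows from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {', '.join(missing)}")
    rows = []
    for raw in reader:
        # Drop overflow fields (key None) and trim whitespace; absent values become "".
        row = {key: (value or "").strip() for key, value in raw.items() if key}
        empty = [col for col in REQUIRED_COLUMNS if not row[col]]
        if empty:
            raise ValueError(
                f"line {reader.line_num}: empty required field(s): {', '.join(empty)}"
            )
        if row["Verification_Status"] not in VALID_STATUSES:
            raise ValueError(
                f"line {reader.line_num}: invalid Verification_Status "
                f"{row['Verification_Status']!r}"
            )
        # An empty optional cell becomes None so Cypher sees it as null.
        row["Triggers_Cause_ID"] = row.get("Triggers_Cause_ID") or None
        rows.append(row)
    return rows


def ingest_rows(session, rows, batch_size=500):
    """Send rows to Neo4j in batches; `session` is an open neo4j driver session."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        session.run(INGEST_QUERY, rows=batch).consume()
```

Typical use would be `with open(path, newline="", encoding="utf-8") as f: rows = read_hazard_rows(f)`, followed by `ingest_rows(session, rows)` inside a `with driver.session() as session:` block.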

## 5.0 User Interface (CLI) Concept

The interaction with the application would look like this from the command line:

# Ingest the data from a CSV file
python hagt.py ingest --file ./path/to/hazard_data.csv

# Run a query to find hazards with no controls
python hagt.py query uncontrolled-hazards

# Find the top 5 most critical controls
python hagt.py query critical-controls --limit 5

# Trace the full chain for a specific cause
python hagt.py query chain --cause-id "C-5532"

# Find potential cascading failures
python hagt.py query cascades
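
The command layout above maps naturally onto nested `argparse` subparsers. The sketch below only builds the parser and does not dispatch to the analyzer. The `failed-verifications` subcommand name and the `--limit` option on `cascades` are assumptions added to cover FR-2.4 and FR-2.5, since the examples above do not show them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the hagt.py command-line parser (ingest and query subcommands)."""
    parser = argparse.ArgumentParser(
        prog="hagt.py", description="Hazard Analysis Graph Tool"
    )
    commands = parser.add_subparsers(dest="command", required=True)

    ingest = commands.add_parser("ingest", help="Load a hazard CSV into Neo4j")
    ingest.add_argument("--file", required=True, help="Path to the hazard CSV file")

    query = commands.add_parser("query", help="Run a predefined analysis query")
    queries = query.add_subparsers(dest="query_name", required=True)

    queries.add_parser("uncontrolled-hazards", help="Hazards with no controls")

    critical = queries.add_parser("critical-controls", help="Controls mitigating the most causes")
    critical.add_argument("--limit", type=int, default=10)

    chain = queries.add_parser("chain", help="Trace the full chain for one cause")
    chain.add_argument("--cause-id", required=True)

    # Hypothetical subcommand name for FR-2.4.
    queries.add_parser("failed-verifications", help="Failed verifications and their hazards")

    cascades = queries.add_parser("cascades", help="Potential cascading failure paths")
    cascades.add_argument("--limit", type=int, default=25)

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Because every subparser gets its own `--help`, this layout also satisfies NFR-1 without extra code.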

https://gemini.google.com/app/92c248e0440e8cc2

From <https://gemini.google.com/app/> Google Gemini 2.5 Pro

CREATE (charlie:Person&Actor {name: 'Charlie Sheen'}), (oliver:Person&Director {name: 'Oliver Stone'})

CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'})

KG Text File:

ADD ENTITY, Program, Gateway, 'fullname':'Gateway Program', 'parent':'M2M', 'description':''

ADD RELATIONSHIP, Program, Gateway, IS_SUB_PROGRAM_OF, Program, M2M

CSV File Columns:

HR ID
Hazard Report File Name
Hazard Report #Name
HR
HR Author
Revision

Cause #
Cause text

Control #
Control Text

Verification #
Verification Text

System

Hazard Report(s)

Cause(s)

Control(s)

Verification(s)
