#4355 Solver for Formal Requirements Compliance Checking
#4355 Given tools like SAT and SMT solvers, could a similar-purpose tool be developed to "prove" that a software planning and/or development document complies with a set of formal requirements?
The short answer is yes, and in the world of high-stakes engineering—like NASA’s Safety and Mission Assurance (S&MA)—this is often referred to as Formal Specification and Automated Reasoning.
While SAT (Satisfiability) and SMT (Satisfiability Modulo Theories) solvers are traditionally used to verify code or circuit logic, they are increasingly being applied to the "left side" of the V-model: the requirements and design phase.
Here is how a tool for "proving" document compliance would function, the logic required, and the hurdles involved.
1. The Logic: Transforming Text into Predicates
To "prove" compliance, you cannot work with raw English. You must translate the Formal Requirements (the "Rules") and the Development Document (the "Artifact") into a common mathematical language, typically First-Order Logic (FOL).
The Requirements (R): Expressed as constraints.
Example: ∀x (IsSafetyCritical(x) → HasRedundancy(x))
The Document (D): Expressed as assertions or facts.
Example: IsSafetyCritical(PropulsionSystem) ∧ ¬HasRedundancy(PropulsionSystem)
An SMT solver then checks whether D ∧ ¬R is satisfiable. If it is unsatisfiable, there is no scenario in which the document's assertions hold while a requirement is violated, which essentially proves compliance.
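As a toy illustration of this check (not the author's tool), the propulsion example can be tested by brute force over its boolean atoms; a real SMT solver does the same search symbolically:

```python
from itertools import product

# Requirement R: IsSafetyCritical(x) implies HasRedundancy(x).
def requirement(is_critical, has_redundancy):
    return (not is_critical) or has_redundancy

# Document D: asserts IsSafetyCritical and not HasRedundancy.
def document(is_critical, has_redundancy):
    return is_critical and (not has_redundancy)

# Search for a model of D AND (NOT R); a non-empty list means
# the document can hold while the requirement is violated.
violations = [
    (c, r) for c, r in product([False, True], repeat=2)
    if document(c, r) and not requirement(c, r)
]
print(bool(violations))  # True -> this document violates the requirement
```

Here the check comes back satisfiable, exactly the non-compliance case the prose describes.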
2. Structural Compliance vs. Semantic Compliance
A tool like this generally operates on two levels:
A. Structural Verification (The "Easy" Part)
This uses SAT/SMT to ensure the document architecture meets the standard's meta-requirements (like those in NASA-STD-8739.8B).
Completeness: Does every Requirement ID in the SRS have a corresponding test case ID in the V&V plan?
Traceability: Can the solver find a valid path from a low-level software unit back to a high-level mission objective?
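The completeness check above is essentially a set-difference query; a minimal sketch (all IDs invented for illustration):

```python
# Every requirement ID in the SRS must trace to at least one
# test case in the V&V plan. IDs here are hypothetical.
srs_requirements = {"SWE-191", "SWE-125", "SWE-134"}
vv_traces = {
    "TC-001": "SWE-191",
    "TC-002": "SWE-125",
}

# Requirements with no covering test case are the compliance gap.
untraced = srs_requirements - set(vv_traces.values())
print(sorted(untraced))  # ['SWE-134']
```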
B. Semantic Verification (The "Hard" Part)
This proves that the intent of the design doesn't violate a safety constraint.
Example: If a requirement states "No two valves shall open simultaneously," an SMT solver can analyze the state-machine logic described in your development document to see if any combination of inputs allows that "Forbidden State" to be reached.
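A tiny sketch of that "Forbidden State" search, with an invented interlock rule standing in for the state-machine logic a real document would describe:

```python
from itertools import product

# Hypothetical valve logic: each input combination commands the
# two valves; the interlock makes valve B yield to valve A.
def step(cmd_a, cmd_b):
    valve_a = cmd_a
    valve_b = cmd_b and not cmd_a
    return valve_a, valve_b

# Forbidden state: both valves open at once.
forbidden = [
    (a, b) for a, b in product([False, True], repeat=2)
    if step(a, b) == (True, True)
]
print(forbidden)  # [] -> no input reaches the forbidden state
```

An SMT solver performs this same reachability question symbolically rather than by enumeration.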
3. Existing Tooling and Frameworks
If you were to build this today, you wouldn't start from scratch. You would likely leverage these "Formal Methods" ecosystems:
TLA+ (Temporal Logic of Actions): Created by Leslie Lamport, used heavily by Amazon and Microsoft to prove that system designs (documents/specs) are logically sound before a single line of code is written.
Alloy: A lightweight modeling language based on first-order relational logic. It’s excellent for "document-level" proofing because it excels at finding counter-examples in structures and hierarchies.
Coq or Lean: These are "Theorem Provers." They are more manual than SMT solvers but allow you to construct a rigorous mathematical proof that a design fulfills a specification.
4. The "Agentic" Bridge
The missing link in the past was the "Translation Gap"—the effort required to turn a PDF into formal logic was too high. This is where your interest in Agentic AI and Knowledge Graphs comes in.
Extraction: An LLM agent (like your "Sentinel Archivist") extracts entities and relationships from the NASA standards.
Formalization: The agent translates these into SMT-LIB format (the standard language for SMT solvers like Z3).
Verification: The SMT solver runs the proof.
Reporting: If the proof fails, the solver provides a Counter-example, which the AI translates back into human terms: "Requirement 4.3.1 is violated because the Propulsion System lacks a secondary power rail in your design."
The Reality Check
The biggest hurdle isn't the solver—Z3 and cvc5 are incredibly powerful. The hurdle is ambiguity. Natural language in development documents often contains "soft" terms that are hard to atomize. Formalizing a document requires a very disciplined "Contract-Based Design" approach.
Do you see this tool primarily as a "Gatekeeper" for automated pull requests, or as a "Design Assistant" that flags logic flaws while a dev is still drafting the architecture?
First priority is for NASA to review and approve launch service provider software planning and development plans, as well as software hazard reports. A secondary priority is to provide this as a tool for document authors.
Given the priority of verifying Launch Service Provider (LSP) plans and Software Hazard Reports, you are looking at a system that must move beyond simple keyword checking into Automated Regulatory Reasoning.
For NASA’s S&MA role, this tool acts as a "Formal Auditor." By leveraging SMT solvers, you can transform the review process from a subjective human read into a rigorous mathematical check.
1. Primary Mission: The "Formal Auditor" for NASA
In the context of NASA S&MA and SQA, the tool's goal is to detect Non-Compliance and Safety Gaps in LSP submissions.
Verification of Software Hazard Reports
Hazard reports are essentially logic trees. An SMT-based tool can verify the Cut Sets of a hazard:
The Logic: If the LSP claims "Redundancy A and Mitigation B prevent Hazard X," the SMT solver can search for a state in which the claim (A ∧ B) ⇒ ¬X fails, or, more importantly, surface a Common Cause Failure the author missed that defeats both mitigations at once.
Formalizing "Inhibits": You can model launch safety inhibits (e.g., three independent inhibits for a critical function) as boolean constraints. The tool "proves" whether the software architecture described in the plan actually maintains those inhibits across all operational states.
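A minimal boolean sketch of the three-inhibit check, assuming a hypothetical architecture where the critical function fires only when every inhibit is removed:

```python
from itertools import product

# Claimed architecture (invented): the critical function fires
# only when all three independent inhibits are removed.
def critical_function_fires(i1, i2, i3):
    return not i1 and not i2 and not i3

# Proof obligation: no state with any inhibit still in place fires.
violating = [
    s for s in product([False, True], repeat=3)
    if critical_function_fires(*s) and any(s)
]
print(violating)  # [] -> the inhibits hold in every enumerated state
```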
Reviewing Development Plans (SDP/SMP)
Using Constraint Logic Programming (CLP), you can "prove" the plan adheres to NASA-STD-8739.8B:
The Constraint: "All Safety Critical (Class A) components must undergo MC/DC coverage."
The Proof: The tool maps the LSP’s component list against their proposed testing levels. If a component is flagged as "Hazardous" in the Hazard Report but "Class C" in the SDP, the solver triggers a Logical Contradiction.
2. Secondary Mission: The "Author's Assistant"
For the document authors, the tool acts as a Linter for Requirements. Just as a compiler catches syntax errors, this tool catches Requirement Collisions.
Conflict Detection: If an author adds a requirement that contradicts an existing safety constraint, the SMT solver identifies it immediately.
Satisfiability: It ensures the set of requirements is actually doable. (e.g., if Requirement A requires "Zero Latency" and Requirement B requires "Encryption," and the hardware constraints make both impossible, the solver proves the plan is "Unsatisfiable").
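The "Zero Latency vs. Encryption" example reduces to a satisfiability question over the requirement set; a brute-force sketch with invented atoms:

```python
from itertools import product

# Hypothetical constraint: encryption always costs latency, so the
# hardware cannot deliver both zero latency and encryption.
def reqs_hold(zero_latency, encrypted):
    hardware_feasible = (not encrypted) or (not zero_latency)
    return zero_latency and encrypted and hardware_feasible

satisfiable = any(reqs_hold(*s) for s in product([False, True], repeat=2))
print(satisfiable)  # False -> the requirement set is unsatisfiable
```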
3. The Proposed Architecture
To make this work within your current "MAKER" and "Sentinel Archivist" workflow, you would need a three-tier stack:
| Tier | Technology | Purpose |
| --- | --- | --- |
| Ingestion | Agentic LLM / GraphRAG | Extract "Shall" statements and Hazard logic from PDFs/Transcripts into a Knowledge Graph. |
| Translation | Formalizer Agent | Convert Knowledge Graph relationships into SMT-LIB or Alloy predicates. |
| Reasoning | Z3 / Bitwuzla Solver | Execute the "Check." Find contradictions or unmapped hazards. |
4. Why SMT is better than "Standard" AI for this
While a standard LLM can "summarize" a NASA standard, it cannot guarantee correctness. It can hallucinate that a requirement is met when it isn't.
An SMT solver is sound:
If the solver says "Unsatisfiable," it is a mathematical certainty that the document violates the rules.
For NASA S&MA, this provides a Defensible Audit Trail. You aren't saying "the AI thinks this is wrong"; you are saying "The logic is provably inconsistent with NASA-STD-8739.8B."
5. Potential Challenge: State Space Explosion
Software Hazard Reports for launch vehicles can become incredibly complex. The challenge is "modeling" the document at the right level of abstraction so the solver doesn't time out.
Since you are already working with Neo4j and Cypher, are you envisioning the Knowledge Graph as the "source of truth" that then exports these logic fragments to the SMT solver for verification?
Given NASA NPR 7150.2 and a launch provider software development plan, describe the workflow to implement this solver variation
Implementing a solver-based verification workflow for NPR 7150.2 compliance requires moving beyond simple pattern matching. You are essentially building a Regulatory Reasoning Engine that treats the NASA Procedural Requirements (NPR) as a formal specification and the Provider’s Software Development Plan (SDP) as a model to be verified.
Since you are already using Neo4j/Cypher for your Digital Mission Twin, the workflow integrates the graph as the "fact base" and the SMT solver as the "inference engine."
The 4-Stage Workflow: "NPR-to-Solver"
Stage 1: Formalizing the "Rulebook" (NPR 7150.2)
You must transform the text of NPR 7150.2 into a Formal Policy Graph.
Decomposition: Use an agent to parse NPR 7150.2 (specifically Appendix C: Requirements Mapping Matrix).
Node Creation: Represent each requirement as a node with properties for Software Class (A, B, C, etc.) and Compliance Level.
Logical Predicates: Instead of just text, store the "logic" of the requirement.
NPR Requirement: "Class A software requires unit testing with MC/DC coverage."
Predicate:
(SoftwareClass == 'A') -> (HasTestMethod('MCDC'))
Stage 2: Ingesting the Provider "Model" (The SDP)
The Launch Provider’s SDP is ingested into a separate subgraph.
Entity Extraction: Use your Sentinel Archivist agents to identify the Provider's claimed processes, tools, and roles.
Mapping to the Twin: Connect these entities to the specific NASA Software Classes they are intended to support.
Example: The Provider claims "We use PyTest for all modules." This is mapped as `(Provider)-[:IMPLEMENTS]->(UnitTesting)` and `(UnitTesting)-[:USES]->(PyTest)`.
Stage 3: The Translation Layer (Graph to SMT-LIB)
This is the "Solver Variation" step. A script traverses the Neo4j graph and exports the conflicting "claims" into a First-Order Logic format (like SMT-LIB).
Select a Scope: Focus on a specific section (e.g., Software Configuration Management or Verification).
Generate Assertions:
`ASSERT NPR_Rule_7150_2_Section_X` (the Rule)
`ASSERT Provider_Plan_Claim_Y` (the Evidence)
The Counter-Model Query: Tell the solver: "Find an assignment of variables where the Provider Plan exists but the NPR Requirement is FALSE."
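A minimal sketch of this Stage 3 export, assembling rule and claims into one SMT-LIB problem string (all identifiers are illustrative, not drawn from the actual graph):

```python
# Hypothetical exported fragments: one stored NPR rule plus the
# claims extracted from a provider plan.
decls = ["(declare-const class_a Bool)", "(declare-const has_mcdc Bool)"]
claims = ["(assert class_a)", "(assert (not has_mcdc))"]
npr_rule = "(assert (=> class_a has_mcdc))"

# Concatenate into a single problem the solver can check.
problem = "\n".join(decls + claims + [npr_rule, "(check-sat)"])
print(problem.endswith("(check-sat)"))  # True
```

The resulting string would be piped to Z3 or Bitwuzla; `unsat` on this fragment would indicate the Class A claim contradicts the MC/DC rule.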
Stage 4: Automated S&MA Review & Feedback
The output of the solver is either Satisfiable (Compliant) or Unsatisfiable (Non-Compliant).
Conflict Pinpointing: If the solver finds a contradiction, it returns a Counter-example.
Natural Language Generation: Your agent takes that raw logic (e.g., `Class_A_Unit_Test_Incomplete`) and maps it back to the SDP: "In Section 4.2 of the SDP, you designate the Guidance Controller as Class A, but your Test Plan in Section 5.1 only specifies Statement Coverage, violating NPR 7150.2 Section 3.2.1."
Comparison of Tool Roles
| Feature | S&MA Auditor Role (NASA) | Author's Assistant Role (Provider) |
| --- | --- | --- |
| Input | Multiple SDPs vs. NPR 7150.2 | Single SDP draft vs. NPR 7150.2 |
| Objective | Find "Critical Gaps" and "Non-Compliance" | Find "Internal Inconsistencies" |
| Solver Focus | Exhaustive Search: Proving no safety paths are missed. | Consistency Check: Ensuring Requirement A doesn't break Requirement B. |
| Output | Formal Approval/Rejection Report | Real-time "Linter" warnings in the IDE/Editor. |
Technical Implementation Notes for your "MAKER" Stack
The "Bridge": Use a Python library like Z3-Py. It allows you to write the logic in Python while the heavy lifting happens in the Z3 C++ core.
Knowledge Graph Hook: You can use Cypher queries to pull "Logic Fragments" out of Neo4j. For instance, query all nodes where `SoftwareClass = 'A'` and pass their linked verification methods to the solver.
The "Hazard" Variation: For Software Hazard Reports, the solver can check transitive relationships. If Hazard 1 is mitigated by Software 1, and Software 1 depends on Library X, the solver can prove whether Library X's failure is a "hidden" cause for Hazard 1.
Do you want to focus first on the Cypher-to-SMT translation logic, or on the Agentic extraction of "Shall" statements from the NPR PDFs?
Convert the following NPR 7150.2 requirements """4.5.11 The project manager shall plan and conduct software regression testing to demonstrate that
defects have not been introduced into previously integrated or tested software and have not produced a security vulnerability. [SWE-191] """ into a logical predicate
To convert a natural language requirement like SWE-191 into a logical predicate, we must define the entities (Variables) and the conditions (Functions/Predicates) that must be true for the requirement to be "Satisfied."
In an SMT-based system, we treat this as a constraint that must hold true for every software change or release.
1. Defining the Predicates
We can decompose the "Shall" statement into the following components:
Plan(p, t): Project manager p has a documented plan for testing t.
Conduct(p, t): Project manager p executes testing t.
RegressionTest(t, s): Test t is a regression suite for software s.
IntroducesDefect(s', s): The change from version s to s' introduces a bug.
IntroducesVulnerability(s', s): The change introduces a security flaw.
Demonstrates(t, ¬F): Test t proves that fact F is false.
2. The Formal Logical Predicate
For a software project to be compliant with SWE-191, the following must hold for every change from a baseline version s to a new version s′ (reconstructed from the predicates above):
∀p, t, s, s′ : RegressionTest(t, s) ⇒ (Plan(p, t) ∧ Conduct(p, t) ∧ Demonstrates(t, ¬IntroducesDefect(s′, s)) ∧ Demonstrates(t, ¬IntroducesVulnerability(s′, s)))
3. SMT-LIB Implementation (Z3 Friendly)
If you were to feed this into an SMT solver like Z3 to check an LSP's development plan, you would define it as a series of assertions. The solver would then look for a "Model" where these conditions are violated.
; Define Sorts (Types)
(declare-sort Software 0)
(declare-sort TestPlan 0)
; Define Predicates
(declare-fun is_regression_test (TestPlan Software) Bool)
(declare-fun covers_defects (TestPlan) Bool)
(declare-fun covers_security (TestPlan) Bool)
(declare-fun is_planned (TestPlan) Bool)
(declare-fun is_conducted (TestPlan) Bool)
; SWE-191 Constraint:
; If it is a regression test, it MUST be planned, conducted,
; and cover both defects and security vulnerabilities.
(assert (forall ((t TestPlan) (s Software))
(=> (is_regression_test t s)
(and (is_planned t)
(is_conducted t)
(covers_defects t)
(covers_security t)))))
4. Integration with your Digital Mission Twin
In your Neo4j environment, this predicate acts as a "Validation Rule."
The Check: If your Knowledge Graph contains a `Software` node linked to a `TestPlan`, but the `TestPlan` node lacks a `:SECURITY_SCAN` or `:REGRESSION_SUITE` label, the SMT solver will return Unsatisfiable.
The Gap Analysis: The "Counter-example" provided by the solver points directly to the missing relationship in your graph, effectively saying: "You have a test plan for the flight software, but you haven't asserted that it checks for security vulnerabilities as required by SWE-191."
This moves the review from "Did I read the document?" to "Does the graph representation of the project satisfy the mathematical constraints of the NPR?"
Which and how would "2. The Formal Logical Predicate" or "3. SMT-LIB Implementation (Z3 Friendly)" be stored in the knowledge graph?
In your Neo4j architecture, you shouldn't choose between the Predicate and the SMT-LIB code—you should store both, but in different roles within the node properties.
To maintain the "Digital Mission Twin" approach you’ve been developing, the Knowledge Graph serves as the Logic Library. Here is how to structure this:
1. The Schema: The Requirement Node
You would extend your existing Requirement (or SWE_Requirement) nodes to include specific properties for formal reasoning.
Node Label: :Requirement
Properties:
uid: "SWE-191"
text: "The project manager shall plan and conduct software regression testing..."
formal_logic (The Predicate): Stores the First-Order Logic (FOL) string. This is for human/agent readability and cross-tool translation (e.g., if you switch from Z3 to Alloy).
solver_implementation (The SMT-LIB): Stores the raw SMT-LIB s-expression. This is for direct execution by the solver.
2. How to Store It (Cypher Example)
You can use a LogicConstraint label or simply fold it into the requirement itself. Given your focus on the Sentinel Archivist, storing the solver code directly allows an agent to "grab and go."
MERGE (r:Requirement {uid: 'SWE-191'})
SET r.formal_logic = "forall(t, s): (is_reg(t, s) && planned(t)) => (no_defects(t) && no_vuln(t))",
r.smt_lib = "(assert (forall ((t TestPlan) (s Software)) (=> (is_reg t s) (and (planned t) (covers_defects t) (covers_security t)))))",
r.version = "NPR 7150.2D"
3. The "How" and "Why" for Each Format
| Format | Where it sits in the Graph | Why it’s stored this way |
| --- | --- | --- |
| Formal Predicate (FOL) | Property: formal_logic | Universal Translation: Your "Idea Miner" agents can parse this simplified string to explain the rule to a human author or map it to a different MBSE tool like SysML v2. |
| SMT-LIB Implementation | Property: smt_lib | Execution Speed: When NASA runs a "Compliance Audit," the Python controller pulls all smt_lib properties in one query and pipes them directly into the Z3 kernel without needing an LLM to re-translate them. |
4. Managing the "State" with Relationships
The SMT code alone is just a template. To "prove" compliance, the graph must link the Rule to the Provider's Evidence.
Relationship: `(r:Requirement {uid:'SWE-191'})-[:VERIFIED_BY]->(p:TestPlan {name:'LSP_Alpha_Regress_Plan'})`
The Workflow:
1. The system identifies that `SWE-191` is a "Class A" requirement.
2. It looks for a `:VERIFIED_BY` link to a `TestPlan`.
3. If the link exists, it pulls the `smt_lib` from the Requirement node.
4. It pulls the Metadata (properties like `is_planned: true`, `security_scans: false`) from the Provider's `TestPlan` node.
5. It injects the Provider's data as Constants into the SMT script and runs the check.
Why this fits your "MAKER" approach:
By storing the SMT-LIB directly in the graph, you turn your Neo4j instance into a Verified Knowledge Base. You aren't just storing "What" the requirement is; you are storing the Machine-Executable Proof of how to check it.
Are you planning to have your Sentinel Archivist agents generate these SMT-LIB strings automatically during the ingestion of the NPR PDFs?
Provide the step-by-step workflow to read NPR 7150.2 and write the requirement uid, text, formal_logic (generated by an LLM) and the solver_implementation (generated by an LLM).
This workflow leverages your Sentinel Archivist agentic framework to transform the static NPR 7150.2 document into a "live" reasoning engine within Neo4j.
The "Logic Ingestion" Pipeline
Step 1: Pre-Processing and Segmentation
Because LLMs lose precision over long documents, you must break the NPR into digestible "Requirement Objects."
Extract: Use a PDF parser (like `PyMuPDF`) to target Appendix C (The Requirements Mapping Matrix).
Chunk: Isolate each requirement by its SWE-ID.
Contextualize: For each chunk, attach the Software Class (A, B, C, D) and the Requirement Title to provide the LLM with the necessary scope.
Step 2: The "Formalizer" Agent Prompting
You will use a two-shot prompting strategy to ensure the formal_logic and solver_implementation are mathematically consistent.
Prompt Template:
"Analyze NASA Requirement {uid}: '{text}'.
Translate this into a First-Order Logic predicate using standard notation.
Implement this logic as a Z3 SMT-LIB assertion.
Use a consistent vocabulary: `TestPlan`, `Software`, `is_planned(t)`, `covers_security(t)`.
Output in JSON format with keys: `formal_logic`, `smt_lib`."
Step 3: Consistency Validation (The "Self-Correction" Loop)
Before writing to the graph, run a "Dummy Check" to ensure the generated SMT-LIB is syntactically correct.
The agent passes the `smt_lib` string to a local Z3-Py instance.
If Z3 raises a `Z3Exception` (e.g., mismatched parentheses), the agent is prompted to fix the syntax.
Only valid SMT-LIB code proceeds to the database.
Step 4: Knowledge Graph Ingestion (The Write)
Using the Python Neo4j driver, execute a MERGE query to populate your "Logic Library."
def ingest_requirement(tx, req_data):
query = """
MERGE (r:Requirement {uid: $uid})
SET r.text = $text,
r.formal_logic = $formal_logic,
r.solver_implementation = $smt_lib,
r.source = 'NPR 7150.2D',
r.last_updated = datetime()
"""
tx.run(query,
uid=req_data['uid'],
text=req_data['text'],
formal_logic=req_data['formal_logic'],
smt_lib=req_data['smt_lib'])
The End-to-End Operational Workflow
| Step | Action | Tool / Agent | Output |
| --- | --- | --- | --- |
| 1 | Scan PDF | Sentinel_Reader | Raw text of SWE-191, SWE-125, etc. |
| 2 | Logic Synthesis | Formalizer_Agent | JSON with Predicate + SMT-LIB code. |
| 3 | Syntax Check | Local Z3 Solver | Pass/Fail validation of the SMT code. |
| 4 | Graph Commit | Neo4j_Driver | A populated :Requirement node in Alvin-local DB. |
| 5 | Mapping | Traceability_Agent | (Requirement)-[:APPLIES_TO]->(SoftwareClass {name: 'A'}) |
Example Node Result (Neo4j)
After the workflow runs, your query MATCH (r:Requirement {uid: 'SWE-191'}) RETURN r would yield:
uid: "SWE-191"
text: "The project manager shall plan and conduct software regression testing..."
formal_logic: "forall(t, s) : (RegressionTest(t, s) ∧ Planned(t)) ⇒ (Verified(t, 'NoDefects') ∧ Verified(t, 'NoVuln'))"
solver_implementation: "(assert (forall ((t TestPlan) (s Software)) (=> (and (is_reg_test t s) (is_planned t)) (and (checks_defects t) (checks_security t)))))"
Why this works for your "NASA Reviewer" use case:
When a Launch Provider submits their plan, you ingest their plan as facts (e.g., is_planned(LSP_Plan_v1) = true). You then pull the solver_implementation from your node, combine it with their facts, and let Z3 do the "proving."
If Z3 says unsat, the provider has a non-compliance issue that is mathematically undeniable.
Would you like the specific System Prompt for the "Formalizer" Agent to ensure it stays within the Z3-standard vocabulary?
How would this be applied to a launch provider software development plan?
Applying this to a Launch Service Provider (LSP) Software Development Plan (SDP) is where the "Formal Auditor" comes to life. Since an SDP is a descriptive document (prose), the workflow shifts from storing rules to verifying evidence against those rules.
Here is the step-by-step application workflow.
1. Document Shredding (Evidence Extraction)
The LSP's SDP is ingested by your Sentinel Archivist. The goal is to turn their claims into "Facts" (Assertions) that the solver can read.
Process: The agent reads the SDP and looks for commitments that map to your defined predicates (e.g., `is_planned`, `covers_security`).
Result: A JSON object of the provider's claims.
Text: "We perform regression testing for every release using our SecuCheck tool."
Assertion: `(is_planned LSP_Reg_Test)`, `(covers_security LSP_Reg_Test)`.
2. Dynamic Assertion Generation
You don't just run the SMT code as a static string. Your Python controller creates a "Problem Instance" by combining the NPR Rule (from your graph) with the LSP Evidence.
The Python Logic:
# 1. Get the NPR Requirement SMT template from Neo4j
npr_rule = requirement_node.get('solver_implementation')
# 2. Get the Provider Claims from the SDP analysis
provider_assertions = [
"(declare-const LSP_Reg_Test TestPlan)",
"(assert (is_reg_test LSP_Reg_Test Flight_Software_v1))",
"(assert (is_planned LSP_Reg_Test))",
"(assert (not (checks_security LSP_Reg_Test)))" # Agent found no mention of security
]
# 3. Combine them into a single SMT problem
full_problem = "\n".join(provider_assertions) + "\n" + npr_rule + "\n(check-sat)\n(get-model)"
3. The Solver Execution (The Audit)
The combined script is sent to the Z3 Solver.
Scenario A (Compliant): If the solver returns `sat`, the provider's claims successfully satisfy the logical constraints of the NPR.
Scenario B (Non-Compliant): If the solver returns `unsat`, the provider's plan mathematically contradicts the NPR.
4. Generating the "Finding"
If the result is unsat, you use the solver's Counter-example to generate a specific audit finding for the NASA review board.
| Step | Data Source | Content |
| --- | --- | --- |
| The Rule | Requirement.uid | SWE-191: Regression testing must include security vulnerability checks. |
| The Gap | Z3 Counter-model | The solver found that checks_security is false for LSP_Reg_Test. |
| The Evidence | SDP Section 5.4 | "Provider mentions functional regression but lacks any reference to security scanners or vulnerability analysis." |
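The "Gap" row above can be mechanized with a sketch like this, turning a counter-model (here a plain dict standing in for the solver's model) into finding text; all names are placeholders:

```python
# Hypothetical counter-model: predicates the solver found false
# for the provider's test plan.
counter_model = {"checks_security": False}

findings = [
    f"SWE-191 gap: predicate '{pred}' is {str(val).lower()} for LSP_Reg_Test"
    for pred, val in counter_model.items() if val is False
]
print(findings[0])
```

An LLM agent would then wrap each finding with the citation to the SDP section, as in the "Evidence" row.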
The Integrated "Mission Twin" Workflow
Ingest SDP: The Sentinel Archivist maps the SDP into a Provider Subgraph in Neo4j.
Logic Link: Use Cypher to link the Provider's `:TestActivity` to the NASA `:Requirement` it claims to satisfy.
Prove: A Python script triggers: "For every link `(Activity)-[:SATISFIES]->(Requirement)`, pull the `solver_implementation` and verify the `Activity` properties."
Visualize: Highlight nodes in your Neo4j browser:
Green: Provably compliant.
Red: Provably non-compliant (Unsat).
Yellow: Insufficient data (Solver cannot reach a conclusion).
Why this is a "MAKER" Win
For a NASA subcontractor, this replaces weeks of manual "Compliance Matrix" checking with a deterministic engine. You aren't just giving an opinion on the SDP; you are providing a formal proof of omission.
Since you’re using Grace Blackwell and Ryzen AI Max+ hardware, you can run these SMT permutations locally and near-instantaneously, even for thousands of requirements, without ever sending sensitive LSP data to a cloud API.
Does this "Evidence vs. Rule" approach align with how you currently handle the Requirements Mapping Matrix for NASA-STD-8739.8B?
From <https://gemini.google.com/app/> Google Gemini 3.0 (Fast)
BITWUZLA
Bitwuzla is a Satisfiability Modulo Theories (SMT) solver for the theories of fixed-size bit-vectors, floating-point arithmetic, arrays, uninterpreted functions and their combinations.