#4432 Fine-Tuning a Small Model with NPR 7150.2D and NASA-STD-8739.8B
What would training data look like to fine-tune a small model with NASA NPR 7150.2D?
To fine-tune a small, local Large Language Model (like an IBM Granite or Nemotron variant) to understand, audit, or trace requirements against NASA NPR 7150.2D (NASA Software Engineering Requirements), your training data needs to be structured, deterministic, and highly specific.
Because small models have limited parameter capacity, they struggle with raw, open-ended text ingestion. Instead of feeding them the raw PDF, you must transform the document into instruction-response pairs or context-constrained completion pairs that mimic the exact tasks you want the model to perform.
Here is a breakdown of what that training data looks like, how to structure it, and concrete examples.
1. The Core Schema: Supervised Fine-Tuning (SFT)
For a small model to assist with compliance auditing or Requirements Mapping Matrices (RMM), the data should follow an instruction-tuning format (such as Alpaca or ChatML).
Every training sample must isolate one specific software requirement (SWE), its classification applicability (Class A through F), and the expected engineering rationale for verification.
The Structural Blueprint
System Prompt: Establishes the persona (e.g., "You are an expert NASA Software Assurance Engineer auditing NPR 7150.2D compliance...").
Instruction/Input: The specific task or the candidate software requirement text being evaluated.
Context: The exact text or reference from NPR 7150.2D (crucial for small models to prevent hallucination).
Response (Target): The ground-truth gold standard answer showing compliance status, gaps, or rationale.
2. Concrete Training Data Examples
Depending on how you intend to use the model, your training dataset will look like a JSONL (JSON Lines) file containing a mix of the following three task types:
Task Type A: Compliance Auditing & Gap Analysis
This trains the model to look at an engineering artifact and find gaps against a specific SWE.
{
  "messages": [
    {
      "role": "system",
      "content": "You are a NASA Software Safety and Mission Assurance auditor. Analyze project artifacts strictly against NPR 7150.2D requirements."
    },
    {
      "role": "user",
      "content": "Evaluate this project statement against NPR 7150.2D SWE-139:\n'Project Orion-X will track safety-critical software defects in an Excel spreadsheet managed by the lead developer.'\n\nContext for SWE-139: The project shall implement a software configuration management system that uniquely identifies, controls, and tracks changes to software configuration items, including software defects."
    },
    {
      "role": "assistant",
      "content": "Status: NON-COMPLIANT\n\nRationale: SWE-139 requires a software configuration management (SCM) system that uniquely identifies, controls, and tracks changes to configuration items. An Excel spreadsheet managed by a single developer lacks institutional configuration control, automated version tracking, multi-user concurrency control, and authorization workflows required for a Class A-C software project.\n\nCorrective Action: Migrate defect tracking to an enterprise-grade, configuration-controlled tool (e.g., Jira, Azure DevOps, or an internal NASA-approved SCM tool) with defined access controls and automated audit trails."
    }
  ]
}
Task Type B: Classification & Applicability Mapping
NPR 7150.2D relies heavily on the Software Class (A, B, C, etc.) to determine whether a requirement is mandatory (marked "X" in the Requirements Mapping Matrix), not applicable (blank), or subject to tailoring via a deviation/waiver. Small models must be explicitly trained on this matrix logic.
{
  "messages": [
    {
      "role": "system",
      "content": "You are an expert in NASA NPR 7150.2D Software Classifications and Appendix C (Requirements Mapping Matrix)."
    },
    {
      "role": "user",
      "content": "Is SWE-019 ('The project shall document the software development life cycle') required for a Class C software project?"
    },
    {
      "role": "assistant",
      "content": "Yes. According to NPR 7150.2D Appendix C (Requirements Mapping Matrix), SWE-019 is a mandatory requirement (designated with an 'X') for Software Class C. The project must document its software development life cycle within its Software Development Plan (SDP)."
    }
  ]
}
Task Type C: Bidirectional Traceability Extraction
This trains the model to verify that higher-level system requirements map properly to lower-level software requirements (SWE-052 / SWE-064).
{
  "messages": [
    {
      "role": "system",
      "content": "You are a software process engineer validating bidirectional requirements traceability."
    },
    {
      "role": "user",
      "content": "Verify if bidirectional traceability is maintained given the following data:\nSystem Requirement: SYS-REQ-101: The Guidance system must shut down within 50ms of an abort signal.\nSoftware Requirement: SW-REQ-402: The abort interrupt handler shall trigger a safe shutdown sequence."
    },
    {
      "role": "assistant",
      "content": "Status: PARTIAL COMPLIANCE / GAP IDENTIFIED\n\nRationale: While SW-REQ-402 traces forward to the intent of SYS-REQ-101 (handling the abort), it fails to specify the strict 50ms timing constraint dictated by the system requirement.\n\nCorrective Action: Update SW-REQ-402 to state: 'The abort interrupt handler shall trigger a safe shutdown sequence that completes within 50ms of abort signal receipt,' ensuring complete vertical traceability per SWE-052."
    }
  ]
}
3. Data Prep Strategy for Small Models
If you are fine-tuning a model under 8B or 14B parameters to run locally, keep these constraints in mind when generating your training dataset:
Deconstruct the PDF: Do not just chunk the text by token count. Parse the document by requirement number. Create a master structured data file (JSON or CSV) where every row is an individual SWE (from SWE-001 to the end of the document), along with its sub-paragraphs.
Synthetic Data Generation (Self-Instruct): Use a larger frontier model (like Claude 3.5 Sonnet or GPT-4o) via API to generate the training pairs. Feed the large model a single SWE and its NPR 7150.2D context, and instruct it to generate 5 variations of compliant engineering text, 5 variations of non-compliant text, and the corresponding expert rationales.
Negative Sampling: Ensure at least 30% of your training data includes flawed engineering inputs (missing traceability, wrong software classification, lack of independent verification). Small models over-optimize quickly; if you only show them compliant examples, they will flag everything as "compliant." (A sketch for checking this mix follows this list.)
Incorporate the "Why": Never let a training response simply be "Compliant" or "Non-Compliant." Force the target text to explicitly quote the phrasing of NPR 7150.2D. This anchors the small model's weights to the exact terminology of the standard (e.g., "shall establish," "shall maintain," "shall document").
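As a minimal sketch of enforcing that negative-sampling mix, assuming each generated sample carries a hypothetical label field attached during synthetic generation (it is not part of the ChatML schema itself):

def check_negative_ratio(samples, minimum=0.30):
    # 'label' is an assumed metadata key added during generation.
    negatives = sum(1 for s in samples if s["label"] == "non_compliant")
    ratio = negatives / len(samples)
    if ratio < minimum:
        print(f"Warning: only {ratio:.0%} negative samples; target is at least {minimum:.0%}.")
    return ratio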
4. Size of the Dataset
For a highly domain-specific, rule-bound task like NPR 7150.2D auditing:
Target: 1,500 to 3,000 high-quality instruction pairs.
Because the scope of 7150.2D is tightly bounded (roughly 130+ distinct SWEs plus appendices), generating ~15 to 20 varied scenario-based prompt/response pairs per SWE will give a small model sufficient density to learn the underlying compliance logic without suffering from catastrophic forgetting.
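As a quick check on that arithmetic: roughly 130 SWEs at 15 pairs each yields about 1,950 samples, and at 20 pairs each about 2,600, both comfortably inside the 1,500 to 3,000 target.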
Provide step-by-step instructions on how to ingest the NPR 7150.2D PDF into a Python app, then generate training data from the text.
Ingesting a dense compliance document like NASA NPR 7150.2D and converting it into structured fine-tuning data requires transitioning from raw, unformatted text to deterministic, anchor-based programmatic extraction.
Because NPR 7150.2D relies on highly predictable tags (e.g., [SWE-001], [SWE-013]), you can bypass fuzzy semantic chunking entirely. Instead, use a deterministic regex extraction pipeline followed by an automated LLM-in-the-loop synthetic generation framework.
Step 1: Set Up Your Python Environment
Install the necessary libraries for PDF parsing, text structuring, and LLM orchestration.
pip install pymupdf pandas openai pydantic
Note: PyMuPDF (imported as fitz) is preferred over generic text extractors because it handles multi-column tables, line breaks, and whitespace encoding exceptionally well without breaking the structure of NASA's standard formatting.
Step 2: Extract and Clean the PDF Text Programmatically
The main challenge with NPR 7150.2D is that lines wrap arbitrarily in a PDF, splitting single requirements across multiple lines. This script opens the PDF, extracts the raw text, and joins broken lines to prepare for pattern matching.
import fitz  # PyMuPDF
import re

def extract_clean_text(pdf_path):
    doc = fitz.open(pdf_path)
    full_text = []
    for page in doc:
        # Extract text as layout blocks to prevent paragraph overlapping
        blocks = page.get_text("blocks")
        # Sort blocks top-to-bottom, then left-to-right
        blocks.sort(key=lambda b: (b[1], b[0]))
        for block in blocks:
            block_text = block[4].strip()  # index 4 holds the block's text
            if block_text:
                full_text.append(block_text)
    # Join blocks, then normalize line breaks within continuous text
    combined_text = "\n\n".join(full_text)
    # Collapse single newlines inside paragraphs; preserve double newlines
    normalized_text = re.sub(r'(?<!\n)\n(?!\n)', ' ', combined_text)
    return normalized_text

raw_doc_text = extract_clean_text("NPR_7150_2D.pdf")
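Before parsing, a quick spot check (illustrative, not part of the pipeline proper) confirms the extraction preserved the [SWE-XXX] anchors:

swe_tag_count = len(re.findall(r'\[\s*SWE-\d{3}\s*\]', raw_doc_text))
print(f"Found {swe_tag_count} SWE tag occurrences in the extracted text.")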
Step 3: Parse Text Deterministically into Software Requirements (SWEs)
NASA documents anchor mandatory rules with distinct identifiers like [SWE-051]. Use a regular expression to split the document by these identifiers, isolating the unique requirement number, its explicit text instruction, and the parent context or sub-clauses (e.g., a., b., c.).
import json

def parse_swe_requirements(text):
    # The SWE tag (e.g., [SWE-051]) anchors each requirement; split on it and
    # treat everything up to the next tag as that requirement's body.
    swe_pattern = r'(\[\s*SWE-\d{3}\s*\])'
    segments = re.split(swe_pattern, text)
    parsed_swes = []
    # segments[0] is preamble text; after that, tags and bodies alternate.
    for i in range(1, len(segments), 2):
        swe_tag = segments[i].strip().replace(" ", "")  # normalize to [SWE-XXX]
        swe_body = segments[i + 1].strip() if (i + 1) < len(segments) else ""
        # The body keeps any bullet sub-requirements (e.g., a., b., c.) so the
        # full requirement context stays unified.
        parsed_swes.append({
            "id": swe_tag,
            "requirement_text": swe_body
        })
    return parsed_swes

extracted_requirements = parse_swe_requirements(raw_doc_text)
print(f"Successfully extracted {len(extracted_requirements)} individual NASA SWE nodes.")
Step 4: Define the Target Fine-Tuning Schema using Pydantic
To feed a small model, you want to generate high-quality Instruction-Response pairs. Using a frontier model via API with Pydantic ensures the synthetically generated training data conforms to a strict, parseable schema rather than free-form text that can break your pipeline.
from pydantic import BaseModel, Field
from typing import List

class SFTTrainingSample(BaseModel):
    user_prompt: str = Field(description="The engineering user prompt asking for compliance verification or gap analysis.")
    assistant_response: str = Field(description="The gold-standard expert NASA S&MA engineer compliance ruling, quoting the SWE text and giving rationale.")

class SyntheticBatch(BaseModel):
    samples: List[SFTTrainingSample]
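A quick illustration of what the schema enforces, assuming Pydantic v2's model_validate_json; the literal strings are invented placeholders:

demo = SyntheticBatch.model_validate_json(
    '{"samples": [{"user_prompt": "Evaluate this SCM plan...", '
    '"assistant_response": "Status: COMPLIANT..."}]}'
)
print(demo.samples[0].user_prompt)  # fields are type-checked on parse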
Step 5: Run the Synthetic SFT Generation Loop
This step targets an external frontier LLM (like GPT-4o or Claude 3.5 Sonnet) via Structured Outputs to generate scenario permutations for each requirement. For every SWE, instruct it to generate distinct compliance challenges (e.g., one compliant example, one flawed example).
from openai import OpenAI

client = OpenAI(api_key="your_api_key_here")

def generate_training_pairs(swe_node):
    prompt_instruction = f"""
You are an expert NASA Software Safety and Mission Assurance auditor.
Using the following specific NASA NPR 7150.2D requirement node, generate exactly TWO distinct training variations for a Supervised Fine-Tuning (SFT) dataset:

Requirement: {swe_node['id']}
Text Context: {swe_node['requirement_text']}

Variation 1: An engineering scenario that is fully COMPLIANT. The assistant response must evaluate the text and explain WHY it passes.
Variation 2: An engineering scenario that is NON-COMPLIANT (has a compliance gap). The assistant response must flag the deficiency and outline the exact corrective engineering action.

Ensure responses explicitly reference the requirement number and its specific phrasing.
"""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",  # or an equivalent reasoning-capable frontier API
        messages=[
            {"role": "system", "content": "You are a data compilation engine generating structured JSON datasets for machine learning fine-tuning."},
            {"role": "user", "content": prompt_instruction}
        ],
        response_format=SyntheticBatch,
    )
    return response.choices[0].message.parsed

# Process nodes and stream to the fine-tuning payload
sft_dataset = []
for node in extracted_requirements[:5]:  # test on the first 5 nodes before a full run
    batch = generate_training_pairs(node)
    sft_dataset.extend(batch.samples)
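For the full run over every extracted SWE, it is worth guarding against transient API failures so one error does not abort the whole batch; a minimal sketch (the one-second pause is an arbitrary rate-limit buffer, not an API requirement):

import time

for node in extracted_requirements:
    try:
        batch = generate_training_pairs(node)
        sft_dataset.extend(batch.samples)
    except Exception as exc:
        # Log and continue; rerun failed nodes later rather than losing the batch.
        print(f"Skipping {node['id']}: {exc}")
    time.sleep(1)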
Step 6: Format and Export to ChatML JSON Lines (JSONL)
Small local models (such as Nemotron or IBM Granite fine-tunes managed via Axolotl or Llama-Factory) ingest data best via standard ChatML/OpenAI system-user-assistant formatting. Transform your structured output array into a standard .jsonl data file.
def export_to_chatml(dataset, output_file_path):
    with open(output_file_path, 'w', encoding='utf-8') as f:
        for item in dataset:
            chatml_structure = {
                "messages": [
                    {
                        "role": "system",
                        "content": "You are an expert NASA Software Assurance Engineer auditing project artifacts strictly against NPR 7150.2D procedural software requirements."
                    },
                    {
                        "role": "user",
                        "content": item.user_prompt
                    },
                    {
                        "role": "assistant",
                        "content": item.assistant_response
                    }
                ]
            }
            f.write(json.dumps(chatml_structure) + '\n')

export_to_chatml(sft_dataset, "npr_7150_2d_training_data.jsonl")
print("Data pipeline finished. Output stored in npr_7150_2d_training_data.jsonl")
What the Resulting Output Looks Like
Your finalized npr_7150_2d_training_data.jsonl file will contain clean, scannable JSON rows optimized for fine-tuning local weights:
{"messages": [{"role": "system", "content": "You are an expert NASA Software Assurance Engineer..."}, {"role": "user", "content": "Evaluate this artifact for [SWE-146] auto-code generation compatibility: 'Flight control loops are auto-generated from Simulink models, which are committed directly to our main git repository.'"}, {"role": "assistant", "content": "Status: NON-COMPLIANT\n\nRationale: Per [SWE-146], the project manager must define the clear verification, validation, and tool-qualification approach for automated code generation tools, alongside configuration control of the generator itself. Committing the raw generated outputs without verifying the underlying tool parameters violates section (a) and (b) of the requirement."}]}
How would this work for NASA-STD-8739.8B?
Adapting this pipeline for NASA-STD-8739.8B (Software Assurance and Software Safety Standard) follows a very similar architectural pattern, but with a critical shift in how you parse and structure the source text.
While NPR 7150.2D establishes the procedural requirements (the "What"), NASA-STD-8739.8B provides the engineering implementation and safety assurance requirements (the "How"). Consequently, the document structure shifts from simple [SWE-XXX] tags to complex tiered hierarchies and specific Requirements Mapping Matrices (RMM).
Here is how you adapt your ingestion and generation pipeline to handle the unique structure of NASA-STD-8739.8B.
1. Adapt the Parser for Tiered Hierarchies
Unlike the flat [SWE-XXX] naming convention, NASA-STD-8739.8B uses standard nested hierarchical numbering (e.g., 4.1.1, 5.2.3.1) to dictate safety-critical constraints. The regex needs to capture these numbered headers while preserving the distinction between general Software Assurance obligations and Software Safety obligations.
Additionally, you want your parser to pay special attention to Section 4.3.1 (The Requirements Mapping Matrix), as this section explicitly cross-references how 8739.8B ties back to NPR 7150.2 requirements.
Updated Ingestion Pattern
Instead of splitting blindly on a single tag, use a regex pattern that looks for numbered engineering mandates at the start of paragraphs, tracking parent sections to preserve context.
import re

def parse_std_8739_8b(text):
    # Matches numbered headers like 4.1.1 or 5.2.3.1 at the start of structural blocks
    section_pattern = r'(?:^|\n\n)(\d+\.\d+\.\d+(?:\.\d+)?)\s+([A-Z][^\n]+)'
    # Find all section headers to map the document's skeleton
    matches = list(re.finditer(section_pattern, text))
    parsed_sections = []
    for i in range(len(matches)):
        start_idx = matches[i].end()
        end_idx = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        section_num = matches[i].group(1)
        section_title = matches[i].group(2).strip()
        body_content = text[start_idx:end_idx].strip()
        # Only capture blocks containing actionable requirements ("shall")
        if "shall" in body_content.lower():
            parsed_sections.append({
                "section": section_num,
                "title": section_title,
                "requirement_text": body_content
            })
    return parsed_sections
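Usage mirrors the NPR pipeline from Steps 2 and 3; the PDF filename here is a placeholder:

std_text = extract_clean_text("NASA_STD_8739_8B.pdf")  # placeholder filename
std_requirements = parse_std_8739_8b(std_text)
print(f"Extracted {len(std_requirements)} 'shall' sections from NASA-STD-8739.8B.")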
2. Shift the Synthetic Data Strategy to Safety & Assurance Tasks
The types of training data you generate for NASA-STD-8739.8B need to focus on independent verification, hazard analysis, safety-critical software isolation, and tool qualification.
Your prompt templates for the frontier LLM must change to reflect these specific engineering disciplines.
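As one concrete illustration of that shift, the Step 5 prompt template might be adapted along these lines; the wording is an assumption, not text from the standard:

def build_8739_prompt(section_node):
    return f"""
You are an expert NASA Software Safety Engineer.
Using the following NASA-STD-8739.8B requirement section, generate exactly TWO training variations:

Section: {section_node['section']} ({section_node['title']})
Text Context: {section_node['requirement_text']}

Variation 1: A hazard-analysis or assurance scenario that is fully COMPLIANT, with rationale.
Variation 2: A scenario with an independence, tool-qualification, or mitigation-verification gap, plus the corrective action.

Ensure responses cite the section number and its specific 'shall' phrasing.
"""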
Task Type A: Software Safety & Hazard Analysis Validation
This trains your small model to recognize whether a project has properly executed software hazard tracking and mitigation verification.
{
  "messages": [
    {
      "role": "system",
      "content": "You are a NASA Software Safety Engineer auditing compliance against NASA-STD-8739.8B."
    },
    {
      "role": "user",
      "content": "Evaluate this safety log entry against NASA-STD-8739.8B software safety requirements:\n'Hazard H-04 (Thruster Over-fire) is mitigated by a software limit check in module flight_control.c. The software assurance team verified this by reviewing the developer's unit test results.'"
    },
    {
      "role": "assistant",
      "content": "Status: NON-COMPLIANT\n\nRationale: Per NASA-STD-8739.8B software safety requirements for safety-critical software, safety analysis and mitigation verification cannot rely solely on a developer's self-assertion or standard unit testing. Software Assurance must independently witness, execute, or verify dedicated hazard mitigation testing (Independent Verification and Validation / Software Safety Testing) specifically targeting the failure mode condition.\n\nCorrective Action: Software Assurance must perform an independent audit of the test run, witness the execution of the boundary-value test for Hazard H-04, and formally sign off on the specific safety test report artifact."
    }
  ]
}
Task Type B: Tailoring & RMM Compliance (Section 4.3.1)
This trains the model to understand the mapping between the standard and a project's specific tailored Requirements Mapping Matrix.
{
  "messages": [
    {
      "role": "system",
      "content": "You are a NASA Software Assurance Manager specialized in NASA-STD-8739.8B Section 4.3.1 Requirements Mapping Matrix compliance."
    },
    {
      "role": "user",
      "content": "A Class B, non-safety-critical payload project proposes to waive the entirety of Section 6 (Software Quality Assurance). Is this permissible under Section 4.3.1?"
    },
    {
      "role": "assistant",
      "content": "Status: REJECTED\n\nRationale: Under NASA-STD-8739.8B Section 4.3.1 and Appendix A, while tailoring is permitted based on software classification and safety relevance, a blanket waiver of Section 6 (Software Quality Assurance) for a Class B mission is non-compliant. Class B software requires foundational quality metrics, configuration audits, and problem reporting and tracking.\n\nCorrective Action: The project must submit a formal tailoring matrix justifying modifications to *individual* sub-requirements within Section 6, rather than a blanket waiver, and obtain formal concurrence from the Center Chief Safety Officer."
    }
  ]
}
3. The Combined Fine-Tuning Execution Strategy
When building a specialized local AI for a space flight project, fine-tuning on both documents yields the best results because they are fundamentally interdependent.
Dual-Document Dataset Tagging: Label your JSONL rows with a source metadata tag (e.g., "source": "NPR-7150.2D" or "source": "NASA-STD-8739.8B").
Cross-Reference Training: Intentionally generate ~15% of your dataset as cross-reference tasks. For instance, prompt the model to identify which specific section of NASA-STD-8739.8B fulfills the procedural mandate of NPR 7150.2D's software assurance requirement (SWE-022).
By feeding a small, local 8B or 14B model roughly 2,000 pairs from NPR 7150.2D and 2,000 pairs from NASA-STD-8739.8B, you create a highly secure, localized engine capable of parsing cross-program compliance matrices right from a laptop or private development workstation.
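A minimal sketch of that merge step, assuming the two per-document JSONL files exist under the hypothetical filenames used here:

import json
import random

def merge_tagged_datasets(paths_by_source, output_path):
    rows = []
    for source, path in paths_by_source.items():
        with open(path, encoding='utf-8') as f:
            for line in f:
                row = json.loads(line)
                row["source"] = source  # tag provenance for later filtering/metrics
                rows.append(row)
    random.shuffle(rows)  # interleave the standards so training batches mix them
    with open(output_path, 'w', encoding='utf-8') as f:
        for row in rows:
            f.write(json.dumps(row) + '\n')

merge_tagged_datasets(
    {
        "NPR-7150.2D": "npr_7150_2d_training_data.jsonl",
        "NASA-STD-8739.8B": "nasa_std_8739_8b_training_data.jsonl",  # hypothetical
    },
    "combined_compliance_training_data.jsonl",
)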
From <https://gemini.google.com/app/> Google Gemini 3.0 (Fast)
