#4440 Prompts for Domains -> Sub-Domains -> Systems Breakdown -> JSON -> Cypher Statements

#4440 I have 3 consecutive prompts:

1) "Within domain [domain, e.g. Mechanical Systems], list 15 sub-domains",
2) "List 10 unique systems within each of transportation systems domain sub-domains",
3) "Break down [system] into subsystems, components and parts, with functional description for each. Identify interaction relationships between entities. Respond with a JSON list."

Improve these prompts to run in sequence, producing complete, correct and consistent valid JSON output.

To make these prompts work in a sequential pipeline, where the output of one feeds directly into the next, we need to enforce strict structural constraints.

The biggest risks in sequential prompting are hallucination, inconsistent formatting, and truncated JSON (especially in prompt 3, which asks for a massive amount of data).

Here is an improved, production-oriented sequence designed to make valid, consistent JSON as likely as possible at every step.

The Optimized Prompt Sequence

Prompt 1: Establish the Sub-domains

Purpose: Sets up a clean, predictable array of sub-domains.
Input: [Domain] (e.g., Mechanical Systems)

You are an expert systems engineer. Within the domain of [Domain], list exactly 15 distinct sub-domains. 

Your response must be strictly valid JSON and nothing else. Do not include any introductory or concluding text, and do not use markdown code blocks (e.g., ```json).

Use the following exact JSON schema:
{
  "domain": "[Domain]",
  "sub_domains": [
    "Sub-domain 1",
    "Sub-domain 2",
    ...
  ]
}
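When these prompts are filled in from Python, the literal braces in the schema get in the way: `str.format()` or an f-string would treat them as replacement fields. A minimal sketch, assuming the prompt text is kept as a plain string constant, substitutes the `[Domain]` placeholder with `str.replace()` instead. `PROMPT_1_TEMPLATE` (an abridged copy of Prompt 1) and `build_prompt_1` are illustrative names, not part of any library.

```python
# Hypothetical, abridged copy of Prompt 1; "[Domain]" is the only placeholder.
PROMPT_1_TEMPLATE = """You are an expert systems engineer. Within the domain of [Domain], list exactly 15 distinct sub-domains.
Your response must be strictly valid JSON and nothing else.
Use the following exact JSON schema:
{
  "domain": "[Domain]",
  "sub_domains": ["Sub-domain 1", "Sub-domain 2", ...]
}"""


def build_prompt_1(domain: str) -> str:
    """Fill every [Domain] placeholder without touching the literal JSON braces."""
    # str.format() would misread the schema's braces as replacement fields,
    # so a plain string replacement is used instead.
    return PROMPT_1_TEMPLATE.replace("[Domain]", domain)


if __name__ == "__main__":
    print(build_prompt_1("Mechanical Systems"))
```

The same pattern works for the `[System]` placeholder in Prompt 3.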

Prompt 2: Generate the Systems

Purpose: Maps specific systems to the sub-domains generated in Step 1.
Input: Paste the sub_domains array from Prompt 1.

You are an expert systems engineer. For each of the following sub-domains, list exactly 10 unique, real-world systems. 

Sub-domains to process:
[Insert JSON array or list of sub-domains from Prompt 1]

Your response must be strictly valid JSON and nothing else. Do not include any conversational text or markdown formatting.

Use the following exact JSON schema:
{
  "sub_domain_systems": [
    {
      "sub_domain": "Name of Sub-domain",
      "systems": [
        "System 1",
        "System 2",
        ...
      ]
    }
  ]
}
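Because Prompt 2 depends on the model echoing every sub-domain back unchanged, it is worth checking the response before moving on. The following is a minimal sketch using only the standard library; `check_systems_response` is a hypothetical helper, and the expected count of 10 mirrors the prompt wording rather than anything the schema itself enforces.

```python
import json


def check_systems_response(sub_domains: list[str], raw_json: str, expected: int = 10) -> list[str]:
    """Return human-readable problems found in a Prompt 2 response (empty list means OK)."""
    data = json.loads(raw_json)
    problems: list[str] = []
    returned = {entry["sub_domain"]: entry["systems"] for entry in data["sub_domain_systems"]}

    # Every requested sub-domain should come back under exactly the same name.
    for name in sub_domains:
        if name not in returned:
            problems.append(f"missing sub-domain: {name}")

    # Flag sub-domains the model invented or renamed.
    for name in returned:
        if name not in sub_domains:
            problems.append(f"unexpected sub-domain: {name}")

    for name, systems in returned.items():
        if len(systems) != expected:
            problems.append(f"{name}: expected {expected} systems, got {len(systems)}")
        # Case-insensitive duplicate check within one sub-domain.
        if len({s.strip().lower() for s in systems}) != len(systems):
            problems.append(f"{name}: duplicate systems")
    return problems


if __name__ == "__main__":
    subs = ["Powertrain", "Braking"]
    raw = json.dumps({"sub_domain_systems": [
        {"sub_domain": "Powertrain", "systems": [f"S{i}" for i in range(10)]}
    ]})
    print(check_systems_response(subs, raw))  # ['missing sub-domain: Braking']
```

A non-empty result is a reasonable trigger for re-running Prompt 2 before any Prompt 3 calls are spent.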

Prompt 3: Deep Breakdown & Interactions

Purpose: Deconstructs a single chosen system.
Crucial Change: Asking an LLM to break down all systems at once into subsystems, components, and parts is very likely to exceed its output token limit and produce truncated, invalid JSON. This prompt instead handles one system at a time, keeping each response small enough to complete.
Input: [System] (chosen from Prompt 2's output)

You are an expert systems engineer. Provide a detailed, hierarchical breakdown of the following system into its subsystems, components, and parts. For each entity, provide a concise functional description. Additionally, map the critical interaction relationships between these entities.

System to break down: [System]

Your response must be strictly valid JSON and nothing else. Do not include any markdown formatting (like ```json) or explanatory text. Ensure all JSON keys and strings are properly escaped.

Use the following exact JSON structure:
{
  "system_name": "[System]",
  "hierarchy": [
    {
      "subsystem_name": "Name of Subsystem",
      "functional_description": "Description of what this subsystem does.",
      "components": [
        {
          "component_name": "Name of Component",
          "functional_description": "Description of what this component does.",
          "parts": [
            {
              "part_name": "Name of Part",
              "functional_description": "Description of what this part does."
            }
          ]
        }
      ]
    }
  ],
  "interactions": [
    {
      "source_entity": "Name of Component/Part/Subsystem",
      "target_entity": "Name of Component/Part/Subsystem",
      "interaction_type": "e.g., Mechanical, Electrical, Thermal, Fluid, Data",
      "description": "How these entities interact or depend on each other."
    }
  ]
}
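Prompt 3 asks for interactions between named entities, but nothing in the schema forces `source_entity` and `target_entity` to match a name that actually appears in `hierarchy`. A minimal consistency check, assuming the JSON follows the structure above exactly, collects every subsystem, component, and part name and reports dangling references. `collect_entity_names` and `find_dangling_interactions` are illustrative helpers.

```python
def collect_entity_names(breakdown: dict) -> set[str]:
    """Gather every subsystem, component, and part name from the hierarchy."""
    names: set[str] = set()
    for subsystem in breakdown.get("hierarchy", []):
        names.add(subsystem["subsystem_name"])
        for component in subsystem.get("components", []):
            names.add(component["component_name"])
            for part in component.get("parts", []):
                names.add(part["part_name"])
    return names


def find_dangling_interactions(breakdown: dict) -> list[dict]:
    """Return interactions whose source or target is not defined in the hierarchy."""
    known = collect_entity_names(breakdown)
    return [
        interaction
        for interaction in breakdown.get("interactions", [])
        if interaction["source_entity"] not in known
        or interaction["target_entity"] not in known
    ]
```

Dangling interactions can either be dropped or sent back to the model with a request to use only names from the hierarchy.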

Why These Improvements Work

Explicit Schema Definitions: By providing the exact JSON structure, you prevent the LLM from making up its own keys or nesting logic between steps.
Anti-Chatter Guardrails: Explicitly forbidding conversational text ("Here is your JSON:") and markdown code fences (```json) makes it far more likely that the output can be parsed programmatically without regex cleaning.
Scoped Down for Token Safety: In Prompt 3, instead of asking for a breakdown of all systems (which would almost certainly exceed the output token limit and truncate the JSON), the prompt focuses on a single [System], allowing you to loop through the systems programmatically or target specific ones with maximum depth.
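The guardrails reduce chatter but cannot eliminate it, and a local model may still wrap its answer in a code fence or add a sentence before the JSON. A defensive parser is cheap insurance. This sketch, using only the standard library, skips any text before the first `{` and decodes one JSON object with `json.JSONDecoder.raw_decode`, which ignores whatever follows the object. A truncated object still raises `json.JSONDecodeError`, which is a useful signal to retry. `parse_llm_json` is a hypothetical name, and the sketch assumes no stray `{` appears in the chatter before the JSON.

```python
import json


def parse_llm_json(text: str) -> dict:
    """Extract the first JSON object from an LLM reply that may contain chatter or fences."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value starting at `start` and reports where it ended,
    # so a closing code fence or a trailing sentence is simply ignored.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    if not isinstance(obj, dict):
        raise ValueError("top-level JSON value is not an object")
    return obj
```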

Are you building an automated script (like Python or Node.js) to chain these prompts together, or are you running them manually in a chat interface?

My plan is to develop an automated script to run within Python calling an Ollama-supported local LLM like GPT-OSS-20B

To build an automated Python script using Ollama and a reasoning model like GPT-OSS-20B, take advantage of Ollama's structured output support: its format parameter accepts either format="json" or a full JSON schema, which can be generated from a Pydantic model.

Because gpt-oss-20b relies heavily on internal chain-of-thought (CoT) reasoning, plain text prompting can sometimes cause it to output stray text or Markdown blocks instead of pure JSON. Letting Ollama enforce the JSON schema during generation gives far stronger structural guarantees than prompt wording alone.

The Python automation script below uses pydantic to define the schemas and passes them to Ollama, so each phase of the sequential pipeline returns JSON in a predictable shape.

Prerequisites

Make sure your environment has the required libraries installed:

pip install ollama pydantic

The Python Automation Script

import json
from typing import List
from pydantic import BaseModel, Field
import ollama

# Define the Model Name
# Note: Make sure the model is available locally first (e.g., `ollama pull gpt-oss:20b`)
MODEL_NAME = "gpt-oss:20b"

# =====================================================================
# 1. DEFINE STRICTLY ENFORCED PYDANTIC SCHEMAS
# =====================================================================

class SubDomainsResponse(BaseModel):
    domain: str = Field(description="The root domain provided by the user.")
    sub_domains: List[str] = Field(description="Exactly 15 distinct sub-domains.")

class SystemList(BaseModel):
    sub_domain: str = Field(description="The name of the sub-domain.")
    systems: List[str] = Field(description="Exactly 10 unique real-world systems within this sub-domain.")

class SystemsResponse(BaseModel):
    sub_domain_systems: List[SystemList]

class PartBreakdown(BaseModel):
    part_name: str
    functional_description: str

class ComponentBreakdown(BaseModel):
    component_name: str
    functional_description: str
    parts: List[PartBreakdown]

class SubsystemBreakdown(BaseModel):
    subsystem_name: str
    functional_description: str
    components: List[ComponentBreakdown]

class InteractionMap(BaseModel):
    source_entity: str = Field(description="Name of the component, part, or subsystem sending the input/force.")
    target_entity: str = Field(description="Name of the component, part, or subsystem receiving it.")
    interaction_type: str = Field(description="e.g., Mechanical, Electrical, Thermal, Fluid, Data")
    description: str = Field(description="How these entities interact or depend on each other.")

class DeepBreakdownResponse(BaseModel):
    system_name: str
    hierarchy: List[SubsystemBreakdown]
    interactions: List[InteractionMap]


# =====================================================================
# 2. RUN THE SEQUENTIAL PIPELINE
# =====================================================================

def run_pipeline(root_domain: str):
    print(f"🚀 Phase 1: Identifying sub-domains for '{root_domain}'...")
    
    # Prompt 1: Get Sub-domains
    prompt_1 = f"Within the domain of {root_domain}, identify and list exactly 15 distinct sub-domains. Respond in JSON."
    
    response_1 = ollama.chat(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt_1}],
        format=SubDomainsResponse.model_json_schema() # Forces valid JSON matching the schema
    )
    
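    # Parse Phase 1 output: validate against the schema (raises pydantic.ValidationError
    # on a mismatch), then convert back to a plain dict for the steps below
    phase_1_data = SubDomainsResponse.model_validate_json(response_1['message']['content']).model_dump()
    sub_domains = phase_1_data["sub_domains"]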
    print(f"✅ Found {len(sub_domains)} sub-domains.")

    # -----------------------------------------------------------------
    
    print(f"\n🚀 Phase 2: Generating 10 systems per sub-domain...")
    # Prompt 2: Feed the sub-domains array directly back into the LLM
    prompt_2 = (
        "For each of the following sub-domains, list exactly 10 unique, real-world systems. "
        "Reuse each sub-domain name exactly as given and respond in JSON:\n"
        f"{json.dumps(sub_domains)}"
    )
    
    response_2 = ollama.chat(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt_2}],
        format=SystemsResponse.model_json_schema()
    )
    
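    # Parse Phase 2 output (validated against SystemsResponse, then converted to a dict)
    phase_2_data = SystemsResponse.model_validate_json(response_2['message']['content']).model_dump()
    print(f"✅ Systems mapped for {len(phase_2_data['sub_domain_systems'])} of {len(sub_domains)} sub-domains.")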
    
    # -----------------------------------------------------------------
    
    # Phase 3 Selector: Let's pick a system dynamically to showcase Phase 3.
    # We will pick the 1st system of the 1st sub-domain found.
    try:
        sample_subdomain = phase_2_data["sub_domain_systems"][0]["sub_domain"]
        target_system = phase_2_data["sub_domain_systems"][0]["systems"][0]
    except (KeyError, IndexError):
        sample_subdomain = "Unknown"
        target_system = "Internal Combustion Engine" # Fallback if list indexing fails
        
    print(f"\n🚀 Phase 3: Breaking down target system '{target_system}' (from Sub-domain: '{sample_subdomain}')...")
    
    # Prompt 3: Deep system deconstruction
    prompt_3 = (
        f"Provide a comprehensive hierarchical breakdown of the system '{target_system}' "
        f"into its subsystems, components, and parts. Provide functional descriptions for each. "
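        "Map the critical interaction relationships between these entities, using only names "
        "that appear in the hierarchy as source_entity and target_entity. Respond in JSON."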
    )
    
    response_3 = ollama.chat(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt_3}],
        format=DeepBreakdownResponse.model_json_schema(),
        options={"temperature": 0.2}  # Low temperature reduces sampling randomness for more repeatable output
    )
    
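    # Parse Phase 3 output (validated against DeepBreakdownResponse, then converted to a dict)
    phase_3_data = DeepBreakdownResponse.model_validate_json(response_3['message']['content']).model_dump()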
    print(f"✅ Deep breakdown completed successfully for '{target_system}'.")
    
    # -----------------------------------------------------------------
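    # Save the final data layer to a JSON file; non-alphanumeric characters in the
    # system name (e.g. "/") are replaced so the result is always a valid filename
    safe_name = "".join(c if c.isalnum() else "_" for c in target_system.lower())
    filename = f"{safe_name}_breakdown.json"
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(phase_3_data, f, indent=2, ensure_ascii=False)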
    print(f"\n💾 Saved full system architecture blueprint to '{filename}'!")

if __name__ == "__main__":
    # Test it with a domain of your choice
    run_pipeline("Mechanical Systems")

Key Local LLM Adjustments Applied Here:

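format=Schema.model_json_schema(): Instead of only asking the LLM in the prompt to output JSON, this passes a JSON schema to Ollama, which constrains token sampling so that the output follows your structure. This prevents JSON syntax errors in normal operation, but a response cut off at the context limit can still be incomplete, and structured-output behavior can vary by model and Ollama version, so validating the parsed result (for example with Pydantic) remains worthwhile.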
Context Safety: Rather than attempting to break down all 150 systems generated in step 2 (15 sub-domains × 10 systems) in a single request, which would likely exceed GPT-OSS-20B's context window or the available memory, the script targets one system at a time (target_system). You can wrap Phase 3 in a for loop to step through specific systems sequentially.
Low Temperature (0.2) in Phase 3: This reduces sampling randomness, which tends to make the breakdown more conservative and repeatable from run to run. It does not by itself prevent hallucinated parts or interactions, so spot-check the output against engineering references.
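The Phase 3 for loop mentioned in these notes can be sketched as follows. To keep the sketch runnable without a model, the LLM call is passed in as a function (`breakdown_fn`, hypothetical) that takes a system name and returns the raw JSON text; in the real script it would wrap the Phase 3 `ollama.chat(...)` call. Each system is written to its own file, systems that already have a file are skipped so an interrupted run can resume, and unparseable responses are retried a limited number of times. `safe_filename` and `run_phase3_for_all` are illustrative names.

```python
import json
from pathlib import Path
from typing import Callable


def safe_filename(name: str) -> str:
    """Turn a system name such as 'Heat/Cooling Unit' into a safe file stem."""
    return "".join(c if c.isalnum() else "_" for c in name.lower()).strip("_")


def run_phase3_for_all(
    phase_2_data: dict,
    breakdown_fn: Callable[[str], str],
    out_dir: Path,
    max_attempts: int = 3,
) -> list[str]:
    """Break down every system from Phase 2, one LLM call per system.

    Returns the names of systems that still failed after max_attempts.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[str] = []
    for entry in phase_2_data.get("sub_domain_systems", []):
        for system in entry.get("systems", []):
            target = out_dir / f"{safe_filename(system)}_breakdown.json"
            if target.exists():
                continue  # finished in an earlier run; skipping lets the loop resume
            for _attempt in range(max_attempts):
                try:
                    data = json.loads(breakdown_fn(system))
                    break
                except json.JSONDecodeError:
                    continue  # truncated or malformed output: ask again
            else:
                # The retry loop never hit `break`, so every attempt failed.
                failed.append(system)
                continue
            target.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return failed
```

Processing systems one call at a time keeps each response within the model's output budget, and the returned list makes it easy to re-run only the failures.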

Provide a good prompt to convert the JSON to Cypher statements to be used to populate a knowledge graph

To map a deeply nested JSON hierarchy (System -> Subsystem -> Component -> Part), along with an array of cross-entity interactions, into a graph database, your Cypher statements must handle deduplication and dynamic properties efficiently.

The standard approach in Cypher is to use MERGE statements with distinct node labels (System, Subsystem, Component, Part) and explicit structural relationships (HAS_SUBSYSTEM, HAS_COMPONENT, HAS_PART).

Here is a prompt designed to generate clean, executable Cypher that relies on MERGE and unique keys to avoid duplicate data.

The Cypher Conversion Prompt

You are an expert Graph Database Architect specialized in Neo4j and Cypher query language. Your task is to convert the provided system breakdown JSON into a series of highly optimized, syntactically correct Cypher statements used to populate a knowledge graph.

### Constraints & Rules:
1. **Idempotency**: Use `MERGE` instead of `CREATE` for all nodes and relationships so that running the script multiple times does not duplicate data. MERGE each node on its unique `id` only, then use `SET` for the remaining properties.
2. **Contextual Scope**: Because part and component names can be generic (e.g., "Bolt" or "Sensor"), give every node a deterministic `id` built from its full path (e.g., `"Disc Brake/Caliper Assembly/Piston/Piston Seal"`), so identically named parts under different parents remain separate nodes.
3. **Property Mapping**: Include all provided metadata (such as `functional_description`, `interaction_type`, and `description`) as node or relationship properties. Escape any quotes inside string values.
4. **Relationship Directions**: Ensure arrows `->` indicate structural ownership (e.g., `(:System)-[:HAS_SUBSYSTEM]->(:Subsystem)`) and the direction of functional flow (from `source_entity` to `target_entity`).
5. **Output Cleanliness**: Output ONLY raw Cypher code, ending each statement with a semicolon. Do not wrap the output in markdown code blocks (```cypher), and do not include any introductions, conclusions, or execution instructions.

### Target Graph Schema:
- **Nodes**: 
  - (:System {id, name})
  - (:Subsystem {id, name, functional_description})
  - (:Component {id, name, functional_description})
  - (:Part {id, name, functional_description})
- **Structural Relationships**:
  - (:System)-[:HAS_SUBSYSTEM]->(:Subsystem)
  - (:Subsystem)-[:HAS_COMPONENT]->(:Component)
  - (:Component)-[:HAS_PART]->(:Part)
- **Interaction Relationships**:
  - For each entry in `interactions`, resolve `source_entity` and `target_entity` to the `id` of the hierarchy node with that name (a Subsystem, Component, or Part), MATCH both nodes by `id`, and MERGE a single relationship type between them, storing the interaction type as a property and setting the description, e.g., `MERGE (a)-[r:INTERACTS_WITH {type: "Thermal"}]->(b) SET r.description = "..."`. Do not create new nodes for names that do not appear in the hierarchy.

### Input JSON to Convert:
[INSERT YOUR JSON STRING HERE]
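If the model's Cypher output is executed statement by statement (for example through a driver, which runs one statement per call), it has to be split first. A naive `str.split(";")` breaks as soon as a functional description contains a semicolon, so this sketch only splits on semicolons outside single- or double-quoted string literals. It assumes the output uses `;` as the statement terminator and does not handle Cypher comments or backtick-quoted identifiers that contain `;`. `split_cypher_statements` is a hypothetical helper.

```python
from typing import Optional


def split_cypher_statements(script: str) -> list[str]:
    """Split a Cypher script on ';' while ignoring semicolons inside string literals."""
    statements: list[str] = []
    current: list[str] = []
    quote: Optional[str] = None  # quote character of the literal we are inside, if any
    escaped = False
    for ch in script:
        current.append(ch)
        if quote:
            if escaped:
                escaped = False  # this character was escaped by a preceding backslash
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None  # the string literal is closed
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ";":
            stmt = "".join(current[:-1]).strip()  # drop the terminating ';'
            if stmt:
                statements.append(stmt)
            current = []
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)  # last statement without a trailing ';'
    return statements
```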

Pro-Tip for your Python Automation Script

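Since you are already running this programmatically, you can add a final step to your script that uses this prompt to generate the Cypher commands. Alternatively, because the JSON structure is fixed, you can skip the LLM for this step and use the neo4j Python driver to walk the JSON and run parameterized MERGE statements directly, which avoids both LLM-generated syntax errors and token limits.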

If you choose to use the prompt approach, pass the final JSON string from your file right into the placeholder at the bottom!
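For the driver route, the conversion does not need an LLM at all. A minimal sketch follows, assuming the Prompt 3 JSON structure above; it emits `(query, parameters)` pairs that use Cypher parameters (`$id`, `$name`, ...) rather than splicing text into the query, so quotes in descriptions cannot break a statement. Node ids are built from the full path (e.g. `Disc Brake/Caliper Assembly/Piston`) so a generic part such as "Bolt" stays distinct under different parents. The `id` property, the `INTERACTS_WITH` relationship type, and the helper names are assumptions of this sketch.

```python
from typing import Any, Optional

# One executable unit: a Cypher query plus its parameter map.
Statement = tuple[str, dict[str, Any]]


def merge_node(label: str, node_id: str, name: str, desc: Optional[str] = None) -> Statement:
    """MERGE on the unique id only, then SET the other properties.

    Merging on the id alone means a re-run with a reworded description
    updates the node instead of creating a duplicate.
    """
    query = f"MERGE (n:{label} {{id: $id}}) SET n.name = $name"
    params: dict[str, Any] = {"id": node_id, "name": name}
    if desc is not None:
        query += ", n.functional_description = $desc"
        params["desc"] = desc
    return query, params


def link(a_label: str, a_id: str, rel: str, b_label: str, b_id: str) -> Statement:
    """MERGE a directed relationship between two nodes already merged by id."""
    query = f"MATCH (a:{a_label} {{id: $a}}), (b:{b_label} {{id: $b}}) MERGE (a)-[:{rel}]->(b)"
    return query, {"a": a_id, "b": b_id}


def breakdown_to_cypher(data: dict) -> list[Statement]:
    """Convert one Prompt 3 breakdown into parameterized Cypher statements.

    Labels and relationship types are fixed strings in this code, never taken
    from model output; all model-generated text travels as parameters.
    """
    system = data["system_name"]
    stmts: list[Statement] = [merge_node("System", system, system)]
    # name -> (label, id); if a name repeats, the last occurrence wins.
    index: dict[str, tuple[str, str]] = {}

    for sub in data.get("hierarchy", []):
        sub_id = f"{system}/{sub['subsystem_name']}"
        stmts.append(merge_node("Subsystem", sub_id, sub["subsystem_name"], sub["functional_description"]))
        stmts.append(link("System", system, "HAS_SUBSYSTEM", "Subsystem", sub_id))
        index[sub["subsystem_name"]] = ("Subsystem", sub_id)
        for comp in sub.get("components", []):
            comp_id = f"{sub_id}/{comp['component_name']}"
            stmts.append(merge_node("Component", comp_id, comp["component_name"], comp["functional_description"]))
            stmts.append(link("Subsystem", sub_id, "HAS_COMPONENT", "Component", comp_id))
            index[comp["component_name"]] = ("Component", comp_id)
            for part in comp.get("parts", []):
                part_id = f"{comp_id}/{part['part_name']}"
                stmts.append(merge_node("Part", part_id, part["part_name"], part["functional_description"]))
                stmts.append(link("Component", comp_id, "HAS_PART", "Part", part_id))
                index[part["part_name"]] = ("Part", part_id)

    for inter in data.get("interactions", []):
        src, dst = index.get(inter["source_entity"]), index.get(inter["target_entity"])
        if src is None or dst is None:
            continue  # the model referenced an entity missing from the hierarchy
        stmts.append((
            f"MATCH (a:{src[0]} {{id: $a}}), (b:{dst[0]} {{id: $b}}) "
            "MERGE (a)-[r:INTERACTS_WITH {type: $type}]->(b) SET r.description = $desc",
            {"a": src[1], "b": dst[1], "type": inter["interaction_type"], "desc": inter["description"]},
        ))
    return stmts
```

With the official neo4j Python driver (5.x), each pair can then be executed with `driver.execute_query(query, params)`. Creating a uniqueness constraint on `id` for each label beforehand keeps the MERGE lookups fast.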

From <https://gemini.google.com/app/> Google Gemini (3.5 Flash)


Learning from ChatGPT