#4140 Knowledge Graph Multi-Step Queries

#4140 Describe the various Retrieval Augmented Generation (RAG) strategies applied within Python

Implementing Retrieval-Augmented Generation (RAG) in Python has evolved from simple "vector search" to a sophisticated ecosystem of multi-stage pipelines. In 2026, the strategy you choose depends on the complexity of your data and the accuracy required.

The various RAG strategies are generally categorized by where they intervene in the pipeline: Pre-Retrieval, Retrieval, and Post-Retrieval.


1. Pre-Retrieval & Ingestion Strategies

These focus on how data is prepared before a user ever asks a question.

  • Recursive Character Splitting: Instead of fixed-length chunks, Python libraries like LangChain use recursive splitters that look for natural boundaries (paragraphs, then sentences) to keep context intact (see the sketch after this list).

  • Semantic Chunking: Using models (like semantic-chunkers) to split text based on changes in meaning rather than character count.

  • Hierarchical Indexing: Storing small chunks for fine-grained retrieval but linking them to larger "parent" documents or summaries to provide broader context if needed.

  • Hypothetical Document Embeddings (HyDE): An LLM generates a "fake" answer to the user's query first. The system then uses this hypothetical answer to search the database, as the "fake" answer’s vector often sits closer to the real data than a brief user question does.
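
As a concrete illustration, here is a minimal recursive-splitting sketch. It assumes LangChain's langchain-text-splitters package; the file name "handbook.txt" and the chunk sizes are placeholders to tune for your own corpus and embedding model.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Tries paragraph breaks first, then sentences, then words, and only falls
# back to raw character counts when a chunk is still too large.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # target characters per chunk
    chunk_overlap=50,  # overlap preserves context across chunk boundaries
)

with open("handbook.txt") as f:  # placeholder source document
    text = f.read()

chunks = splitter.split_text(text)
print(f"Produced {len(chunks)} chunks; first chunk starts: {chunks[0][:80]}")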

2. Retrieval & Query Optimization

These strategies improve how the system finds the relevant "needles" in the haystack.

  • Hybrid Search: Combining Dense Retrieval (semantic/vector search) with Sparse Retrieval (keyword/BM25 search). This is the gold standard for production, as it catches both "meaning" and specific technical terms (a sketch follows this list).

  • Multi-Query Retrieval: An LLM rewrites a single user query into 3–5 different versions. The system runs all of them and gathers the results, ensuring that slight variations in wording don't lead to missed information.

  • Self-Querying: The LLM converts a natural language question into a structured query (e.g., "Find reports from 2023 about solar power"). It then uses metadata filters (date=2023) alongside the vector search.

  • Routing: A "Router" node decides which database to query based on the topic. For example, it might route "What is my balance?" to a SQL database and "How do I reset my password?" to a PDF knowledge base.
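
Here is a minimal hybrid-search sketch, assuming LangChain with the rank_bm25 extra, FAISS, and OpenAI embeddings; the sample documents and the 0.4/0.6 weights are illustrative only.

from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers import EnsembleRetriever

docs = [
    "SKU-4471: torque wrench calibration procedure.",
    "Password resets are handled through the self-service portal.",
    "The remote work policy was updated in 2024.",
]

# Sparse retriever: exact keyword matching (catches literal terms like SKU-4471)
sparse = BM25Retriever.from_texts(docs)
sparse.k = 3

# Dense retriever: semantic similarity over embeddings
dense = FAISS.from_texts(docs, OpenAIEmbeddings()).as_retriever(search_kwargs={"k": 3})

# Blend both result lists; the weights are a tuning knob, not a fixed rule
hybrid = EnsembleRetriever(retrievers=[sparse, dense], weights=[0.4, 0.6])
results = hybrid.invoke("calibration steps for SKU-4471")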

3. Post-Retrieval & Generation Strategies

These refine the results before they reach the final LLM to prevent "context stuffing" and hallucinations.

  • Reranking: The initial search might return 20 documents. A smaller, specialized "Cross-Encoder" (like BGE-Reranker or Cohere Rerank) re-scores those 20 to find the top 3 most relevant ones (see the sketch after this list).

  • Context Compression: Removing "noise" or redundant sentences from the retrieved chunks to fit more unique information into the LLM's prompt window.

  • Corrective RAG (CRAG): A "Grader" LLM checks if the retrieved documents are actually relevant. If they aren't, it triggers a web search or asks the user for clarification instead of guessing.
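
A minimal reranking sketch, assuming the sentence-transformers package; the model name and the candidate snippets are illustrative.

from sentence_transformers import CrossEncoder

query = "What changed in the 2025 Component A upgrade?"
candidates = [
    "Component A was upgraded in 2025 to support higher bus voltages.",
    "System B requires a 400W power supply.",
    "The cafeteria menu rotates weekly.",
]

# A cross-encoder scores each (query, document) pair jointly; slower than
# vector search, but much better at judging true relevance.
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, doc) for doc in candidates])

# Keep only the strongest candidates for the final prompt
top_docs = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)[:2]]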


Python Framework Comparison (2026)

| Framework | Best For | Typical Strategy |
| --- | --- | --- |
| LlamaIndex | Data-heavy RAG | Excellent for complex indexing and "Data Agents." |
| LangChain | Prototyping | Modular "Chains" for multi-query and routing. |
| Haystack | Production Pipelines | Scalable, graph-based workflows with strict I/O. |
| DSPy | Optimization | Programmatic RAG where the system "optimizes" its own prompts. |

Which strategy are you looking to implement?

I can provide a Python code snippet for a specific approach—for instance, a Hybrid Search implementation using Qdrant or a Reranking pipeline. Would you like to see one of those?


What about the various methods of prompting for Knowledge Graph data with embedded documents?

In a Knowledge Graph (KG) system with embedded documents, prompting is no longer about just "finding a chunk"; it's about contextual assembly. You are essentially asking the LLM to synthesize data from three distinct levels: structured triples (entities/relations), unstructured text (document chunks), and global structures (communities).

Here are the primary prompting methods for this hybrid architecture:


1. Local Subgraph Verbalization (Entity-Centric)

This is the most common method when a user asks about a specific person, project, or component.

  • The Logic: You retrieve a "seed" node from the user's query, traverse 1–2 hops to find its neighbors, and pull the embeddings/text chunks attached to those nodes.

  • The Prompting Strategy: You "verbalize" the graph structure into the prompt (a helper sketch follows the example below).

    Prompt Structure:

    "Based on the following knowledge graph entities and their associated document snippets, answer the query:

    • Entity: [Component A] -> Relation: [is part of] -> Entity: [System B]

    • Context Snippet (from Doc X): 'Component A was upgraded in 2025 to support...'

    • Context Snippet (from Doc Y): 'System B requires a 400W power supply...'"
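
A small helper that performs this verbalization might look like the sketch below (plain Python; the triples and snippets mirror the example above).

def verbalize_subgraph(triples, snippets):
    """Turn graph triples and attached document snippets into a prompt block."""
    lines = [
        "Based on the following knowledge graph entities and their "
        "associated document snippets, answer the query:"
    ]
    for head, relation, tail in triples:
        lines.append(f"- Entity: [{head}] -> Relation: [{relation}] -> Entity: [{tail}]")
    for source, text in snippets:
        lines.append(f"- Context Snippet (from {source}): '{text}'")
    return "\n".join(lines)

context_block = verbalize_subgraph(
    triples=[("Component A", "is part of", "System B")],
    snippets=[("Doc X", "Component A was upgraded in 2025 to support..."),
              ("Doc Y", "System B requires a 400W power supply...")],
)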

2. Global Community Summarization (Top-Down)

Popularized by Microsoft's GraphRAG, this method handles "broad" questions (e.g., "What are the main risks in this project?").

  • The Logic: During indexing, the graph is clustered into "communities" (using algorithms like Leiden). An LLM pre-summarizes each community.

  • The Prompting Strategy: You query the community summaries instead of individual chunks.

  • Prompt Style: "You are a researcher. Summarize the findings from the following Community Reports [Report A, Report B, Report C] regarding [User Query]. Cite the specific community ID for each claim."

3. "Parent-Child" Retrieval with Graph Context

This solves the problem of "fragmented" context where a vector search finds a chunk, but the chunk is useless without knowing which document section it came from.

  • The Logic: In the KG, document chunks are "Child" nodes linked to "Parent" section nodes.

  • The Prompting Strategy: When a child chunk is retrieved via vector similarity, the prompt is augmented with the Parent's metadata or even the Parent's summary to provide grounding.

  • Prompt Style: "The following chunk was found in the 'Safety Protocols' section of the 'NASA Gateway Handbook'. Use this hierarchy to interpret the data: [Chunk Text]."

4. Multi-Hop "Chain of Graph" Prompting

This is used for reasoning tasks (e.g., "How does a delay in Part A affect System C?").

  • The Logic: The system uses an agent to navigate the graph step-by-step (see the traversal sketch after this list).

  • The Prompting Strategy:

    1. Step 1: Prompt the LLM to identify the starting entity.

    2. Step 2: Prompt the LLM to choose which relationship (edge) to follow.

    3. Step 3: Final prompt synthesizes the path: "We started at A, followed the 'depends_on' relation to B, and found the 'warning' snippet in Document Z."
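
Below is a minimal sketch of the traversal step using the official neo4j Python driver. The connection details, the label-free MATCH pattern, and the depends_on relationship name are placeholders, and in a full agent the entity and edge choices would come from the LLM prompts described above.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def follow_edge(entity: str, relation: str) -> list:
    # One hop: from the named entity, follow the chosen relationship type.
    query = (
        "MATCH (a {name: $name})-[r]->(b) "
        "WHERE type(r) = $rel "
        "RETURN b.name AS target, coalesce(b.summary, '') AS summary"
    )
    with driver.session() as session:
        return [record.data() for record in session.run(query, name=entity, rel=relation)]

# Suppose the LLM chose the starting entity 'Part A' and the 'depends_on' edge.
hop = follow_edge("Part A", "depends_on")
if hop:
    path_description = (f"We started at Part A, followed the 'depends_on' relation to "
                        f"{hop[0]['target']}, and found: {hop[0]['summary']}")
else:
    path_description = "No path found from Part A via 'depends_on'."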


Technical Summary: Hybrid Prompting Architectures

| Method | Best For | Retrieval Mechanism |
| --- | --- | --- |
| Linearized Triples | Fact-checking | Cypher/SPARQL Query -> Textual Triples |
| Contextual Ranking | Precision | Vector Search + Reranking based on Node Degree |
| Community Search | Synthesis | Global Indexing -> Summary Retrieval |
| Graph-as-Context | Reasoning | Subgraph Extraction -> Markdown Representation |

Implementation Note for Python Developers

If you are using LlamaIndex or LangChain, look into the PropertyGraphIndex or GraphRAG modules. They automate the "verbalization" step—the process of turning those graph nodes and edges into a string that a standard LLM can actually understand.

Would you like me to generate a Python example of how to "verbalize" a Neo4j subgraph into a prompt-friendly string?


What about user prompt -> semantic search across vector documents -> LLM interpretation?

This workflow—moving from a raw user prompt to a semantic search and then to an LLM for final interpretation—is the core of most modern RAG systems. In Python, this is typically handled by orchestrators like LangChain, LlamaIndex, or Haystack.

Here is how each stage of that pipeline works and the specific strategies used to make them "smart."


1. The User Prompt: "The Intent Driver"

A raw user prompt is often messy, ambiguous, or underspecified. Before the search even begins, "Prompt-to-Search" strategies are used to clean it up.

  • Query Expansion: If a user asks "How do I fix the error?", the system uses an LLM to generate synonyms or related terms (e.g., "troubleshooting," "debugging," "known issues") to improve search recall.

  • Query Transformation (HyDE): As mentioned earlier, Hypothetical Document Embeddings turn the prompt into a "fake" answer first. This is effective because the vector of a question often looks very different from the vector of an answer in the database (a sketch follows this list).

  • Prompt Filtering: Identifying specific entities in the prompt to apply metadata filters (e.g., if the prompt mentions "2024," the search is restricted to documents with a year: 2024 tag).
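
A rough HyDE sketch, assuming langchain-openai and an OpenAI key; the model name, the question, and the commented-out vector-store call are placeholders.

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Step 1: generate a plausible-looking "fake" answer to the raw question.
hypothetical = llm.invoke(
    "Write a short, plausible documentation-style answer to: 'How do I fix the error?' "
    "It does not need to be correct; it only needs to resemble a real answer."
).content

# Step 2: embed the fake answer and search by vector instead of by question.
query_vector = OpenAIEmbeddings().embed_query(hypothetical)
# docs = your_vector_store.similarity_search_by_vector(query_vector, k=4)  # hypothetical store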

2. Semantic Search: "The Knowledge Retrieval"

Once the query is optimized, it is converted into a vector (embedding) and compared against your document store.

  • Similarity Metrics: Python libraries like FAISS or ChromaDB typically use Cosine Similarity or Dot Product to find the "Top-K" most relevant chunks.

  • Hybrid Search: A critical strategy where you combine Vector Search (meaning-based) with Keyword Search (exact-term matching). This ensures that if a user searches for a specific serial number or SKU, the system doesn't accidentally return a "semantically similar" but incorrect product.

  • Maximal Marginal Relevance (MMR): This strategy prevents the search from returning 10 chunks that all say the exact same thing. It forces diversity in the retrieved results so the LLM gets a broader view of the topic.
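
A minimal MMR sketch using LangChain's FAISS wrapper; the sample documents and the OpenAI embedding model are assumptions.

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

docs = [
    "Solar array output degraded 3% after the 2023 anomaly.",
    "The 2023 anomaly reduced solar output by roughly three percent.",
    "Battery heaters draw additional power during eclipse.",
]
store = FAISS.from_texts(docs, OpenAIEmbeddings())

# fetch_k candidates are pulled by similarity, then k of them are selected to
# balance relevance against diversity, filtering out near-duplicate chunks.
results = store.max_marginal_relevance_search(
    "How did the anomaly affect power generation?", k=2, fetch_k=3
)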

3. LLM Interpretation: "The Synthesis"

Now the LLM receives the original prompt plus the "bag of chunks" from the search. This is where the actual "interpretation" happens.

  • Context Stuffing & Prioritization: The LLM doesn't just read the chunks; the prompt is structured to tell the LLM: "Use ONLY the provided context. If the answer isn't there, say you don't know." (A prompt sketch follows this list.)

  • Chain of Thought (CoT): You can prompt the LLM to first summarize the retrieved chunks and then answer the user. This "thinking out loud" step significantly reduces hallucinations.

  • Citation & Attribution: One of the most important interpretation strategies is asking the LLM to cite its sources (e.g., "[Source 1]"). This allows the user to verify the "semantic interpretation" against the original document.

  • Reranking: Before the final interpretation, a "Reranker" model (like a Cross-Encoder) takes the Top-K results and scores them again specifically against the user's intent to ensure the absolute best information is at the top of the prompt.
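
A grounding prompt that combines these interpretation rules might look like the following sketch (plain Python; the chunks and question are placeholders).

chunks = [
    "Remote work is permitted up to three days per week.",
    "Badge access hours are 06:00 to 20:00.",
]
user_question = "What is our policy on remote work?"

GROUNDED_ANSWER_PROMPT = """You are answering strictly from the provided context.

Context:
{numbered_chunks}

Question: {question}

Rules:
1. Use ONLY the context above. If the answer is not there, say you don't know.
2. Briefly summarize what the context says before answering (chain of thought).
3. Cite the source of every claim, e.g. [Source 2].
"""

prompt = GROUNDED_ANSWER_PROMPT.format(
    numbered_chunks="\n".join(f"[Source {i + 1}] {c}" for i, c in enumerate(chunks)),
    question=user_question,
)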


Comparison: Raw Search vs. LLM Interpretation

| Feature | Semantic Search Only | Semantic Search + LLM Interpretation |
| --- | --- | --- |
| Output | A list of document snippets. | A natural language answer. |
| Synthesis | User must read and combine info. | LLM combines info from multiple docs. |
| Reasoning | None. | Can explain why something is the case. |
| Hallucination | Impossible (it just shows files). | Possible (mitigated by grounding prompts). |

Practical Tip for your Projects

If you are building this in Python, the "Lost in the Middle" phenomenon is a real risk. LLMs often ignore information buried in the middle of a long list of retrieved chunks.

Pro-tip: When passing search results to your LLM, place the most relevant chunks at the very beginning and very end of the context window for better interpretation.
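
A hand-rolled reordering sketch is below; LangChain also ships a LongContextReorder document transformer that does something similar if you prefer not to roll your own.

def reorder_for_llm(ranked_chunks):
    """Place the strongest chunks at the start and end of the context window.

    ranked_chunks is assumed to be sorted best-first (e.g. reranker output).
    Even-indexed chunks go to the front, odd-indexed to the back, so the
    weakest material ends up buried in the middle where it matters least.
    """
    front, back = [], []
    for i, chunk in enumerate(ranked_chunks):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

top_chunks = [f"chunk ranked #{n}" for n in range(1, 6)]
context = "\n\n".join(reorder_for_llm(top_chunks))
# Resulting order: #1, #3, #5, #4, #2 (best chunks sit at both edges).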

Would you like a Python code example using LangChain that demonstrates a basic "Query Transformation -> Search -> Interpret" loop?


What about user prompt -> LLM generation of Cypher statement based on presented schema -> execute Cypher query -> LLM interpretation?

This workflow—often called Text-to-Cypher or Natural Language to Graph (NL2G)—is the precision-oriented sibling of standard RAG. While vector-based RAG is good at "vibes" (semantic similarity), Text-to-Cypher is built for "facts" (relational logic, counts, and multi-hop traversal).

By 2026, the strategy has moved beyond simple one-shot generation to a multi-stage "Safe-Execution" pipeline.


1. Schema Injection & Linking

The LLM cannot guess your graph structure. You must "prime" it with the schema, but as of 2026, Schema Pruning is the preferred strategy to avoid context window bloat.

  • Dynamic Schema Retrieval: Instead of sending the whole schema (which can be massive), the system performs a quick keyword or vector search on the schema labels themselves. If the user asks about "employees," the system only injects the Person, Company, and WORKS_AT definitions (see the pruning sketch after this list).

  • Property Sampling: Along with the schema, include 2–3 sample values for each property (e.g., EmployeeID: "EMP-1234"). This prevents the LLM from hallucinating formats (like using names instead of IDs).
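
A keyword-based pruning sketch follows; a production system would typically run an embedding search over the schema labels instead, and the schema entries here are invented for illustration.

def prune_schema(question, schema_elements):
    """Keep only schema entries that look related to the question.

    schema_elements maps a label or relationship name to its definition text.
    """
    words = {w.strip("?,.'\"").lower() for w in question.split()}
    kept = {
        name: definition
        for name, definition in schema_elements.items()
        if any(w and (w in name.lower() or w in definition.lower()) for w in words)
    }
    return "\n".join(f"{name}: {definition}" for name, definition in kept.items())

schema = {
    "Person": "(:Person {name, employee_id})  sample employee_id: 'EMP-1234'",
    "Company": "(:Company {name, ticker})",
    "WORKS_AT": "(:Person)-[:WORKS_AT {since}]->(:Company)",
    "Invoice": "(:Invoice {number, amount})",
}
pruned = prune_schema("Which company does employee EMP-1234 work at?", schema)
# Person, Company, and WORKS_AT survive; unrelated labels like Invoice are dropped.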

2. Cypher Generation Strategies

Generating a valid query is difficult. Modern Python pipelines use these refinement methods:

  • Few-Shot In-Context Learning: Providing 3–5 pairs of "Natural Language Question -> Correct Cypher Query" in the prompt. This teaches the LLM the "dialect" of your graph (a prompt sketch follows this list).

  • Chain-of-Thought (CoT): Asking the LLM to first write down the entities and relationships it needs before writing the Cypher.

    LLM Thought: "I need to find a 'Person' node with name 'Alice', follow the 'MANAGES' edge to other 'Person' nodes, and count them."

  • Query Rewrite: The LLM rewrites the user query to be more explicit (e.g., "Who does Alice manage?" becomes "Find all Person nodes connected to Alice via a MANAGES relationship").
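
A few-shot prompt sketch combining these generation strategies; the schema string, the example question/Cypher pairs, and the relationship names (MANAGES, WORKS_AT) are placeholders for your own graph.

FEW_SHOT_CYPHER_PROMPT = """Task: Write a Cypher query for the question, using only the schema below.

Schema:
{schema}

Examples:
Q: Who does Alice manage?
Cypher: MATCH (:Person {{name: 'Alice'}})-[:MANAGES]->(p:Person) RETURN p.name

Q: How many people work at Acme?
Cypher: MATCH (p:Person)-[:WORKS_AT]->(:Company {{name: 'Acme'}}) RETURN count(p)

Q: {question}
First list the entities and relationships you need (chain of thought), then write the Cypher.
"""

schema_snippet = "(:Person {name})-[:MANAGES]->(:Person), (:Person)-[:WORKS_AT]->(:Company {name})"
prompt = FEW_SHOT_CYPHER_PROMPT.format(
    schema=schema_snippet,
    question="Which 5 managers have the most reports?",
)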

3. The Validation Loop (Self-Healing)

You should never execute LLM-generated Cypher directly without a safety check.

  • Syntax Checking: Use Python libraries like CyVer or pypher to ensure the syntax is valid before sending it to Neo4j.

  • The "Error-Retry" Loop: If the database returns a syntax error, the system feeds that error back to the LLM.

    System: "This Cypher failed with: 'Relationship direction is wrong.' Please fix it."

  • Read-Only Enforcement: Using a restricted database driver or prepending the query with a check to ensure no SET, DELETE, or MERGE commands are present.

4. Result Interpretation & Grounding

Once the Cypher executes, the LLM receives the raw JSON/CSV data. Its job is to turn that "computer-speak" into a human answer.

  • Verbalization: The LLM translates: {"name": "Alice", "report_count": 5} → "Alice currently manages 5 people."

  • Missing Data Handling: If the graph query returns empty, the LLM is instructed to say "I found no records for that" rather than hallucinating an answer from its training data.


Python Implementation Stack (2026)

| Tool | Role |
| --- | --- |
| Neo4j GraphRAG (Python) | High-level SDK for Text2Cypher and schema management. |
| LangChain GraphCypherQAChain | Standard chain for user -> query -> answer. |
| CyVer | Open-source library for verifying Cypher against a schema. |
| Instructor / Pydantic-AI | For forcing the LLM to output structured JSON that includes the Cypher and the reasoning. |

Comparison: Vector RAG vs. Text-to-Cypher

| Task | Vector RAG | Text-to-Cypher |
| --- | --- | --- |
| Question | "What is our policy on remote work?" | "Which 5 managers have the most reports?" |
| Retrieval | Similarity (Top-K chunks). | Exact (Database Query). |
| Complexity | Simple, fast. | High (requires perfect schema/syntax). |
| Accuracy | Approximate. | Deterministic. |

Would you like a Python code snippet demonstrating a "Self-Healing" Cypher loop where the LLM fixes its own syntax errors?


Provide a Python application to perform 1) Schema injections & linking, 2) cypher generation strategies, 3) validation loop (self-healing), and 4) result interpretation & grounding

Building a production-grade Text-to-Cypher pipeline requires moving beyond a simple "prompt and run" approach. Below is a Python application using LangChain, Neo4j, and Pydantic to implement a robust Self-Healing Cypher Pipeline.

Prerequisites

pip install langchain langchain-neo4j langchain-openai neo4j pydantic

The Self-Healing Cypher Application

import os
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain_neo4j import Neo4jGraph
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from neo4j.exceptions import CypherSyntaxError

# 1. SCHEMA INJECTION & LINKING
# We use Neo4jGraph to automatically fetch the schema from your DB
os.environ["OPENAI_API_KEY"] = "your-key"
graph = Neo4jGraph(
    url="bolt://localhost:7687", 
    username="neo4j", 
    password="password"
)

# Define a structured output for the Cypher generator
class CypherResponse(BaseModel):
    chain_of_thought: str = Field(description="Reasoning for the query structure")
    cypher: str = Field(description="The generated Cypher statement")

class InterpretationResponse(BaseModel):
    answer: str = Field(description="Human-readable answer based on data")

# 2. CYPHER GENERATION STRATEGY
def generate_cypher(question: str, schema: str, error: Optional[str] = None) -> CypherResponse:
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    parser = PydanticOutputParser(pydantic_object=CypherResponse)
    
    error_context = f"\nPREVIOUS ERROR: {error}\nFix the query based on this error." if error else ""
    
    template = """
    Task: Generate a Neo4j Cypher statement to answer the user's question.
    Schema: {schema}
    Question: {question}
    {error_context}
    
    Instructions:
    - Use only the labels and relationships provided in the schema.
    - Do not use any introductory text, only the JSON format.
    {format_instructions}
    """
    
    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | llm | parser
    
    return chain.invoke({
        "question": question,
        "schema": schema,
        "error_context": error_context,
        "format_instructions": parser.get_format_instructions()
    })

# 3. VALIDATION LOOP (SELF-HEALING)
def execute_with_retry(question: str, max_retries: int = 3):
    schema = graph.get_schema
    current_error = None
    
    for attempt in range(max_retries):
        print(f"--- Attempt {attempt + 1} ---")
        try:
            # Step 2: Generate
            gen_result = generate_cypher(question, schema, error=current_error)
            print(f"Generated Cypher: {gen_result.cypher}")
            
            # Step 3: Validate & Execute
            # graph.query() will raise an exception if the Cypher is invalid
            result_data = graph.query(gen_result.cypher)
            return result_data, gen_result.cypher
            
        except Exception as e:
            print(f"Validation Failed: {str(e)}")
            current_error = str(e)
            if attempt == max_retries - 1:
                raise Exception("Max retries reached. Could not generate valid Cypher.")

# 4. RESULT INTERPRETATION & GROUNDING
def interpret_results(question: str, data: List, query: str):
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    
    template = """
    User Question: {question}
    Cypher Query Used: {query}
    Database Result: {data}
    
    Task: Interpret the database results into a natural language response.
    Grounding Rules:
    - If the result is empty, say you couldn't find that information.
    - Be precise and stick only to the data provided.
    """
    
    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | llm
    
    response = chain.invoke({"question": question, "data": data, "query": query})
    return response.content

# --- MAIN EXECUTION ---
if __name__ == "__main__":
    user_query = "Who are the top 3 managers by number of direct reports?"
    
    try:
        # Steps 1-3
        raw_data, final_query = execute_with_retry(user_query)
        
        # Step 4
        final_answer = interpret_results(user_query, raw_data, final_query)
        
        print("\n=== FINAL OUTPUT ===")
        print(final_answer)
        
    except Exception as final_err:
        print(f"System Failure: {final_err}")

Key Features of this Implementation:

  1. Schema Linking: graph.get_schema pulls the live state of your database (Labels, Relationships, Properties) and injects it into the prompt.

  2. Pydantic Enforcement: We force the LLM to provide a chain_of_thought before the Cypher. This improves the logic of the generated query.

  3. The "Try-Repair" Loop: If graph.query() fails due to syntax errors or schema violations, the Exception is caught. The error message is then fed back into the next prompt, allowing the LLM to "self-heal" by understanding exactly what went wrong.

  4. Strict Grounding: In the interpretation phase, the LLM is given both the raw data and the query used to get it, ensuring it doesn't hallucinate facts outside the retrieved scope.

Would you like me to add a "Read-Only" guardrail to ensure the LLM cannot generate DELETE or DROP statements?


Yes

Adding a Read-Only Guardrail is a smart move, especially given your focus on secure system architectures and NASA-level safety standards. In a Text-to-Cypher pipeline, this acts as a "software diode," ensuring that natural language queries can't be weaponized to DELETE or DETACH your data.

Here is the updated implementation featuring a Regex-based Validator and a System-level restriction.

Updated Python Implementation with Security Guardrails

import re
from typing import Tuple

# --- NEW: SECURITY GUARDRAIL ---
def is_read_only(cypher: str) -> Tuple[bool, str]:
    """
    Validates that the Cypher query does not contain mutation keywords.
    """
    # Prohibited keywords that modify the graph
    prohibited = [
        r"\bCREATE\b", r"\bMERGE\b", r"\bSET\b", r"\bDELETE\b", 
        r"\bDETACH\b", r"\bREMOVE\b", r"\bDROP\b", r"\bCALL\b"
    ]
    
    # Check for keywords (case-insensitive)
    for pattern in prohibited:
        if re.search(pattern, cypher, re.IGNORECASE):
            # Note: 'CALL' could be allow-listed for built-in read-only
            # procedures if needed; here it is blocked entirely for safety.
            keyword = pattern.strip("\\b")  # computed outside the f-string (backslashes break pre-3.12 f-strings)
            return False, f"Prohibited keyword detected: {keyword}"
            
    return True, "Success"

# --- MODIFIED VALIDATION LOOP ---
def execute_with_retry(question: str, max_retries: int = 3):
    schema = graph.get_schema
    current_error = None
    
    for attempt in range(max_retries):
        try:
            # 1. Generate
            gen_result = generate_cypher(question, schema, error=current_error)
            cypher = gen_result.cypher
            
            # 2. SECURITY CHECK (Before execution)
            is_safe, message = is_read_only(cypher)
            if not is_safe:
                print(f"Guardrail Blocked: {message}")
                current_error = f"Security Violation: Your query attempted a write operation. Only READ operations (MATCH/RETURN) are allowed."
                continue # Retry and tell the LLM it violated security
            
            # 3. Execution (Using a Read-Only transaction if supported by driver)
            result_data = graph.query(cypher)
            return result_data, cypher
            
        except Exception as e:
            current_error = str(e)
            if attempt == max_retries - 1:
                raise Exception("Failed to produce a safe/valid query.")

# --- USAGE ---
# If a user tries: "Delete all nodes related to Project X"
# The LLM will generate a DELETE statement, the guardrail will catch it,
# and the retry loop will force the LLM to write a safe alternative or fail.

Why this approach works for your "Maker" mindset:

  • The Regex Diode: By using a pre-execution check, you create an immutable boundary that the LLM's "hallucination" cannot cross.

  • The Feedback Loop: Instead of just crashing, the system tells the LLM: "You violated security." This forces the model to refine its logic to be compliant with the read-only constraint.

  • Infrastructure Defense: In a production NASA-subcontracting environment, you would combine this Python check with Neo4j Role-Based Access Control (RBAC), ensuring the database user itself only has PUBLIC read permissions.


A High-Value Next Step

Since you are interested in Knowledge Graphs and NASA S&MA, would you like me to show you how to extend this to a "Multi-Agent" RAG?

In that scenario, one agent handles the Cypher generation, while a second "Security Agent" reviews the code before it hits the database—simulating a human-in-the-loop verification process. Would you like to explore that architecture?


From <https://gemini.google.com/app/>  Google Gemini 3.0 Pro (Fast)
