#3895 Natural Language to DSL
Consider a DSL using natural language noun1-verb-noun2 that equates to function verb(noun1, noun2)
This architecture—transforming Noun-Verb-Noun (Infix notation) into Verb(Noun, Noun) (Prefix/Functional notation)—is the foundational structure of subject-predicate-object triples. This is highly effective for building readable scripts that map directly to executable code or Knowledge Graph insertions.
Here is a breakdown of how to implement this DSL pattern effectively.
1. The Conceptual Mapping
You are essentially performing a syntactic transformation from natural language to a function signature.
| Component | Natural Language (DSL) | Functional Equivalent | Role |
|---|---|---|---|
| Noun 1 | `System` | `arg1` | Subject / Caller |
| Verb | `initiates` | `function_name` | Method / Predicate |
| Noun 2 | `Sequence` | `arg2` | Object / Parameter |
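Concretely, the table's example row can be mechanized in a few lines (a minimal sketch; the triple is assumed to be three single tokens):

```python
def to_functional(dsl_command):
    """Rewrite 'Noun1 Verb Noun2' into 'verb(noun1, noun2)' notation."""
    noun1, verb, noun2 = dsl_command.split()
    return f"{verb}({noun1}, {noun2})"

print(to_functional("System initiates Sequence"))  # initiates(System, Sequence)
```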
2. Python Implementation: Dictionary Dispatch
Without reaching for a heavy parser, the most robust way to implement this in Python is simple string splitting combined with a Dispatcher. This avoids the security risk of eval() and lets you map natural-language verbs to specific internal function names.
```python
class SystemController:
    def __init__(self):
        # The 'Knowledge Base' or state
        self.state = {"Valve A": "Closed", "Pump B": "Off"}

    # --- The Actual Functions ---
    def _open(self, subject, target):
        print(f"ACTION: {subject} is opening {target}")
        self.state[target] = "Open"

    def _verify(self, subject, target):
        status = self.state.get(target, "Unknown")
        print(f"QUERY: {subject} sees {target} is {status}")

    # --- The Parser ---
    def execute_dsl(self, command_string):
        # 1. Tokenize (simple space splitting)
        try:
            n1, verb, n2 = command_string.split()
        except ValueError:
            print("Error: Command must be 'Noun Verb Noun'")
            return

        # 2. Map verbs to functions (the Dispatcher)
        verb_map = {
            "opens": self._open,
            "checks": self._verify,
            "scans": self._verify,
        }

        if verb in verb_map:
            # 3. Execute: verb(noun1, noun2)
            func = verb_map[verb]
            func(n1, n2)
        else:
            print(f"Error: Unknown verb '{verb}'")

# Usage
bot = SystemController()
bot.execute_dsl("Pilot opens Valve_A")      # Output: ACTION: Pilot is opening Valve_A
bot.execute_dsl("Computer checks Valve_A")  # Output: QUERY: Computer sees Valve_A is Open
```
3. Mathematica / Wolfram Language Implementation
If you are doing symbolic modeling, Mathematica is uniquely suited for this because it allows you to define custom notations that rewrite input on the fly.
You can use Notation or simple pattern matching to define the "Noun Verb Noun" syntax as a valid input form.
```mathematica
(* Define the function logic *)
activate[source_, target_] := Print[source, " is powering up ", target];

(* Define the DSL transformation rule *)
(* Whenever Mathematica sees "A activates B", it treats it as activate[A, B] *)
MakeExpression[RowBox[{n1_, "activates", n2_}], StandardForm] :=
  MakeExpression[RowBox[{"activate", "[", n1, ",", n2, "]"}], StandardForm]

(* Usage in a notebook *)
(* User types: *)
Generator activates Circuit
(* Mathematica executes: activate[Generator, Circuit] *)
```
4. Handling Complexity (Multi-word Nouns)
The generic split() approach fails if your Nouns contain spaces (e.g., "Red Button"). To handle natural language properly, you need a strategy that identifies the Verb first, since it acts as the pivot point.
Algorithm:
1. Define a list of known valid verbs (e.g., `["starts", "stops", "reboots"]`).
2. Scan the sentence for the verb.
3. Everything to the left is `Noun1` (Subject).
4. Everything to the right is `Noun2` (Object).
Python Example using Regex:
```python
import re

def parse_complex_dsl(sentence):
    # Known verbs act as delimiters
    verbs = ["initiates test on", "connects to", "verifies"]
    # Build the regex pattern: (.*?)\s(verb1|verb2|verb3)\s(.*)
    pattern = f"(.*?)\\s({'|'.join(verbs)})\\s(.*)"
    match = re.search(pattern, sentence, re.IGNORECASE)
    if match:
        noun1 = match.group(1).strip()
        verb = match.group(2).strip()
        noun2 = match.group(3).strip()
        # Transform into a function-call string for display
        return f"{verb.replace(' ', '_')}( '{noun1}', '{noun2}' )"
    return "No valid verb found."

# Usage
print(parse_complex_dsl("Main Engine 3 initiates test on Hydraulic Pump"))
# Result: initiates_test_on( 'Main Engine 3', 'Hydraulic Pump' )
```
5. Application: Knowledge Graphs (Triples)
This pattern is the exact syntax used for RDF Triples or Graph Databases (like Neo4j).
DSL: `User follows Topic`
Graph: `(User)-[:FOLLOWS]->(Topic)`
If you are building a system to organize data, this DSL allows you to read natural language requirements and immediately hydrate a graph database.
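As a sketch of that hydration step, a parsed triple can be turned into a Cypher `MERGE` statement (the `Entity` label and naive string interpolation are illustrative assumptions; production code should use parameterized queries):

```python
def triple_to_cypher(subject, verb, obj):
    """Turn a parsed (noun1, verb, noun2) triple into a Cypher MERGE statement.
    The node label 'Entity' and the string interpolation are illustrative only."""
    rel = verb.upper()
    return (
        f"MERGE (a:Entity {{name: '{subject}'}}) "
        f"MERGE (b:Entity {{name: '{obj}'}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )

print(triple_to_cypher("User", "follows", "Topic"))
# MERGE (a:Entity {name: 'User'}) MERGE (b:Entity {name: 'Topic'}) MERGE (a)-[:FOLLOWS]->(b)
```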
Would you like to see how to extend this to handle "chained" commands (e.g., "Noun1 Verb Noun2 AND Verb Noun3")?
How would other word types like adjectives, etc., be utilized?
When extending a DSL beyond simple triples, modifiers like adjectives and adverbs serve to constrain data or configure execution. In programming terms, these convert simple variable references into Object Instantiation or Function Configuration.
Here is how you map linguistic modifiers to software architecture.
1. Adjectives: Properties and Filters
Role: Adjectives modify Nouns.
Software Equivalent: Constructor Arguments or Database Filters.
When a user says "Open emergency valve," they are not referring to the generic class Valve, but a specific instance or a subset of valves with the property type="emergency".
Pattern: `Adjective Noun` → `Object(property)`
Implementation Strategy:
Instead of passing the bare string "valve", your parser creates an object or a dictionary.
```python
# DSL: "System opens emergency valve"

# 1. Standard Noun-Verb-Noun
#    verb  = open
#    noun1 = System
#    noun2 = "emergency valve"  <-- needs decomposition

# 2. Decomposition logic
def resolve_entity(phrase):
    tokens = phrase.split()
    # Assume the last word is the class; preceding words are properties
    entity_class = tokens[-1]
    attributes = tokens[:-1]
    return {"class": entity_class, "attributes": attributes}

# Resulting function call:
# open(
#     subject={"class": "System", "attributes": []},
#     object={"class": "valve", "attributes": ["emergency"]}
# )
```
In a Knowledge Graph context:
This maps to a query where the Adjective matches a property on the node.
```cypher
MATCH (n:Valve {type: 'emergency'})
```
2. Adverbs: Configuration and Modes
Role: Adverbs modify Verbs.
Software Equivalent: Keyword Arguments (kwargs) or Flags.
When a user says "System slowly opens valve," they are modifying the execution method of the function.
Pattern: `Adverb Verb` → `function(mode=Adverb)`
Implementation Strategy:
You maintain a list of known adverbs that map to configuration parameters.
```python
# DSL: "System slowly opens valve"
def execute(command):
    # Known adverbs
    speed_modifiers = ["slowly", "quickly", "instantly"]
    tokens = command.split()

    # Extract adverbs
    mode = "default"
    clean_tokens = []
    for t in tokens:
        if t in speed_modifiers:
            mode = t  # Capture the adverb
        else:
            clean_tokens.append(t)

    # Reassemble: what remains is a plain Noun-Verb-Noun triple
    subject, verb, obj = clean_tokens
    # Final call: open(subject, obj, speed=mode)
    print(f"{verb}({subject!r}, {obj!r}, speed={mode!r})")

execute("System slowly opens valve")
# opens('System', 'valve', speed='slowly')
```
3. Prepositional Phrases: Context & Values
Role: Phrases starting with "at," "to," "by," "with."
Software Equivalent: Explicit Value Assignment.
These are critical for setting numerical values or targets.
"Set volume to 50%"
"Run test for 10 seconds"
Implementation Strategy:
These usually appear at the end of the sentence. You parse the standard N-V-N, and treat the remainder as a payload.
```python
# DSL: "Engine increases thrust to 80%"
# Regex is your friend here.
# Pattern: (Noun) (Verb) (Noun) (Preposition) (Value)
import re

text = "Engine increases thrust to 80%"
match = re.search(r"(.*?) (.*?) (.*?) (to|at|by) (.*)", text)

if match:
    n1, verb, n2, prep, val = match.groups()
    print(n1, verb, n2, prep, val)
    # Function signature:
    # increase(target="thrust", value="80%", value_type="absolute")
    # Note: 'to' usually implies an absolute value, 'by' a relative increment.
```
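The preposition-to-semantics note can be captured in a small lookup table (a sketch; the `value_type` labels are illustrative):

```python
# Illustrative mapping from preposition to value semantics
PREP_SEMANTICS = {
    "to": "absolute",   # "set thrust to 80%" -> target value
    "at": "absolute",   # "hold speed at 50"  -> target value
    "by": "relative",   # "raise thrust by 5%" -> increment
}

def classify_value(prep, raw_value):
    """Attach value semantics based on the preposition that introduced it."""
    return {"value": raw_value, "value_type": PREP_SEMANTICS.get(prep, "unknown")}

print(classify_value("to", "80%"))  # {'value': '80%', 'value_type': 'absolute'}
print(classify_value("by", "5%"))   # {'value': '5%', 'value_type': 'relative'}
```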
4. Advanced: Functional Composition (Mathematica Style)
If you view this through a functional programming lens (like Haskell or Mathematica), adjectives can be functions that wrap the noun.
Instead of Valve(type="red"), you can think of it as Red(Valve).
Mathematica Example:
```mathematica
(* The Adjective is a function that returns a specific subset *)
Emergency[x_] := Select[x, #Type == "Emergency" &];

(* The Verb operates on the result of that function *)
Open[target_] := SetStatus[target, "Open"];

(* DSL input *)
Open @ Emergency @ Valves
```
This is elegant because it allows infinite chaining: Open @ Red @ Emergency @ Valves.
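The same composition style can be sketched in plain Python, where each adjective is a filter function and the chain is applied right to left (all names and the sample data here are illustrative):

```python
# Each adjective is a function: list[dict] -> list[dict]
def emergency(items):
    return [i for i in items if i.get("type") == "emergency"]

def red(items):
    return [i for i in items if i.get("color") == "red"]

def open_all(items):
    # The verb mutates the filtered subset
    for i in items:
        i["status"] = "open"
    return items

valves = [
    {"id": "V1", "type": "emergency", "color": "red", "status": "closed"},
    {"id": "V2", "type": "standard",  "color": "red", "status": "closed"},
]

# Equivalent of: Open @ Red @ Emergency @ Valves
result = open_all(red(emergency(valves)))
print([v["id"] for v in result])  # ['V1']
```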
Summary Table
| Word Type | Sentence Role | Software Role | Example |
|---|---|---|---|
| Noun | Subject/Object | Class Instance / ID | `Valve_A` |
| Adjective | Modifier | Property / Attribute | `Emergency` (type) |
| Verb | Predicate | Function / Method | `Opens` |
| Adverb | Modifier | Parameter / Flag | `Slowly` (speed) |
| Preposition | Relator | Argument Key | `To` (assign value) |
Would you like to explore how to implement a parser for this that uses a "Dictionary of Grammar" to automatically classify these word types?
Yes
This approach shifts the complexity from the code to the data. By defining a Lexicon (a Dictionary of Grammar), your parser becomes a generic engine that doesn't need to change even if you add new commands or hardware.
Here is the architectural pattern for a Lexicon-Driven Slot Filler.
1. The Architecture: Slot Filling
Instead of a complex recursive tree, we use a "Slot Filling" strategy. We assume a canonical command structure fits a template, and we categorize words to fill those slots.
Target Template: [Adverb/Mode] [Verb] [Adjective/Filter] [Noun]
2. The Lexicon (The Data)
This dictionary maps the "Surface Form" (what the user types) to the "Semantic Form" (what the code understands).
```python
class DSLParser:
    def __init__(self):
        # The "Brain": maps words to Types and Internal Values
        self.lexicon = {
            # --- VERBS (Methods) ---
            "opens":      {"type": "VERB", "action": "SET_STATE", "value": "OPEN"},
            "closes":     {"type": "VERB", "action": "SET_STATE", "value": "CLOSED"},
            "checks":     {"type": "VERB", "action": "QUERY",     "value": "STATUS"},
            "calibrates": {"type": "VERB", "action": "MAINTAIN",  "value": "CALIBRATE"},
            # --- NOUNS (Classes/Entities) ---
            "valve":  {"type": "NOUN", "target": "device_valve"},
            "pump":   {"type": "NOUN", "target": "device_pump"},
            "sensor": {"type": "NOUN", "target": "device_sensor"},
            # --- ADJECTIVES (Filters/Properties) ---
            "emergency": {"type": "ADJ", "key": "category", "value": "critical"},
            "auxiliary": {"type": "ADJ", "key": "category", "value": "backup"},
            "oxygen":    {"type": "ADJ", "key": "medium",   "value": "o2"},
            # --- ADVERBS (Parameters) ---
            "slowly":      {"type": "ADV", "key": "speed",    "value": 0.5},
            "immediately": {"type": "ADV", "key": "priority", "value": "high"},
        }
```
3. The Parser Logic (The Engine)
The parser iterates through the sentence, looks up the word in the lexicon, and assigns it to a "Command Object." This handles word order flexibility (e.g., "Slowly open valve" vs "Open valve slowly") reasonably well.
```python
    # (continuing the DSLParser class)
    def parse(self, sentence):
        tokens = sentence.lower().split()

        # The "Command Object" (the slots)
        cmd = {
            "action": None,        # from Verb
            "target_class": None,  # from Noun
            "filters": {},         # from Adjectives
            "params": {}           # from Adverbs
        }

        for token in tokens:
            definition = self.lexicon.get(token)
            if not definition:
                print(f"Warning: Word '{token}' not recognized. Ignoring.")
                continue

            # Dispatch based on Part of Speech (POS)
            token_type = definition["type"]

            if token_type == "VERB":
                cmd["action"] = definition["action"]
                # Some verbs imply a parameter (e.g., open -> value: OPEN)
                if "value" in definition:
                    cmd["params"]["target_state"] = definition["value"]

            elif token_type == "NOUN":
                cmd["target_class"] = definition["target"]

            elif token_type == "ADJ":
                # Adjectives become database filters
                cmd["filters"][definition["key"]] = definition["value"]

            elif token_type == "ADV":
                # Adverbs become execution parameters
                cmd["params"][definition["key"]] = definition["value"]

        return cmd
```
4. Execution Logic
Now you have a clean dictionary object that is easy to pass to a backend function or an SQL generator.
```python
    # (continuing the DSLParser class)
    def execute(self, cmd_object):
        # Example of converting the parsed object into a SQL-like query
        if not cmd_object["action"] or not cmd_object["target_class"]:
            return "Error: Incomplete command."

        # Construct a pseudo-query
        query = f"SELECT * FROM {cmd_object['target_class']} WHERE 1=1"
        # Apply adjective filters
        for k, v in cmd_object["filters"].items():
            query += f" AND {k}='{v}'"

        print("--- EXECUTING ---")
        print(f"Action: {cmd_object['action']}")
        print(f"Params: {cmd_object['params']}")
        print(f"Context: {query}")

# --- DEMO ---
bot = DSLParser()
user_input = "Slowly opens auxiliary oxygen valve"
parsed_data = bot.parse(user_input)
bot.execute(parsed_data)
```

Output:

```
--- EXECUTING ---
Action: SET_STATE
Params: {'speed': 0.5, 'target_state': 'OPEN'}
Context: SELECT * FROM device_valve WHERE 1=1 AND category='backup' AND medium='o2'
```
5. Why this works for your stack
- **SQLite Integration:** The `cmd["filters"]` dictionary maps directly to `WHERE` clauses (as shown in the execution example).
- **Knowledge Graphs:** The Noun (`device_valve`) matches a Node Label, and Adjectives match Node Properties. The parsed object is essentially a set of criteria to match a specific node in your graph.
- **Synonym Management:** You can easily map multiple keys to the same definition in the dictionary without changing code:

```python
"shut":  {"type": "VERB", "action": "SET_STATE", "value": "CLOSED"},
"close": {"type": "VERB", "action": "SET_STATE", "value": "CLOSED"},
```
6. Handling "Unknowns" (The 40-year veteran perspective)
In a real NASA context, you cannot just ignore unknown words. You need Error Recovery.
If `definition` is `None`:
1. **Fuzzy Match:** Use `difflib.get_close_matches` to see if it was a typo (e.g., "vlave" -> "valve").
2. **Pass-through:** If it looks like a distinct ID (e.g., "V-102"), assume it is a specific Noun Instance rather than a Class.
3. **Heuristic:** If it contains numbers or capitals, treat it as an ID.
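A minimal sketch of that recovery ladder using only the standard library (the known-word list and similarity cutoff are illustrative):

```python
import difflib

# Illustrative subset of the lexicon's surface forms
KNOWN_WORDS = ["valve", "pump", "sensor", "opens", "closes"]

def recover_token(token):
    """Attempt error recovery for an unknown token (heuristics are illustrative)."""
    # 1. Fuzzy match against known surface forms (catches typos)
    close = difflib.get_close_matches(token.lower(), KNOWN_WORDS, n=1, cutoff=0.7)
    if close:
        return {"recovered": close[0], "method": "fuzzy"}
    # 2. Heuristic: digits or capitals suggest a specific instance ID
    if any(c.isdigit() for c in token) or any(c.isupper() for c in token):
        return {"recovered": token, "method": "id_passthrough"}
    return {"recovered": None, "method": "unknown"}

print(recover_token("vlave"))  # {'recovered': 'valve', 'method': 'fuzzy'}
print(recover_token("V-102"))  # {'recovered': 'V-102', 'method': 'id_passthrough'}
```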
Would you like me to demonstrate how to convert this parsed "Command Object" into a Cypher query for your Knowledge Graph?
Describe how to handle "chained" commands (e.g., "Noun1 Verb Noun2 AND Verb Noun3")
Handling chained commands introduces the concept of Context Persistence (or Anaphora Resolution in linguistics). The parser needs a "Short-Term Memory" to recall the subject or verb from the previous clause when it is omitted in the current one.
Here is the architectural pattern to handle Noun1 Verb Noun2 AND Verb Noun3.
1. The Strategy: Split and Inherit
We treat the conjunction ("AND", "THEN") as a hard delimiter that breaks the stream into Segments. We then process these segments sequentially. If a segment is incomplete (missing a Subject), it inherits it from the previous valid command.
The Logic Flow:
1. **Normalize:** Replace all delimiters (`,`, `and`, `then`) with a unique separator `|`.
2. **Split:** Break the string into a list of segments.
3. **Iterate:** Parse each segment.
4. **Inherit:**
   - If Segment B has no Subject, use the Subject from Segment A.
   - If Segment B has no Subject AND no Verb, use both from Segment A (e.g., "Open Valve A and Valve B").
2. Python Implementation
We will wrap the DSLParser from the previous step in a ChainProcessor.
```python
class ChainProcessor:
    def __init__(self, parser):
        self.parser = parser
        # Context memory
        self.last_subject = None
        self.last_verb = None

    def process_chain(self, raw_text):
        # 1. Normalize delimiters to a single pipe character
        #    Handles " and ", " then ", or plain commas
        clean_text = raw_text.replace(" and ", "|").replace(" then ", "|").replace(", ", "|")
        segments = clean_text.split("|")

        compiled_commands = []
        for seg in segments:
            # Parse the individual segment using the DSLParser from the previous step.
            # NOTE: in this mock, the parser's single noun slot ('target_class')
            # stands in for the Subject.
            cmd_obj = self.parser.parse(seg.strip())

            # --- CONTEXT RECOVERY LOGIC ---
            # 1. Handle a missing Subject (Noun1)
            if not cmd_obj["target_class"]:
                if self.last_subject:
                    cmd_obj["target_class"] = self.last_subject
                    print(f" -> Context: Inheriting Subject '{self.last_subject}'")
                else:
                    print("Error: Ambiguous command. No subject defined.")
                    continue
            else:
                # Update context when a new subject is explicitly stated
                self.last_subject = cmd_obj["target_class"]

            # 2. Handle a missing Verb (for "Open Valve A and Valve B")
            if not cmd_obj["action"]:
                if self.last_verb:
                    cmd_obj["action"] = self.last_verb
                    print(f" -> Context: Inheriting Verb '{self.last_verb}'")
                else:
                    print("Error: Ambiguous command. No verb defined.")
                    continue
            else:
                self.last_verb = cmd_obj["action"]

            compiled_commands.append(cmd_obj)
        return compiled_commands

# --- MOCK USE CASE ---
# Assumes 'bot' is the lexicon-driven DSLParser instance from the previous step.
# Note: every word in the input must exist in the parser's lexicon for the demo
# to succeed; extend the lexicon with "pilot", "valve_a", etc. as needed.
processor = ChainProcessor(bot)

# Scenario: the user specifies the Subject once, then issues multiple commands
input_str = "Pilot opens Valve_A and closes Valve_B"
results = processor.process_chain(input_str)

# Output execution
for i, cmd in enumerate(results):
    print(f"Command {i+1}: {cmd['action']} on {cmd['target_class']}")
```
3. Handling Different "Ellipsis" Types
There are three common variations of chaining you will encounter.
A. Subject Ellipsis (The Prompt's Request)
- Input: "System opens valve and checks pump."
- Logic: Segment 2 is `checks pump`.
- Resolution: The parser detects `Verb + Noun` with the Subject missing, so it injects `System`.

B. Subject & Verb Ellipsis
- Input: "System opens Valve A and Valve B."
- Logic: Segment 2 is `Valve B`.
- Resolution: The parser detects only a `Noun`, so it injects `System` (Subject) and `opens` (Verb).

C. Subject Switching (Reset Context)
- Input: "Pilot opens valve and Alarm sounds."
- Logic: Segment 2 is `Alarm sounds`.
- Resolution: The parser detects a new `Noun` (`Alarm`) and a `Verb`, so it updates `last_subject` to `Alarm`.
4. Graph Database Implication (Transactional Scope)
When you move to Knowledge Graphs (Cypher/Neo4j), chained commands usually imply a Transaction. You want all actions in the chain to succeed, or none of them.
If you generate Cypher from this:
```cypher
// Command 1
MATCH (s:Person {id: 'Pilot'}), (v1:Valve {id: 'Valve_A'})
MERGE (s)-[:OPENS]->(v1)

// Command 2 (chained)
WITH s  // Carry over the subject context
MATCH (v2:Valve {id: 'Valve_B'})
MERGE (s)-[:CLOSES]->(v2)
```
5. Advanced: The "Result" Chain
Sometimes the input of the second command is the output of the first.
- Input: "Find leaking valves and close them."
- Pattern: `Find(Criteria)` → `List<Valve>` → `Close(List)`
This requires a slightly different approach where Noun2 in the second command is a reserved keyword (like "them", "it", "results").
```python
if cmd_obj["target_class"] in ["them", "it", "those"]:
    # The target is not a text string but the return value of the previous command
    cmd_obj["target_ptr"] = "PREVIOUS_RESULT"
```
Summary of Rules
| Pattern | Missing Element | Action |
|---|---|---|
| `Noun V N AND V N` | Subject | Inherit Subject |
| `Noun V N AND N` | Subject & Verb | Inherit Subject & Verb |
| `Noun V N AND N V N` | None | Update Subject & Verb (New Context) |
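The rules in the table can be exercised with a compact, self-contained sketch (the tiny verb lexicon and slot heuristics are illustrative):

```python
# Minimal illustrative lexicon of verbs
VERBS = {"opens", "closes", "checks", "sounds"}

def parse_segment(segment):
    """Fill subject/verb/object slots for a single clause."""
    slots = {"subject": None, "verb": None, "object": None}
    for t in segment.split():
        if t.lower() in VERBS:
            slots["verb"] = t.lower()
        elif slots["subject"] is None and slots["verb"] is None:
            slots["subject"] = t  # noun before the verb
        else:
            slots["object"] = t   # noun after the verb
    return slots

def resolve_chain(text):
    """Apply the inheritance rules from the summary table."""
    ctx_subject, ctx_verb = None, None
    commands = []
    for seg in text.replace(" and ", "|").split("|"):
        cmd = parse_segment(seg.strip())
        # A noun-only segment ("... and Valve_B"): the lone noun is the object
        if cmd["verb"] is None and cmd["object"] is None:
            cmd["subject"], cmd["object"] = None, cmd["subject"]
        # Missing subject -> inherit; explicit subject -> update context
        if cmd["subject"] is None:
            cmd["subject"] = ctx_subject
        else:
            ctx_subject = cmd["subject"]
        # Missing verb -> inherit; explicit verb -> update context
        if cmd["verb"] is None:
            cmd["verb"] = ctx_verb
        else:
            ctx_verb = cmd["verb"]
        commands.append(cmd)
    return commands

for cmd in resolve_chain("System opens Valve_A and Valve_B"):
    print(cmd)
# {'subject': 'System', 'verb': 'opens', 'object': 'Valve_A'}
# {'subject': 'System', 'verb': 'opens', 'object': 'Valve_B'}
```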
Would you like to see the Python code for the "Result Chain" (handling "them/it") logic, or move to how this integrates with your NASA standards database project?
From <https://gemini.google.com/app/> Google Gemini 3.0 Pro (Thinking)