#4517 KG for Music Theory, Production Music Libraries, MP3 Samples and MIDI Controller

#4517 Combine music theory, production music libraries, mp3 sound and music samples, a knowledge graph relating all, and a midi controller

This is a blueprint for an Intelligent Sonic Workbench. By anchoring four otherwise separate domains (abstract theory, raw audio assets, structural metadata, and tactile hardware) in a centralized Knowledge Graph (KG), you turn a passive sample library into an active, context-aware co-composer.

Here is how these five pillars interconnect to form a unified, graph-driven production ecosystem.

1. The Core Graph Architecture (The Connective Tissue)

Instead of relying on rigid folder structures or basic ID3 tags, a graph database (like Neo4j) treats music theory, audio files, and hardware interactions as interconnected nodes.

[MIDI Controller Key] -- (Triggers) --> [Theory: Chord/Scale]
                                                 |
                                          (Characterizes)
                                                 v
[MP3 Sample/Stem] ---- (Belongs To) --> [Production Track]

The Node Taxonomy

Theory Nodes: Scale, Chord, Interval, Mode, Cadence, Emotion/Mood.
Library Nodes: Track, Album, Composer, Publisher, PRO (ASCAP/BMI), Sync License Tier.
Audio Nodes: MP3 Sample, One-Shot, Stem, Loop, Format (Bitrate/Sample Rate).
Hardware Nodes: MIDI Note, CC_Channel, Pad, Fader, Encoder.

2. Cross-Domain Mapping: Where the Pillars Meet

A. Music Theory <--> MP3 Audio Samples

Instead of searching for "Sad Piano in A minor," the graph maps the underlying relationships of the audio file.

The Bridge: An MP3 sample node contains properties for Fundamental Frequency ($f_0$), BPM, and Key.
Graph Logic: If you select an A minor audio sample, the graph instantly knows its Relative Major (C major), its Parallel Major (A major), and its Subdominant (D minor).
The Result: The system can automatically query and suggest MP3 loops or stems that are harmonic matches, even if they aren't labeled as such in your folder structure.
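The key relationships named above reduce to fixed semitone offsets, so they can be computed rather than stored by hand. The following is a minimal sketch, assuming key labels follow the "Am" / "C" convention used for the sample properties and that sharps are the only accidentals; `related_keys` is a hypothetical helper name, not part of any library.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def related_keys(key: str) -> dict:
    """Return harmonically related keys for a label such as "Am" or "C"."""
    is_minor = key.endswith("m")
    root = NOTE_NAMES.index(key[:-1] if is_minor else key)

    def label(pitch_class: int, minor: bool) -> str:
        return NOTE_NAMES[pitch_class % 12] + ("m" if minor else "")

    return {
        # Relative key shares the same notes: minor -> up 3 semitones, major -> down 3.
        "relative": label(root + 3, False) if is_minor else label(root - 3, True),
        # Parallel key keeps the root and flips major/minor.
        "parallel": label(root, not is_minor),
        # Subdominant and dominant sit a perfect fourth and fifth above the root.
        "subdominant": label(root + 5, is_minor),
        "dominant": label(root + 7, is_minor),
    }

print(related_keys("Am"))
```

For "Am" this yields C as the relative key, A as the parallel key, and Dm as the subdominant, matching the example above. In the graph, each result would become an edge between two key nodes.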

B. Production Music Libraries <--> Theory & Mood

Production music thrives on utility, tagging, and rapid retrieval for sync licensing (e.g., "Tense, cinematic build-up").

The Bridge: Connect structural music theory choices to production library tags.
Graph Logic: Map the Epic/Heroic mood tag to the Dorian mode or a I - bVII - IV chord progression.
The Result: When browsing a production library for a specific brief, the graph flags tracks that utilize the exact modal changes known to evoke that specific cinematic emotion.

C. The MIDI Controller <--> The Entire Network

The MIDI controller ceases to be just a note trigger; it becomes a physical portal to navigate the graph.

Dynamic Remapping: Pressing a pad on your controller sends a MIDI Note/CC. The graph intercepts this and translates it based on the active node context.
Chord Mode Generation: Hit a single key on your controller --> The graph identifies the current track's scale --> It calculates the correct diatonic chord for that key --> It fires the corresponding MIDI cluster to your hardware synth or DAW, while simultaneously highlighting matching audio loops in that key.
Tactile Browsing: Assign an encoder to traverse the graph itself. Turning a knob moves from a C minor node to its G minor (Dominant) node, instantly updating your controller’s LED feedback and queuing up relevant MP3 stems.
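The chord-mode step can be illustrated in isolation from the graph. This is a sketch under stated assumptions: the scale arrives as a tonic pitch class plus a list of semitone offsets, and the chord is a plain triad built by stacking thirds inside that scale. `diatonic_triad` is a hypothetical name.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]

def diatonic_triad(note: int, tonic: int, steps: list) -> list:
    """Build the triad rooted on `note` using only notes of the scale.

    note:  MIDI note number of the pressed key
    tonic: pitch class of the scale tonic (0 = C ... 11 = B)
    steps: semitone offsets of the scale degrees from the tonic
    """
    offset = (note - tonic) % 12
    if offset not in steps:
        raise ValueError("pressed note is not in the scale")
    degree = steps.index(offset)
    base = note - offset  # MIDI number of the tonic at or below the pressed note
    chord = []
    for skip in (0, 2, 4):  # root, third, fifth as scale degrees
        octave, position = divmod(degree + skip, len(steps))
        chord.append(base + steps[position] + 12 * octave)
    return chord

print(diatonic_triad(62, 0, MAJOR_STEPS))  # D in C major
```

Pressing D (MIDI 62) in C major gives D, F, A, which is the ii chord. The resulting note list is what would be sent on as the MIDI cluster.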

3. Practical Use Case: The "Intelligent Sync-Brief" Workflow

Imagine you receive a production music brief: "Need a dark, sci-fi driving electronic track at 120 BPM."

Graph Query: You select the tags "Dark" and "Sci-Fi" via your MIDI controller.
Theory Activation: The graph traverses edges connecting these moods to musical theory, activating the Phrygian Dominant mode and a preference for minor 2nd intervals.
Asset Retrieval: The graph filters your Production Library and MP3 Sample collection, returning only assets that are tagged with 120 BPM (or warpable multiples) and contain Phrygian characteristics.
Hardware Configuration: Your MIDI Controller splits automatically:
- The left 8 pads are mapped to trigger the top 8 matching MP3 drone samples found in the database.
- The right 8 pads are locked into a Phrygian scale quantizer, ensuring any note you play live instantly matches the theory and the samples perfectly.
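A scale quantizer of the kind described for the right-hand pads can be sketched in a few lines. The assumptions here are that out-of-scale notes snap to the nearest scale note and that ties resolve downward; the step list is plain Phrygian, and Phrygian Dominant would raise the third (`[0, 1, 4, 5, 7, 8, 10]`). `quantize_to_scale` is a hypothetical name.

```python
PHRYGIAN_STEPS = [0, 1, 3, 5, 7, 8, 10]

def quantize_to_scale(note: int, tonic: int, steps=PHRYGIAN_STEPS) -> int:
    """Snap a MIDI note to the nearest note of the scale (ties resolve downward)."""
    allowed = {(tonic + s) % 12 for s in steps}
    # No pitch class is more than 6 semitones from any other, so 0..6 is enough.
    for distance in range(7):
        for candidate in (note - distance, note + distance):
            if candidate % 12 in allowed:
                return candidate
    return note  # only reached if `steps` is empty

# E Phrygian (tonic pitch class 4): C#4 (61) is out of scale and snaps to C4 (60).
print(quantize_to_scale(61, 4))
```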

4. Conceptual Cypher Schema (Neo4j)

Here is a glimpse of how this looks under the hood when linking a physical MIDI input to a theory concept, which then points to a usable audio asset:

// Create a Music Theory Node
CREATE (aminor:Scale {name: "A Natural Minor", type: "Aeolian"})

// Create an MP3 Sample Node
CREATE (bassLoop:Sample {
    filename: "synth_bass_120_Am.mp3", 
    bpm: 120, 
    key: "Am", 
    filepath: "/samples/loops/synth_bass_120_Am.mp3"
})

// Create a Hardware Mapping
CREATE (pad1:MidiControl {note_number: 36, control_type: "Pad"})

// Relate them all together
CREATE (bassLoop)-[:IN_SCALE]->(aminor)
CREATE (pad1)-[:TRIGGERS_TONAL_CENTER]->(aminor)

By linking these components, you build an ecosystem where human intuition and hardware performance are amplified by a structurally intelligent backend.

How are you planning to orchestrate this system—are you looking to write custom scripts to pipe MIDI data into a graph, or are you focused on how to best ingest and tag the production libraries first?

Knowledge graph creation and knowledge ingestion

Building the knowledge graph and ingesting these diverse data streams requires a decoupled, pipelined architecture. Because you are dealing with structured metadata (libraries, MIDI maps), unstructured text (music theory definitions, sonic descriptions), and binary audio (MP3 feature analysis), your ingestion engine needs three distinct pipelines, some producing symbolic properties and some producing vector embeddings.

Here is the blueprint for designing the ingestion pipeline and the underlying graph schema to bind these worlds together.

1. The Target Schema (Graph Topology)

To achieve lightning-fast lookups during a live performance or production session, we use a hybrid schema that supports both deterministic relational links (e.g., $C \rightarrow G$ is a Perfect 5th) and semantic vector properties (e.g., "haunting cinematic atmosphere").

                [:MAPS_TO]              [:IN_KEY]
[MidiControl] -------------> [Theory] <------------- [AudioFile]
                                 |                        |
                        [:HAS_SUB_ELEMENT]          [:BELONGS_TO]
                                 v                        v
                           [Chord/Mode]            [LibraryAsset]

Node Labels & Core Properties

:Theory
- id: "scale_a_minor", "chord_c_maj7", "interval_p5"
- name: "A Minor", "C Major 7"
- pitch_class_set: [0, 2, 3, 5, 7, 8, 10] (semitone offsets from the tonic, here the natural minor pattern; used for real-time MIDI matching)
:AudioFile
- id: Unique hash of the file.
- filepath: Absolute local path to the MP3.
- bpm: Integer.
- key_signature: "Am", "F#m"
- embedding: Array<Float> (Vector embedding of sonic characteristics)
:LibraryAsset
- id: Vendor tracking ID.
- catalog: "Splice", "Native Instruments", "Custom_2026"
- license_type: "Royalty-Free", "Sync-Restricted"
:MidiControl
- hardware_id: "tracker_kbd_pad_1", "stream_dock_b1"
- channel: Integer (1-16)
- control_type: "Note", "CC", "ProgramChange"
- trigger_value: Integer (0-127)
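To show what "real-time MIDI matching" against pitch_class_set could look like, here is a small sketch. It assumes each scale is available in memory as a tonic pitch class plus the offset list from the :Theory node; the id "scale_e_minor" and the function name `scales_containing` are hypothetical.

```python
def scales_containing(held_notes, scales):
    """Return ids of scales whose pitch classes include every held MIDI note.

    scales: mapping of id -> (tonic pitch class, semitone offsets from the tonic)
    """
    held = {n % 12 for n in held_notes}
    matches = []
    for scale_id, (tonic, offsets) in scales.items():
        pitch_classes = {(tonic + o) % 12 for o in offsets}
        if held <= pitch_classes:  # every held note belongs to this scale
            matches.append(scale_id)
    return matches

SCALES = {
    "scale_a_minor": (9, [0, 2, 3, 5, 7, 8, 10]),
    "scale_e_minor": (4, [0, 2, 3, 5, 7, 8, 10]),
}

# A, C and F are held: F belongs to A minor but not to E minor.
print(scales_containing([57, 60, 65], SCALES))
```

Keeping this check in memory rather than in a database round trip is one way to stay responsive during live playing; whether that is necessary depends on the graph size and hardware.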

2. The Ingestion Pipeline Architecture

The ingestion framework runs locally to maintain data privacy and performance, processing files through an asynchronous Python pipeline (Librosa + Ollama/Local LLM + Neo4j).

[Raw Assets]
   |
   +---> [Python Audio Engine] ---> Extracted Features (BPM, Key) ----+
   |                                                                  |
   +---> [Local LLM / Regex] -----> Semantic Metadata (Mood, Tags) ---+--> [Neo4j Batch Write]
   |                                                                  |
   +---> [JSON/CSV Configs] ------> Theory Rules & MIDI Hardware -----+

Step 1: The Audio Analysis Ingestion (Extracting Features)

For raw MP3 loops and sound samples, we can automate feature extraction using Python rather than relying entirely on manual tagging.

import librosa
import numpy as np

def analyze_mp3(file_path):
    # Load the first 30 s as mono, resampled to 22.05 kHz for faster processing
    y, sr = librosa.load(file_path, sr=22050, duration=30.0)

    # 1. Estimate tempo (BPM). Depending on the librosa version, beat_track
    #    returns a float or a NumPy array, so normalize to a plain int.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    bpm = int(round(float(np.atleast_1d(tempo)[0])))

    # 2. Average the chromagram over time into a 12-bin pitch-class profile.
    #    This is the input to key estimation, not a key label by itself.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    mean_chroma = np.mean(chroma, axis=1)

    # Return a structured dict for graph properties
    return {
        "bpm": bpm,
        "chroma_profile": mean_chroma.tolist()
    }
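The chroma_profile returned above is not yet a key label. One common way to turn it into one is template matching against the Krumhansl-Kessler key profiles: correlate the profile with each of the 24 rotated major and minor templates and keep the best match. The sketch below uses that technique as an assumption, since the text does not say which key-finding method is intended, and it expects bin 0 of the profile to be C. `estimate_key` is a hypothetical name.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler probe-tone profiles; index 0 is the tonic.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def _pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / ((var_a * var_b) ** 0.5) if var_a and var_b else 0.0

def estimate_key(chroma_profile):
    """Return a label such as "Am" or "C" for a 12-bin chroma profile (bin 0 = C)."""
    profile = list(chroma_profile)
    best_label, best_score = None, float("-inf")
    for tonic in range(12):
        # Rotate so that the candidate tonic lands on index 0.
        rotated = profile[tonic:] + profile[:tonic]
        for template, suffix in ((MAJOR_PROFILE, ""), (MINOR_PROFILE, "m")):
            score = _pearson(rotated, template)
            if score > best_score:
                best_label, best_score = NOTE_NAMES[tonic] + suffix, score
    return best_label
```

Short loops and one-shots often carry too little harmonic information for this to be reliable, so treating the result as a suggestion to confirm against filename or vendor tags is a reasonable precaution.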

Step 2: Semantic Metadata Extraction via Local LLM

For production libraries that come with rich descriptions or text files, we route the text through a local LLM to extract clean, standardized tags for the graph.

Input Context: "A heavy, menacing industrial synth bass line with an aggressive distortion, perfect for cyberpunk tracking."
LLM Extraction Prompt: "Extract the primary instrument, mood, genre, and timbral descriptors as a JSON object."
JSON Payload Output:
```json
{"instrument": "Synth Bass", "moods": ["heavy", "menacing", "aggressive"], "genre": "Cyberpunk"}
```
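Local LLM output is not guaranteed to be well-formed, so it is worth validating the payload before it reaches the graph. This is a minimal sketch, assuming the three keys from the example payload are required and that tags are normalized to lowercase; `parse_llm_tags` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

REQUIRED_KEYS = {"instrument", "moods", "genre"}

def parse_llm_tags(raw: str) -> dict:
    """Parse and normalize a tag payload; raise ValueError on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["moods"], list):
        raise ValueError("'moods' must be a list")
    return {
        "instrument": str(data["instrument"]).strip().lower(),
        # De-duplicate and sort so the same text always yields the same tags.
        "moods": sorted({str(m).strip().lower() for m in data["moods"]}),
        "genre": str(data["genre"]).strip().lower(),
    }

payload = '{"instrument": "Synth Bass", "moods": ["heavy", "menacing", "aggressive"], "genre": "Cyberpunk"}'
print(parse_llm_tags(payload))
```

Rejected payloads can be retried or routed to manual tagging instead of creating inconsistent mood nodes.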

Step 3: Pure Theory Seed Ingestion

Music theory rules don't change, so they are ingested once using a structured seeding script. Circle of Fifths relationships, chord formulas (e.g., Minor 7th = [0, 3, 7, 10]), and modal rotations are calculated mathematically and pushed as the baseline structural lattice of the graph.
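Two of the seed structures mentioned here, modal rotations and the Circle of Fifths, can be generated from the major scale alone. The sketch below shows one way to compute them before writing anything to the graph; the constant and function names are illustrative.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
MODE_NAMES = ["Ionian", "Dorian", "Phrygian", "Lydian", "Mixolydian", "Aeolian", "Locrian"]

def mode_steps(degree: int) -> list:
    """Rotate the major scale to start on `degree` (0-6) and re-base it to 0."""
    rotated = MAJOR_STEPS[degree:] + MAJOR_STEPS[:degree]
    return [(step - rotated[0]) % 12 for step in rotated]

# Each mode as semitone offsets from its own tonic.
MODES = {name: mode_steps(i) for i, name in enumerate(MODE_NAMES)}

# Pitch classes in Circle of Fifths order: each entry is 7 semitones above the last.
CIRCLE_OF_FIFTHS = [(7 * i) % 12 for i in range(12)]

print(MODES["Aeolian"])
print(CIRCLE_OF_FIFTHS)
```

The Aeolian result is the same `[0, 2, 3, 5, 7, 8, 10]` pattern used for the pitch_class_set property, which gives a quick consistency check on the seed data.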

3. Orchestrating the Ingestion (Cypher Execution)

To avoid overwhelming the database when processing thousands of MP3s, ingestion should happen in batches: pass a list of rows as a single parameter and expand it with UNWIND, rather than issuing one query per file. Here is the core Cypher query for writing an ingested audio file and linking it to its theory node and library catalog in the same statement:

// Ingest a batch of analyzed sample metadata
UNWIND $batch AS row

MERGE (audio:AudioFile { id: row.file_hash })
SET audio.filepath = row.filepath,
    audio.bpm = row.bpm,
    audio.embedding = row.embedding,
    audio.timbre_tags = row.tags

// Ensure the Library Catalog exists and connect it
MERGE (lib:LibraryAsset { catalog: row.catalog_name })
MERGE (audio)-[:BELONGS_TO]->(lib)

// Find the pre-seeded Music Theory scale node and link the audio asset to it
WITH audio, row
MATCH (scale:Theory { id: row.calculated_scale_id })
MERGE (audio)-[:IN_KEY]->(scale)
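On the Python side, the `$batch` parameter has to be fed in fixed-size chunks. The following sketch shows one way to do that; the batch size of 500 is an arbitrary assumption to tune, and `write_batches` is a hypothetical helper that only relies on the session object having a `run(query, **parameters)` method, as the Neo4j Python driver's session does.

```python
from itertools import islice

def chunked(rows, size=500):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def write_batches(session, query, rows, size=500):
    """Run `query` once per chunk, passing the chunk as the $batch parameter.

    Returns the number of batches sent.
    """
    count = 0
    for batch in chunked(rows, size):
        session.run(query, batch=batch)
        count += 1
    return count
```

Each row is expected to be a dict with the keys the query reads (file_hash, filepath, bpm, embedding, tags, catalog_name, calculated_scale_id).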

4. Hooking Up the MIDI Interface Engine

Once the graph is populated, your MIDI controller acts as a dynamic graph query interface. Using a lightweight local script (mido in Python), incoming MIDI inputs trigger immediate lookups:

import time

import mido
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def on_midi_message(msg):
    # A note_on with velocity 0 is a note-off in disguise, so ignore it
    if msg.type == 'note_on' and msg.velocity > 0:
        # Query the graph for samples in the scale mapped to this MIDI note
        query = """
        MATCH (m:MidiControl {trigger_value: $note})-[:MAPS_TO]->(t:Theory)
        MATCH (sample:AudioFile)-[:IN_KEY]->(t)
        WHERE sample.bpm > 110 AND sample.bpm < 130
        RETURN sample.filepath AS filepath LIMIT 3
        """
        with driver.session() as session:
            results = session.run(query, note=msg.note)
            for record in results:
                print(f"Dynamically Queueing Match: {record['filepath']}")

# Open the port for your hardware controller (e.g., Tracker Workstation or Macro Pad).
# mido invokes the callback from a background thread, so keep the main thread alive.
inport = mido.open_input('Your_Controller_Name', callback=on_midi_message)
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    inport.close()
    driver.close()

How do you want to handle the ingestion layout? Are your production music libraries currently sitting in flat files/folders with CSV logs, or are you building the music theory lattice from scratch first?

From <https://gemini.google.com/app/> Google Gemini (3.5 Flash)
