#4485 MSI EdgeXpert Setup

#4485 Tell me about MSI EdgeXpert

The MSI EdgeXpert (MS-C931) is a compact, ultra-dense desktop AI supercomputer built on the NVIDIA DGX™ Spark platform. It is designed specifically for AI developers, researchers, and enterprises who need to run heavy machine learning workloads, fine-tuning, and massive large language models (LLMs) completely locally and securely on premise.

Rather than being a traditional PC, it is effectively a data-center-class hardware and software stack shrunk down into a tiny, 1.19-liter mini-PC form factor.

Here is a breakdown of its core specifications, performance capabilities, and architectural features:

1. Underlying Architecture & Hardware

The Processor: It is powered by the NVIDIA® GB10 Grace Blackwell Superchip. This integrates a 20-core ARM CPU (10x Cortex-X925 + 10x Cortex-A725) directly with a Blackwell-architecture GPU.
Memory Structure: It features 128 GB of LPDDR5x coherent, unified system memory operating on a 256-bit interface with 273 GB/s of bandwidth. Because the CPU and GPU are linked via NVIDIA’s high-speed NVLink-C2C interconnect, they share this unified memory pool seamlessly with up to 5x the bandwidth of standard PCIe 5.0.
Compute Power: It delivers 1 petaFLOP of AI Tensor performance (specifically 1,000 AI TOPS using FP4 sparse precision).
Physical Form Factor: It packs all of this into an incredibly small footprint—measuring just 151 mm × 151 mm × 52 mm and weighing only 1.2 kg (2.65 lbs). Despite the dense power, real-world user reports note that its thermal management keeps it quiet and cool under prolonged peak loads.

2. Capabilities & LLM Scaling

The massive 128GB unified memory pool removes the VRAM bottlenecks typical of consumer hardware, making it a specialized beast for local, private inferencing:

Single-Unit Local Processing: Out of the box, a single EdgeXpert can run advanced local models up to 200 billion parameters (such as DeepSeek, Llama, or Qwen variants) at highly usable token-per-second rates.
Multi-Unit Stacking: Thanks to the inclusion of an onboard NVIDIA ConnectX-7 Smart NIC (10 GbE), you can link two units together to achieve a dual-system configuration capable of handling models up to 405 billion parameters locally.

3. Software Ecosystem & Use Cases

The EdgeXpert runs NVIDIA DGX™ OS, meaning it skips the overhead of consumer operating systems and streamlines workflows for CUDA-accelerated libraries, PyTorch, TensorRT, and TensorFlow.

Because it gives developers full access to the enterprise NVIDIA AI software stack, it acts as a local sandbox for deploying agentic workflows, computer vision, and industrial automation via specialized NVIDIA frameworks:

NVIDIA Isaac & Metropolis: For edge robotics, smart spaces, and vision-language intelligence.
Secure Agentic AI: It is heavily marketed toward regulated environments (finance, legal, defense, and specialized engineering sectors) where data privacy is paramount. It allows organizations to build local RAG (Retrieval-Augmented Generation) frameworks, parse internal compliance/policy documents, and run complex multi-agent orchestration without data ever leaving the physical device.

4. Pricing and Availability

Because it is enterprise-focused industrial hardware rather than a standard consumer desktop, pricing isn't heavily advertised on standard retail shelves, though it sits around $4,500 to $5,300 USD (roughly AUD 7,000–8,000) depending on whether you opt for the 1TB or 4TB self-encrypting Gen5 NVMe storage configuration.

Ultimately, it acts as a bridge: it lets a developer prototype, test, and run heavy local AI agents right at their desk, with the exact same software environment they would use to scale seamlessly up to a full NVIDIA DGX cloud or data center later.

I want to run Ollama via Jupyter Python notebooks. What are first steps?

Running Ollama inside a Jupyter Notebook environment is an excellent workflow for prototyping AI agents, processing text pipelines, and testing models locally.

Because Ollama runs as a background service separate from your Python environment, the integration is incredibly clean. Here are the step-by-step first steps to get up and running.

Step 1: Verify Ollama is Running & Pull a Model

Before touching Python, ensure your local Ollama instance is active and has a model ready to go.

open your terminal or command prompt.
Check your active models:
Bash
ollama list
If you don't have a lightweight model downloaded yet for testing, pull one (for quick testing, llama3.2 or qwen2.5 variants work beautifully):
Bash
ollama pull llama3.2

Step 2: Prepare Your Python Environment

In your terminal, activate the environment where your Jupyter kernel lives, and install the official Ollama Python SDK.

pip install ollama jupyter

(If you prefer tracking dependencies via standard packages, just add ollama to your requirements.txt or environment configuration).

Step 3: Open Jupyter and Run Your First Cell

Launch your notebook environment (jupyter notebook or jupyter lab), open a new .ipynb file, and test the connection using the standard completion API.

1. Basic Generation (Non-streaming)

import ollama

response = ollama.generate(
    model='llama3.2',
    prompt='Give me a concise, one-sentence definition of a graph database.'
)

print(response['response'])

2. Streaming Responses in Real-Time

For longer generations, waiting for the whole payload can feel sluggish. You can leverage Python iterators to stream the output token-by-token directly into the notebook cell output:

import ollama

stream = ollama.generate(
    model='llama3.2',
    prompt='Explain the core concept of first-principles thinking.',
    stream=True
)

for chunk in stream:
    print(chunk['response'], end='', flush=True)

Step 4: Structuring a Multi-Turn Conversation

If you are planning to build interactive loops or maintaining chat history, switch to the ollama.chat API. You will need to explicitly manage the payload arrays:

import ollama

# Initialize context history
messages = [
    {
        'role': 'system', 
        'content': 'You are a precise technical assistant who communicates using concise engineering terms.'
    },
    {
        'role': 'user', 
        'content': 'What is the main benefit of keeping AI execution local?'
    }
]

# First turn
response = ollama.chat(model='llama3.2', messages=messages)
print("Assistant:", response['message']['content'])

# Append assistant's answer to keep the state tracking accurate
messages.append(response['message'])

# Second turn
messages.append({'role': 'user', 'content': 'Are there memory bottlenecks with this approach?'})
follow_up_response = ollama.chat(model='llama3.2', messages=messages)

print("\nAssistant:", follow_up_response['message']['content'])

Advanced Tip: Connecting to a Specific Engine/Host

By default, importing ollama looks for the server running locally at http://localhost:11434.

If you ever shift your compute heavy-lifting to a dedicated local machine or mini-PC cluster on your local network, you don't need to change your code structure. You simply instantiate an explicit client targeting that machine's IP:

from ollama import Client

# Target a dedicated local machine or custom port on your LAN
client = Client(host='http://119.168.1.50:11434')

response = client.generate(model='llama3.2', prompt='Hello from the notebook!')
print(response['response'])

How about running Neo4j Desktop on this machine?

Running Neo4j Desktop directly on the MSI EdgeXpert requires shifting gears.

Because the EdgeXpert runs NVIDIA DGX OS (an enterprise Linux distribution) on an ARM64 (Grace) processor architecture, you cannot simply download and run the standard Neo4j Desktop .AppImage or Debian/Ubuntu .deb desktop packages. Neo4j explicitly restricts its bundled "Desktop" GUI app to x86_64 architectures for Linux.

However, the hardware itself—with 128GB of coherent unified memory and blazing fast NVMe I/O—is an absolute powerhouse for massive graph workloads and GraphRAG pipelines.

To get the exact same functionality, the standard engineering path is to run the Neo4j Enterprise/Community Server natively (or via Docker) on the EdgeXpert, and access the rich graphical Neo4j Browser or Neo4j Bloom interfaces through your web browser or your local Jupyter Notebook.

Here are the two best ways to set this up out of the box.

Method 1: The Docker Path (Cleanest & Most Isolated)

Since DGX OS comes ready for containerized workloads, spinning up Neo4j in Docker ensures you don't run into any native ARM64 dependency mismatches. The official Neo4j Docker images have built-in multi-arch support for arm64/v8.

Fire up a terminal and run the container:

docker run \
    --name neo4j-edge \
    -p 7474:7474 -p 7687:7687 \
    -d \
    -v $HOME/neo4j/data:/data \
    -v $HOME/neo4j/logs:/logs \
    -v $HOME/neo4j/import:/import \
    -v $HOME/neo4j/plugins:/plugins \
    --env NEO4J_AUTH=neo4j/YourSecurePasswordHere \
    --env NEO4J_PLUGINS='["apoc"]' \
    neo4j:latest

(Note: Including the NEO4J_PLUGINS environment variable automatically pulls the APOC library, which is critical for advanced graph refactoring and data conversion workflows).

Access the Interface:
Open any browser on your network and navigate to http://localhost:7474 (or target the EdgeXpert’s LAN IP). You will get the full Neo4j Browser GUI for writing Cypher queries and visualizing nodes.

Method 2: The Native Apt Repository Path

If you prefer running Neo4j as a persistent system service without Docker overhead, you can install the ARM64 server binaries directly via Neo4j’s official Debian repository.

Add the Neo4j GPG key and repository source:

wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/neo4j.gpg
echo 'deb [signed-by=/usr/share/keyrings/neo4j.gpg] https://debian.neo4j.com stable latest' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update

Install the Database Server:
sudo apt-get install neo4j -y
Enable External LAN Access (Optional):
If you want to query this graph from a secondary device or laptop, modify the config file:
sudo nano /etc/neo4j/neo4j.conf
Uncomment or modify the line:
server.default_listen_address=0.0.0.0
Save the file and restart the service:
sudo systemctl enable neo4j sudo systemctl start neo4j

Connecting Jupyter to Neo4j & Ollama Together

Once your graph server is active, you can weave Ollama and Neo4j together inside your Python environment. Install the official Neo4j driver alongside your notebook packages:

pip install neo4j

Inside your Jupyter Notebook, you can verify the end-to-end connection loop—using Ollama to structure raw data and sending it cleanly to your local graph instance:

import ollama
from neo4j import GraphDatabase

# 1. Extract a relationship structured by local LLM
prompt = "Extract the primary subject, predicate, and object from: 'The EdgeXpert runs DGX OS.'"
response = ollama.generate(model='llama3.2', prompt=prompt)
print("LLM Parsing Output:", response['response'])

# 2. Direct handshake verification with the local Graph Server
URI = "bolt://localhost:7687"
AUTH = ("neo4j", "YourSecurePasswordHere")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    
    # Simple Cypher execution to test storage
    with driver.session() as session:
        session.run("""
            MERGE (h:Hardware {name: 'EdgeXpert'})
            MERGE (o:OS {name: 'DGX OS'})
            MERGE (h)-[:RUNS]->(o)
        """)
        
print("\nGraph connection active. Node and Relationship verification successful.")

From <https://gemini.google.com/app/> Google Gemini (3.5 Flash)

Search This Blog

Learning from ChatGPT