#4298 LLM Performance Nvidia DGX Spark
Describe the performance of Nemotron 3 Super on an Nvidia DGX Spark

Running the NVIDIA Nemotron 3 Super (120B-A12B) on the DGX Spark is a "sweet spot" configuration for local AI development. Because both the model and the hardware were released in late 2025/early 2026 with a focus on FP4 precision, they are architecturally aligned to punch well above their weight class. Here is the performance breakdown for this specific combination:

System Performance Overview

The DGX Spark is a "desktop supercomputer" powered by the GB10 Grace Blackwell Superchip. When running Nemotron 3 Super in its native NVFP4 (4-bit) format, the performance is optimized for high-throughput reasoning.

| Metric | Performance on DGX Spark (Single Node) |
| --- | --- |
| Prompt Processing | ~2,855 tokens/sec |
| Token Generation | ~18–25 tokens/sec (Typical) |
| Time to First Token | ~33.4 seconds (128K context) |
| VRAM Usage | ~80–90 GB (leaving ~40 GB for KV Cache/Context) |

Why This Pa...