NVIDIA delivers Vera CPU systems to top AI labs

NVIDIA began shipping its first Vera CPU systems to major AI organisations the week of May 18, 2026. According to NVIDIA's newsroom and blog, initial deliveries went to Anthropic, OpenAI, and SpaceXAI on Friday, followed by a delivery to Oracle Cloud Infrastructure on Monday. NVIDIA's newsroom claims that agentic inference on the Vera Rubin NVL72 platform can cut cost per token to one-tenth of an unspecified baseline, and reports that agent sandboxes run 50% faster and enterprise data queries up to 3x faster on Vera than on traditional CPUs. TechBuzz and Seeking Alpha report that NVIDIA VP Ian Buck personally hand-delivered early systems to several labs.
What happened
NVIDIA began delivering its first Vera CPU systems to leading AI organisations during the week of May 18, 2026, with initial hand-offs reported to Anthropic, OpenAI, and SpaceXAI, and a subsequent delivery to Oracle Cloud Infrastructure, according to NVIDIA's newsroom and corporate blog. TechBuzz and Seeking Alpha report that NVIDIA Vice President Ian Buck personally delivered early units to some customers.
Technical details
According to NVIDIA's newsroom, the Vera Rubin NVL72 system pairs the Vera CPU with Rubin GPUs over an NVLink-C2C interconnect to target agentic AI workloads, and NVIDIA claims this stack can cut inference cost per token to one-tenth of an unspecified alternative. NVIDIA's communications also state that agent sandboxes run 50% faster and enterprise data queries run up to 3x faster on Vera than on traditional CPUs. Third-party reporting (wccftech) cites hardware figures for Vera of approximately 88 cores and 1.2 TB/s of memory bandwidth, while NVIDIA's marketing materials emphasise low-latency interconnects and memory bandwidth as design priorities.
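Vendor ratios like "one-tenth", "50% faster", and "3x faster" only become concrete against a baseline, and "50% faster" can be read two ways (1.5x the throughput, or half the wall-clock time). The minimal sketch below turns the quoted ratios into numbers. Every baseline value in it is a hypothetical placeholder, not an NVIDIA or benchmark figure, and the per-core bandwidth is a naive even split of the ~1.2 TB/s figure reported by wccftech, not an achievable per-core rate.

```python
# Worked arithmetic for the vendor ratios quoted in the article.
# All baseline values are HYPOTHETICAL placeholders, not NVIDIA or benchmark data.


def scaled_cost(baseline_cost: float, fraction: float = 0.1) -> float:
    """Cost per token after a reduction 'to one-tenth' (fraction=0.1) of baseline."""
    return baseline_cost * fraction


def runtime_if_throughput_gain(baseline_s: float, pct_faster: float) -> float:
    """Reading 1: 'X% faster' means (1 + X/100) times the throughput."""
    return baseline_s / (1 + pct_faster / 100)


def runtime_if_time_cut(baseline_s: float, pct_faster: float) -> float:
    """Reading 2: 'X% faster' means X% less wall-clock time."""
    return baseline_s * (1 - pct_faster / 100)


def runtime_if_speedup(baseline_s: float, factor: float) -> float:
    """'Nx faster' read as N times the throughput, i.e. 1/N of the time."""
    return baseline_s / factor


def per_core_bandwidth_gbs(total_tbs: float, cores: int) -> float:
    """Naive even split of aggregate memory bandwidth across cores, in GB/s."""
    return total_tbs * 1000 / cores


if __name__ == "__main__":
    baseline_cost = 2.00     # hypothetical $ per million tokens
    baseline_sandbox = 30.0  # hypothetical seconds per agent sandbox run
    baseline_query = 9.0     # hypothetical seconds per enterprise data query

    print(f"cost per million tokens: ${scaled_cost(baseline_cost):.2f}")
    print(f"sandbox, 1.5x throughput: {runtime_if_throughput_gain(baseline_sandbox, 50):.1f} s")
    print(f"sandbox, 50% less time:   {runtime_if_time_cut(baseline_sandbox, 50):.1f} s")
    print(f"query, up to 3x faster:   {runtime_if_speedup(baseline_query, 3):.1f} s")
    # Hardware figures as reported by wccftech: ~88 cores, ~1.2 TB/s.
    print(f"per-core bandwidth (even split): {per_core_bandwidth_gbs(1.2, 88):.1f} GB/s")
```

The gap between the two sandbox readings (20 s versus 15 s on a 30 s baseline) is one reason independent benchmarks should state exactly which metric they report.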
Editorial analysis - technical context
Companies building CPU hardware for inference or agent orchestration often emphasise memory bandwidth, low-latency PCIe/NVLink topologies, and tightly coupled CPU-GPU fabrics to reduce end-to-end latency when coordinating multi-step agent actions. In general, systems that prioritise interconnect latency and bandwidth over raw floating-point throughput tend to improve responsiveness for workloads dominated by short, frequent data transfers (tool calls, retrieval, context shuttling), whereas workloads dominated by long matrix multiplications benefit more from compute throughput.
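One way to see why short, frequent transfers reward low-latency links is the standard latency-bandwidth ("alpha-beta") cost model, where each transfer costs a fixed latency plus bytes divided by bandwidth. The minimal sketch below applies that model; the `Link` class and both link parameter sets are hypothetical illustrations, not measured Vera or NVLink-C2C figures, and it ignores compute and any overlap between transfers.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Alpha-beta (latency-bandwidth) model of one interconnect link.

    Both parameters are HYPOTHETICAL inputs, not measured Vera or NVLink-C2C figures.
    """
    latency_us: float     # fixed cost per transfer (alpha), microseconds
    bandwidth_gbs: float  # sustained bandwidth in GB/s (1 GB/s = 1e3 bytes per microsecond)

    def transfer_us(self, nbytes: int) -> float:
        """Time for one transfer: alpha + bytes / bandwidth."""
        return self.latency_us + nbytes / (self.bandwidth_gbs * 1e3)


def total_us(link: Link, n_transfers: int, nbytes_each: int) -> float:
    """Total time for n equally sized, non-overlapping transfers (compute ignored)."""
    return n_transfers * link.transfer_us(nbytes_each)


if __name__ == "__main__":
    slow = Link(latency_us=10.0, bandwidth_gbs=64.0)  # hypothetical higher-latency link
    fast = Link(latency_us=2.0, bandwidth_gbs=64.0)   # hypothetical lower-latency link

    # "Chatty" agent pattern: 10,000 small 4 KiB messages (tool calls, retrieval hits).
    chatty = (10_000, 4096)
    # Bulk pattern: the same 40,960,000 bytes moved in a single transfer.
    bulk = (1, 10_000 * 4096)

    for name, (n, size) in {"chatty": chatty, "bulk": bulk}.items():
        t_slow, t_fast = total_us(slow, n, size), total_us(fast, n, size)
        print(f"{name}: {t_slow:,.0f} us vs {t_fast:,.0f} us ({t_slow / t_fast:.2f}x)")
```

With these made-up numbers, the chatty pattern runs about 4.9x faster on the lower-latency link, while the bulk transfer of the same data improves by only about 1%, because bandwidth rather than per-transfer latency dominates it.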
Context and significance
Editorial analysis:
The initial customer list reported by NVIDIA and industry outlets includes organisations operating large multi-component inference stacks and agent frameworks. In general, early hardware adoption by major platform operators accelerates software and systems integration work (schedulers, memory managers, inter-node communication), and it tends to shape the early reference architectures that customers and cloud providers evaluate when designing agentic deployments.
What to watch
For practitioners:
- Adoption and benchmark transparency: watch for independent performance tests comparing Vera against contemporary server CPUs (AMD EPYC, Intel Xeon), and Vera Rubin NVL72 against other GPU-based inference systems.
- Integration details: look for documentation on NVLink-C2C behaviour, NUMA characteristics, and the orchestration tooling that ties Vera CPUs to Rubin GPUs.
- Cloud availability: Oracle Cloud Infrastructure was named as an early recipient; track when Vera-backed instances appear in public clouds and at what price, since pricing will determine the practical cost trade-offs for agentic workloads.
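Once Vera-backed instances are priced, the cost comparison reduces to hourly price divided by tokens actually served. The minimal sketch below computes dollars per million tokens and the throughput a pricier instance would need just to break even. The function names, the utilization factor, and every price and throughput value are hypothetical assumptions for illustration, not published cloud pricing.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Dollars per million tokens for an instance billed by the hour.

    utilization is the fraction of billed time actually spent generating tokens.
    """
    if price_per_hour < 0:
        raise ValueError("price_per_hour must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return price_per_hour / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(price_per_hour: float, target_cost_per_million: float,
                                utilization: float = 1.0) -> float:
    """Throughput an instance needs to match a target $/million-token cost."""
    if target_cost_per_million <= 0:
        raise ValueError("target_cost_per_million must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_hour * 1_000_000 / (target_cost_per_million * 3600 * utilization)


if __name__ == "__main__":
    # Both instances and all prices/throughputs below are HYPOTHETICAL.
    incumbent = cost_per_million_tokens(price_per_hour=40.0, tokens_per_second=20_000,
                                        utilization=0.6)
    candidate_price = 100.0
    needed = breakeven_tokens_per_second(candidate_price, incumbent, utilization=0.6)
    print(f"incumbent: ${incumbent:.2f} per million tokens")
    print(f"candidate must sustain {needed:,.0f} tokens/s to match at ${candidate_price:.0f}/h")
```

At equal utilization the break-even throughput scales with the price ratio, so an instance costing 2.5x as much per hour must serve 2.5x the tokens per second (here, about 50,000) before any per-token saving appears.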
Bottom line
This delivery round is a first production step for NVIDIA's CPU effort targeted at agentic AI. Reported performance and cost claims come from NVIDIA's own newsroom and blog; early hand-deliveries were covered by TechBuzz and Seeking Alpha. Industry observers and practitioners should treat vendor claims as preliminary until independent benchmarks and cloud offerings provide broader verification.
Scoring Rationale
This is a notable infrastructure milestone: NVIDIA is introducing a purpose-built CPU for agentic workloads and has delivered units to major AI labs. The story matters to practitioners designing low-latency, multi-component inference stacks, though vendor claims still need independent verification.

