AMD Debuts MI350P PCIe Instinct Accelerator

Reporting by The Register and Wccftech describes a new AMD PCIe accelerator, the MI350P, introduced on May 7, 2026. The card is a dual-slot, air-cooled design reported to deliver up to 4.6 petaFLOPS of FP4 compute, carry 144 GB of HBM3E across four stacks, and draw 600W (The Register; Wccftech). Wccftech reports the GPU uses CDNA 4 silicon in a configuration equal to half of an MI350X, with 128 compute units (~8,192 stream processors) and a single IO die. The Register notes the card lacks high-speed chip-to-chip interconnects and is limited to PCIe 5.0 bandwidth for multi-card communication. A Dell blog highlights integrations with Dell PowerEdge server platforms running the new card.
What happened
Coverage by The Register and Wccftech documents the MI350P, AMD's first slottable Instinct card since the MI210 in 2022. Both outlets list the card as a dual-slot, air-cooled design consuming 600W, offering up to 4.6 petaFLOPS of FP4 compute, and carrying 144 GB of HBM3E across four stacks with roughly 4 TB/s of memory bandwidth (The Register; Wccftech). Wccftech reports the die is half of an MI350X configuration with 128 compute units (about 8,192 stream processors), 512 matrix cores, a 2.2 GHz peak clock, and about 73 billion transistors; the IO die is reported as a separate 6nm component (Wccftech). The Register reports AMD supports one- to eight-card MI350P configurations but that the cards lack a high-speed on-card fabric and therefore rely on PCIe 5.0 (128 GB/s) for inter-card traffic (The Register).
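The reported figures invite a quick sanity check on where the bottleneck sits. The short Python sketch below uses only the numbers from the coverage above; the FLOPs-per-byte framing is our own illustration, not an AMD benchmark, and the PCIe figure is treated as a nominal aggregate link speed.

```python
# Back-of-envelope check of the reported MI350P numbers (figures from
# The Register / Wccftech coverage); illustrative only.

FP4_PFLOPS = 4.6    # reported peak FP4 compute, petaFLOPS
HBM_TBPS = 4.0      # reported HBM3E memory bandwidth, TB/s
PCIE5_GBPS = 128    # reported PCIe 5.0 inter-card bandwidth, GB/s

# Arithmetic intensity (FLOPs per byte of HBM traffic) needed to keep
# the reported peak compute fed from local memory.
flops_per_byte = (FP4_PFLOPS * 1e15) / (HBM_TBPS * 1e12)
print(f"FLOPs per HBM byte to saturate FP4 compute: {flops_per_byte:,.0f}")

# The same comparison against the PCIe link shows why inter-card
# traffic, not local memory, is the scaling constraint.
ratio = (HBM_TBPS * 1e12) / (PCIE5_GBPS * 1e9)
print(f"Local HBM vs PCIe 5.0 bandwidth: ~{ratio:.0f}x")
```

On these nominal figures, each card has roughly 31 times more local memory bandwidth than inter-card bandwidth, which is the gap an on-card fabric would otherwise close.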
Technical details
Editorial analysis - technical context
Wccftech attributes the GPU to the CDNA 4 architecture, fabricated in a multi-die TSMC flow and described as a 4-XCD configuration that is effectively half of the full MI350X device. Reporting lists native support for the lower-precision microscaling formats MXFP6 and MXFP4, along with sparsity acceleration for mainstream 8- and 16-bit precisions, which the outlets frame as features targeting AI inference and mixed-precision training workloads (Wccftech).
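MXFP4 and MXFP6 refer to the OCP Microscaling (MX) block formats, in which a group of 32 elements shares one power-of-two scale and each element is stored in 4 or 6 bits. As a minimal sketch of the idea, the snippet below quantizes one 32-element block to MXFP4-style values; the E2M1 value grid and block size follow the public MX spec, while the rounding and scale selection are simplified assumptions rather than AMD's hardware behavior.

```python
import numpy as np

# Positive values representable by FP4 E2M1 (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(x):
    """Quantize a 32-element block to shared-scale FP4 (round to nearest)."""
    assert x.size == 32
    amax = np.abs(x).max()
    if amax == 0:
        return np.zeros_like(x)
    # Shared power-of-two scale: align the block max's exponent with
    # E2M1's largest exponent (6.0 = 1.5 * 2**2).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = np.abs(x) / scale
    # Snap each magnitude to the nearest representable E2M1 value
    # (anything above 6.0 clamps to the format max).
    idx = np.abs(scaled[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(x) * E2M1_GRID[idx] * scale

rng = np.random.default_rng(0)
block = rng.normal(size=32)
q = mxfp4_quantize_block(block)
print("worst-case element error in this block:", np.abs(block - q).max())
```

The payoff is density: an MXFP4 block costs 4 bits per element plus one shared 8-bit scale, versus 16 bits per element for BF16.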
Industry context
Editorial analysis
Coverage by The Register compares the MI350P to contemporary PCIe Blackwell- and Hopper-class cards from Nvidia, noting that on paper the MI350P offers higher peak FP8 figures than Nvidia's H200 and a memory-capacity advantage over some RTX Pro PCIe parts. The Register also notes that Nvidia's PCIe offerings retain an edge on memory bandwidth and, in many configurations, chip-to-chip networking via NVLink, a capability the MI350P is reported not to include (The Register).
What this means for deployment
For practitioners
The PCIe form factor and air-cooled design are framed by coverage as lowering the barrier to on-premises AI adoption because the card can fit standard 19-inch servers rather than requiring OAM or custom chassis (The Register; Dell blog). Reporting by Dell positions PowerEdge servers as early integration partners for the card, and Dell's blog post outlines using MI350P in on-prem generative and agentic AI workloads (Dell). However, The Register cautions that multi-card scaling will be constrained by PCIe 5.0 interconnect bandwidth in the absence of an on-card high-speed fabric, which matters for training large models that rely on high-bandwidth chip-to-chip links (The Register).
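To put a rough number on that constraint, consider a textbook ring all-reduce of data-parallel gradients. The sketch below assumes a hypothetical 8-billion-parameter model with BF16 gradients and treats the reported 128 GB/s PCIe 5.0 figure, plus a 900 GB/s NVLink-class figure used purely for contrast, as nominal link speeds; real throughput depends on topology, software stack, and overlap with compute.

```python
# Rough ring all-reduce estimate for 8 data-parallel cards.
# Model size, gradient dtype, and link speeds are illustrative assumptions.

def ring_allreduce_seconds(num_bytes, n_gpus, link_gbps):
    """A classic ring all-reduce moves 2*(n-1)/n of the payload per GPU."""
    traffic = 2 * (n_gpus - 1) / n_gpus * num_bytes
    return traffic / (link_gbps * 1e9)

grad_bytes = 8e9 * 2  # hypothetical 8B parameters, 2-byte BF16 gradients

for name, gbps in [("PCIe 5.0 (reported 128 GB/s)", 128),
                   ("NVLink-class fabric (assumed 900 GB/s)", 900)]:
    t = ring_allreduce_seconds(grad_bytes, n_gpus=8, link_gbps=gbps)
    print(f"{name}: ~{t * 1000:.0f} ms per full-gradient all-reduce")
```

A roughly 219 ms versus 31 ms per-step synchronization gap on these nominal numbers is why the coverage flags multi-card training, rather than inference, as the workload most affected by the missing fabric.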
What to watch
For practitioners
Observers and infrastructure teams will want to track three items reported as open: pricing and availability (no price disclosed in coverage), Dell PowerEdge configurations and integration details (Dell blog), and performance in multi-card, multi-node training where the reported lack of NVLink-equivalent interconnects could limit throughput (The Register; Wccftech). Reported peak numbers such as 4.6 petaFLOPS and 144 GB of HBM3E establish the card as a potentially attractive option for inference-heavy and medium-scale training deployments that prioritize standard server compatibility over maximal multi-card scaling (The Register; Wccftech).
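For the inference-heavy case, the deciding question is often simply whether a model's weights fit on one card. The sketch below checks a few hypothetical model sizes against the reported 144 GB capacity; the bytes-per-weight values and the flat 10 GB allowance for KV cache and activations are illustrative assumptions, not measured figures.

```python
# Quick capacity check against the reported 144 GB of HBM3E.
# Model sizes, bytes-per-weight, and the 10 GB overhead are assumptions.

HBM_GB = 144  # reported MI350P memory capacity

def fits_on_one_card(params_billions, bytes_per_weight, overhead_gb=10):
    """Estimate weight footprint plus a flat KV-cache/activation allowance."""
    total_gb = params_billions * bytes_per_weight + overhead_gb
    return total_gb, total_gb <= HBM_GB

for params_b, dtype, bpw in [(70, "FP8", 1.0), (70, "BF16", 2.0), (120, "FP4", 0.5)]:
    total, fits = fits_on_one_card(params_b, bpw)
    verdict = "fits" if fits else "needs multiple cards"
    print(f"{params_b}B @ {dtype}: ~{total:.0f} GB -> {verdict}")
```

The pattern matches the framing in the coverage: with aggressive quantization, fairly large models fit on a single card, while higher-precision or larger deployments run into the PCIe 5.0 scaling question above.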
Summary takeaway
Editorial analysis
The reported arrival of a high-memory, slottable CDNA 4 card represents a pragmatic industry move to broaden on-premises options for enterprises that cannot, or do not want to, adopt OAM-only or NVL packaging. How compelling the MI350P will be for specific workloads depends on price, availability, and whether system integrators and customers accept the reported PCIe 5.0 scaling tradeoffs compared with accelerator cards that include on-card high-speed interconnects (The Register; Wccftech; Dell).
Scoring Rationale
A high-memory, slottable datacenter GPU is a notable infrastructure release that lowers on-prem adoption barriers. The impact hinges on pricing, availability, and multi-card scaling limits, so it is important but not paradigm-shifting.