AMD Debuts MI350P PCIe Instinct Accelerator

By LDS Team
Relevance Score: 7.3

Reporting by The Register and Wccftech describes a new AMD PCIe accelerator, the MI350P, introduced on May 7, 2026. The card is a dual-slot, air-cooled design that is reported to deliver up to 4.6 petaFLOPS of FP4 compute, carry 144 GB of HBM3E across four stacks, and draw 600 W (The Register; Wccftech). Wccftech reports that the GPU uses CDNA 4 silicon in a half-MI350X configuration with 128 compute units (about 8,192 stream processors) and a single IO die. The Register notes that the card lacks high-speed chip-to-chip interconnects and is limited to PCIe 5.0 bandwidth for multi-card communication. A Dell blog post highlights Dell PowerEdge server platforms that integrate the new card.

What happened

Coverage by The Register and Wccftech documents AMD's rollout of the MI350P, a new PCIe accelerator and AMD's first slottable Instinct card since the MI210 in 2022. Both outlets list the card as a dual-slot, air-cooled design drawing 600 W, offering up to 4.6 petaFLOPS of FP4 compute, and carrying 144 GB of HBM3E across four stacks with roughly 4 TB/s of memory bandwidth (The Register; Wccftech). Wccftech reports that the die is a half-MI350X configuration with 128 compute units (about 8,192 stream processors), 512 matrix cores, a 2.2 GHz peak clock, and about 73 billion transistors; the IO die is reported as a separate 6nm component (Wccftech). The Register reports that AMD supports configurations of one to eight MI350P cards, but that the cards lack a high-speed on-card fabric and therefore rely on PCIe 5.0 (128 GB/s) for inter-card traffic (The Register).
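The reported figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constants are the numbers quoted in the coverage, while the factor of 64 stream processors per compute unit is an assumption (it is the ratio implied by 128 compute units and about 8,192 stream processors), and the function name is hypothetical.

```python
# Back-of-the-envelope checks on the reported MI350P figures.
# All "REPORTED_" values are taken from the coverage; ASSUMED_LANES_PER_CU
# is an assumption implied by the 128 CU / ~8,192 stream processor pairing.

REPORTED_COMPUTE_UNITS = 128
REPORTED_HBM_GB = 144
REPORTED_HBM_STACKS = 4
REPORTED_HBM_BANDWIDTH_GB_S = 4000   # "roughly 4 TB/s"
REPORTED_PCIE_GB_S = 128             # PCIe 5.0 figure quoted for inter-card traffic
ASSUMED_LANES_PER_CU = 64


def derived_specs():
    """Derive per-stack capacity and bandwidth ratios from the reported numbers."""
    return {
        "stream_processors": REPORTED_COMPUTE_UNITS * ASSUMED_LANES_PER_CU,
        "gb_per_hbm_stack": REPORTED_HBM_GB / REPORTED_HBM_STACKS,
        "hbm_to_pcie_bandwidth_ratio": REPORTED_HBM_BANDWIDTH_GB_S / REPORTED_PCIE_GB_S,
    }


if __name__ == "__main__":
    for name, value in derived_specs().items():
        print(f"{name}: {value}")
```

Under these inputs, local HBM3E bandwidth is roughly 31 times the quoted PCIe figure, which is one way to see why the coverage treats inter-card traffic as the likely bottleneck.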

Technical details

Editorial analysis - technical context

Wccftech attributes the GPU to the CDNA 4 architecture, fabricated in a multi-die TSMC flow and described as a four-XCD configuration that is effectively half of the full MI350X device. The reporting lists native support for the lower-precision MXFP6 and MXFP4 formats, plus sparsity acceleration for mainstream 8- and 16-bit precisions, which the coverage frames as target features for AI inference and mixed-precision training workloads (Wccftech).
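For readers unfamiliar with MXFP4, the following is a minimal sketch of block-scaled 4-bit quantization in the style of the OCP Microscaling (MX) formats: a block of up to 32 values shares one power-of-two scale, and each element is stored as a 4-bit E2M1 value. It is not drawn from the coverage and does not claim to describe how AMD's hardware implements the format. The function names are hypothetical, and ties are rounded toward the smaller magnitude for simplicity rather than with round-to-nearest-even.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (sign is handled separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX_EXPONENT = 2   # largest magnitude is 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32         # elements sharing one scale in the MX formats


def quantize_block(values):
    """Return (shared_exponent, elements) for one block of floats."""
    if len(values) > BLOCK_SIZE:
        raise ValueError("a block holds at most 32 values")
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
    floor_log2 = math.frexp(max_abs)[1] - 1
    shared_exp = floor_log2 - E2M1_MAX_EXPONENT
    scale = 2.0 ** shared_exp
    elements = []
    for v in values:
        scaled = abs(v) / scale
        # Nearest representable magnitude; values above 6.0 saturate to 6.0.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - scaled))
        elements.append(math.copysign(nearest, v))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate floats from a quantized block."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elements]


if __name__ == "__main__":
    exp, elems = quantize_block([12.0, 1.0, -3.3, 0.2])
    print(exp, elems, dequantize_block(exp, elems))
```

With one 8-bit scale per 32 four-bit elements, storage works out to 4.25 bits per value, which is the appeal of such formats for memory-bound inference.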

Industry context

Editorial analysis

The Register compares the MI350P to contemporary PCIe Blackwell and Hopper-class cards from Nvidia, noting that on paper the MI350P offers higher peak FP8 figures than Nvidia's H200 and edge-case VRAM advantages over some RTX Pro PCIe parts. The Register additionally highlights that Nvidia's PCIe offerings retain an edge on memory bandwidth and, in many configurations, on chip-to-chip networking via NVLink, a capability the MI350P is reported not to include (The Register).

What this means for deployment

For practitioners

The PCIe form factor and air-cooled design are framed by coverage as lowering the barrier to on-premises AI adoption because the card can fit standard 19-inch servers rather than requiring OAM or custom chassis (The Register; Dell blog). Reporting by Dell positions PowerEdge servers as early integration partners for the card, and Dell's blog post outlines using MI350P in on-prem generative and agentic AI workloads (Dell). However, The Register cautions that multi-card scaling will be constrained by PCIe 5.0 interconnect bandwidth in the absence of an on-card high-speed fabric, which matters for training large models that rely on high-bandwidth chip-to-chip links (The Register).
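To give the scaling caution a rough scale, the sketch below estimates the time for a bandwidth-bound ring all-reduce, a common way to synchronize gradients across cards. It is a simplified model under stated assumptions, not a benchmark: it ignores latency, protocol overhead, and host topology, and it assumes 64 GB/s per direction per card, which is half of the 128 GB/s quoted in coverage on the assumption that the quoted number is the bidirectional figure for a x16 link. The function name is hypothetical.

```python
# Simplified, bandwidth-only model of ring all-reduce time.
# ASSUMED_PCIE_GB_S_PER_DIRECTION is an assumption, not a reported figure.

ASSUMED_PCIE_GB_S_PER_DIRECTION = 64.0


def ring_allreduce_seconds(payload_gb, num_cards,
                           link_gb_per_s=ASSUMED_PCIE_GB_S_PER_DIRECTION):
    """Estimate ring all-reduce time for `payload_gb` of data per card.

    In a ring all-reduce each card sends (and receives) 2 * (N - 1) / N
    times the payload, so the time is that traffic divided by link bandwidth.
    """
    if num_cards < 2:
        return 0.0  # nothing to synchronize with a single card
    traffic_gb = 2 * (num_cards - 1) / num_cards * payload_gb
    return traffic_gb / link_gb_per_s


if __name__ == "__main__":
    # Example: 16 GB of gradients (about 8 billion 16-bit values) on 8 cards.
    print(f"{ring_allreduce_seconds(16, 8):.4f} s per synchronization step")
```

Plugging in a faster link for `link_gb_per_s` shows how directly the synchronization time follows interconnect bandwidth in this model.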

What to watch

For practitioners

Observers and infrastructure teams will want to track three items reported as open: pricing and availability (no price disclosed in coverage), Dell PowerEdge configurations and integration details (Dell blog), and performance in multi-card, multi-node training where the reported lack of NVLink-equivalent interconnects could limit throughput (The Register; Wccftech). Reported peak numbers such as 4.6 petaFLOPS and 144 GB of HBM3E establish the card as a potentially attractive option for inference-heavy and medium-scale training deployments that prioritize standard server compatibility over maximal multi-card scaling (The Register; Wccftech).
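A quick capacity check can help judge which models fit on a single card. The sketch below is an estimate under stated assumptions: it counts weights only, adds a hypothetical 20% headroom for activations and KV cache, and uses decimal gigabytes. The 4.25 bits-per-weight example assumes a block-scaled 4-bit format with one 8-bit scale per 32 values; the function names are illustrative.

```python
# Rough single-card capacity check. CARD_HBM_GB is the reported capacity;
# the headroom fraction is an assumed placeholder, not a measured value.

CARD_HBM_GB = 144


def weight_footprint_gb(params_billion, bits_per_weight):
    """Weight storage in decimal GB for a model with `params_billion` parameters."""
    return params_billion * bits_per_weight / 8


def fits_on_card(params_billion, bits_per_weight,
                 headroom_fraction=0.2, card_gb=CARD_HBM_GB):
    """True if weights plus the assumed headroom fit in the card's memory."""
    needed_gb = weight_footprint_gb(params_billion, bits_per_weight) * (1 + headroom_fraction)
    return needed_gb <= card_gb


if __name__ == "__main__":
    for bits in (16, 8, 4.25):
        gb = weight_footprint_gb(70, bits)
        print(f"70B params at {bits} bits: {gb:.1f} GB, fits: {fits_on_card(70, bits)}")
```

In this model a 70-billion-parameter model at 16 bits needs 140 GB for weights alone, so it fits only with essentially no headroom, while lower-precision formats leave substantial room.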

Summary takeaway

Editorial analysis

The reported arrival of a high-memory, slottable CDNA 4 card represents a pragmatic industry move to broaden on-premises options for enterprises that cannot or do not want to adopt OAM-only or NVL packaging. How compelling the MI350P will be for specific workloads depends on price, availability, and whether system integrators and customers accept the reported PCIe 5.0 scaling tradeoffs compared with accelerator cards that include on-card high-speed interconnects (The Register; Wccftech; Dell).

Key Points

  • MI350P is a dual-slot PCIe Instinct reported to deliver 4.6 petaFLOPS FP4 and 144 GB of HBM3E, making it notable for high-memory on-prem deployments (The Register; Wccftech).
  • Industry context: Coverage frames the card as a halved MI350X die with CDNA 4 silicon, trading interconnect speed for broad server compatibility (Wccftech; The Register).
  • For practitioners: Multi-card scaling is likely to be bounded by PCIe 5.0 bandwidth absent a high-speed fabric, so evaluate on-node and multi-node network topologies before ordering (The Register).

Scoring Rationale

A high-memory, slottable datacenter GPU is a notable infrastructure release that lowers on-prem adoption barriers. The impact hinges on pricing, availability, and multi-card scaling limits, so it is important but not paradigm-shifting.
