Cisco and AMD Benchmark Scale-out AI Fabric Performance

According to a Cisco blog post, Cisco benchmarked a scale-out AI fabric built from Cisco N9000 Series switches, AMD Instinct MI300X GPUs, and AMD Pensando Pollara 400 NICs. Cisco reports testing two Clos topologies (2x2 and 4x2) using Cisco N9364E-SG2 switches (51.2 Tbps of throughput across 64 x 800 GbE ports), 128 GPUs (16 servers x 8 GPUs), 128 Pollara 400 NICs, and Cisco 800G OSFP optics. The tests ran on the AMD ROCm software stack and used Cisco Nexus Dashboard for visibility. Cisco frames the results as reducing job completion time and improving GPU utilization by addressing network-induced stalls. Editorial analysis: for practitioners, the report underscores that fabric bandwidth, congestion resilience, and telemetry are the central variables when scaling multi-GPU training clusters.
What happened
According to a Cisco blog post, Cisco and AMD benchmarked a validated scale-out AI fabric built from Cisco N9000 Series switches, AMD Instinct MI300X GPUs, and AMD Pensando Pollara 400 NICs. Per Cisco, the reference testbed used Cisco N9364E-SG2 switches (51.2 Tbps of throughput across 64 x 800 GbE ports), 128 GPUs (16 servers x 8 GPUs), 128 Pollara 400 NICs, and Cisco 800G OSFP optics. Cisco reports benchmarking two Clos topologies, labeled 2x2 and 4x2, running on the AMD ROCm software stack with Cisco Nexus Dashboard for operational visibility.
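The post does not spell out how the 2x2 and 4x2 labels map to leaf and spine counts. As a minimal sketch of the arithmetic involved, assuming a generic two-tier Clos built from the reported 64-port 800 GbE switches, the snippet below computes NIC capacity and oversubscription for a hypothetical leaf/spine split; the clos_sizing helper and the 4-leaf example are illustrative assumptions, not Cisco's design:

```python
# Hypothetical sketch: sizing a generic two-tier Clos fabric from switch port
# counts. How Cisco's "2x2" and "4x2" labels map onto leaf/spine counts and
# downlink/uplink splits is an assumption here, not detailed in the post.

def clos_sizing(leaves: int, spines: int, ports: int, downlinks_per_leaf: int):
    """Return NIC capacity and oversubscription for a two-tier leaf/spine fabric."""
    uplinks_per_leaf = ports - downlinks_per_leaf        # remaining leaf ports face spines
    assert uplinks_per_leaf > 0 and uplinks_per_leaf % spines == 0
    # Each spine must terminate every leaf's share of uplinks within its port budget.
    assert leaves * (uplinks_per_leaf // spines) <= ports
    hosts = leaves * downlinks_per_leaf
    oversub = downlinks_per_leaf / uplinks_per_leaf      # >1.0 means oversubscribed
    return hosts, oversub

# 64-port 800 GbE switches, per the reported N9364E-SG2 specs; a non-blocking
# 4-leaf x 2-spine split lands exactly on the 128 NICs in the testbed:
hosts, ratio = clos_sizing(leaves=4, spines=2, ports=64, downlinks_per_leaf=32)
print(f"4x2 split: {hosts} NIC-facing ports at {ratio:.1f}:1 oversubscription")
```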
Technical details
According to Cisco, the 2x2 testbed intentionally induced switch-level congestion to evaluate fabric resilience. The hardware stack called out in the post includes 800G optics and 800 GbE ports on the leaf/spine switches, and eight NICs per server. Cisco frames the workload focus as reducing job completion time (JCT) and avoiding GPU stalls caused by network bursts.
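Cisco does not describe its JCT measurement at this level of detail. As a minimal illustration of the kind of signal involved, the sketch below (a hypothetical timed_steps helper, not Cisco's methodology) times training-style iterations and flags outliers of the sort that network bursts tend to produce:

```python
# Hypothetical sketch: flagging suspect stalls from per-iteration step times.
# This only illustrates the signal practitioners watch; it is not taken from
# Cisco's benchmark methodology.
import random
import statistics
import time

def timed_steps(step_fn, iterations: int, threshold: float = 2.0):
    """Run step_fn repeatedly; flag steps slower than threshold x the median."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        step_fn()                          # e.g. forward/backward plus allreduce
        durations.append(time.perf_counter() - start)
    median = statistics.median(durations)
    slow = [(i, d) for i, d in enumerate(durations) if d > threshold * median]
    return durations, slow                 # outliers often track congestion bursts

# Toy usage with a fake step; a real run would wrap the training iteration:
fake_step = lambda: time.sleep(random.choice([0.01] * 9 + [0.05]))
durations, slow = timed_steps(fake_step, iterations=50)
print(f"median {statistics.median(durations) * 1e3:.1f} ms/step, {len(slow)} outliers")
```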
Editorial analysis
Industry context: public reporting on scale-out training increasingly highlights the network as the gating factor for GPU utilization, especially in topologies where thousands of accelerators must synchronize. Companies deploying comparable multi-server, multi-GPU clusters typically prioritize high-throughput switching, low-latency AI NICs, and richer telemetry to detect microbursts and retransmissions.
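As one concrete, if simplified, example of host-side telemetry: the sketch below polls Linux's cumulative TCP retransmission counter from /proc/net/snmp. An RDMA/RoCE fabric like the one described would rely on NIC and switch counters instead, so treat this purely as an illustration of the polling pattern; the watch helper is hypothetical:

```python
# Hypothetical sketch: polling a host-side retransmission counter as a cheap
# congestion signal. Fabrics built on RDMA NICs expose their own counters;
# this Linux TCP example only illustrates the sampling pattern.
import time

def tcp_retrans_segs() -> int:
    """Read cumulative TCP RetransSegs from /proc/net/snmp (Linux only)."""
    with open("/proc/net/snmp") as f:
        tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]   # first Tcp: line holds field names
    return int(values[header.index("RetransSegs")])

def watch(interval: float = 1.0, samples: int = 5):
    """Print the retransmission rate over a few sampling intervals."""
    prev = tcp_retrans_segs()
    for _ in range(samples):
        time.sleep(interval)
        cur = tcp_retrans_segs()
        print(f"retransmits/s: {(cur - prev) / interval:.0f}")
        prev = cur

if __name__ == "__main__":
    watch()
```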
What to watch
Observers should track independent benchmark data or MLPerf entries that corroborate vendor-reported JCT and utilization figures, the adoption pace of 800G optics in AI clusters, and tooling integration for per-iteration telemetry and congestion mitigation.
Scoring rationale
Vendor-validated benchmarking of full-stack AI fabrics is notable for infrastructure teams planning large-scale training clusters. The report highlights network bottlenecks and 800G hardware, topics that matter to practitioners but do not on their own change models or tooling.