Nvidia Uses Emulation To Boost FP64 Performance
Nvidia unveils Rubin GPUs and promotes FP64 emulation in its CUDA libraries, claiming up to 200 teraFLOPS of emulated FP64 matrix performance versus 33 teraFLOPS from native FP64 hardware. AMD cautions that the emulation, which is based on the Ozaki scheme, can break down on ill-conditioned or vector-heavy scientific simulations and is not fully IEEE 754 compliant. The debate suggests that HPC practitioners should validate emulation accuracy and memory trade-offs on their target workloads.
Key Points
- Nvidia claims FP64 emulation delivers up to 200 teraFLOPS of matrix performance on Rubin GPUs
- AMD raises concerns about IEEE compliance and numerical robustness in ill-conditioned, vector-heavy scientific simulations
- Practitioners should benchmark emulation on their target workloads before replacing hardware FP64
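To make the accuracy trade-off concrete, here is a minimal NumPy sketch of slice-based emulation in the spirit of the Ozaki scheme. It is not Nvidia's implementation: the function names, the 7-bit slice width, and the per-row and per-column power-of-two scaling are illustrative assumptions, and float64 arrays holding small integers stand in for the low-precision tensor-core products. The idea is to split each matrix into several slices of small integers, multiply the slices exactly, and sum the weighted partial products.

```python
import numpy as np

def split_into_slices(M, num_slices, bits, axis):
    """Split M into integer-valued slices with a shared power-of-two scale.

    Hypothetical helper: M ~= scale * sum_i slices[i] * 2**(-bits * (i + 1)).
    """
    max_abs = np.max(np.abs(M), axis=axis, keepdims=True)
    max_abs = np.where(max_abs == 0.0, 1.0, max_abs)
    scale = 2.0 ** np.ceil(np.log2(max_abs))   # power of two >= largest entry
    residual = M / scale                        # entries now lie in [-1, 1]
    slices = []
    for _ in range(num_slices):
        shifted = residual * 2.0 ** bits
        q = np.round(shifted)                   # small integers, |q| <= 2**bits
        slices.append(q)
        residual = shifted - q                  # leftover carried to next slice
    return scale, slices

def emulated_matmul(A, B, num_slices=8, bits=7):
    """Approximate A @ B from products of small-integer slices."""
    scale_a, a_slices = split_into_slices(A, num_slices, bits, axis=1)  # per row
    scale_b, b_slices = split_into_slices(B, num_slices, bits, axis=0)  # per column
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, qa in enumerate(a_slices):
        for j, qb in enumerate(b_slices):
            # Each slice product is an exact integer matmul for these sizes.
            C += 2.0 ** (-bits * (i + j + 2)) * (qa @ qb)
    return scale_a * C * scale_b

def max_rel_error(C, C_ref):
    """Largest absolute error, normalised by the largest reference entry."""
    return np.max(np.abs(C - C_ref)) / np.max(np.abs(C_ref))

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
C_ref = A @ B  # native float64 result used as the reference
for s in (2, 4, 8):
    print(s, max_rel_error(emulated_matmul(A, B, num_slices=s), C_ref))
```

The slice count trades accuracy against work: the nested loop performs a number of low-precision products that grows with the square of the slice count, and each extra slice is another copy of the operand held in memory. Swapping in matrices from a real workload for `A` and `B` is one way to check how many slices that workload needs before relying on emulation.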
Scoring Rationale
Demonstrated high FP64 throughput and vendor validation, limited by IEEE noncompliance concerns and unproven real-world robustness.
