Huawei Enables DeepSeek V4 on Ascend Supernode Clusters

Huawei announced that its Ascend supernode built on Ascend 950 AI chips will fully support DeepSeek's V4 preview. DeepSeek's V4 includes a pro variant that, the company says, outperforms other open-source models on world-knowledge benchmarks and a lower-cost flash version intended for broader deployment. The collaboration signals a shift from DeepSeek's earlier reliance on Nvidia hardware toward tighter integration with Chinese chip vendors, reinforcing onshore AI stack development amid heightened US-China tensions over exports and alleged IP issues. For practitioners, this means new hardware-targeted optimization and benchmarking work, potential changes in deployment pipelines, and closer attention to compatibility with Huawei's compute and software stack.
What happened
Huawei said its Ascend supernode, based on the Ascend 950 AI chip, will fully support DeepSeek's V4 model after the startup released a public preview. DeepSeek says the V4 pro variant outperforms other open-source models on world-knowledge benchmarks and trails only Google's Gemini-Pro-3.1 overall. The preview also includes a lower-cost flash version.
Technical details
DeepSeek adapted V4 for Huawei silicon, a visible move away from its prior Nvidia-based stack toward an Ascend-targeted one. Key practitioner takeaways:
- V4 comes in at least two builds: a higher-capacity pro variant and a lower-cost flash variant for constrained deployments.
- Huawei is committing the Ascend supernode environment, powered by Ascend 950, as a supported inference target; expect vendor tooling and runtime optimizations to follow.
- DeepSeek has not disclosed exact training hardware or full performance telemetry, so independent benchmarking on Ascend 950 hardware will be needed to validate the claims.
Context and significance
This announcement matters on three fronts. First, it accelerates China-centric AI stack maturity by linking a leading domestic model to a domestic inference platform. Second, it reduces reliance on Nvidia hardware and associated export constraints, which have been a focal point of recent US-China friction and allegations against DeepSeek. Third, it signals a pragmatic path for model vendors to ship regionally optimized binaries and runtime support to gain performance and cost advantages.
For engineers, several practical implications follow. Expect work to port and optimize kernels, memory layout, and mixed-precision behavior for the Ascend runtime rather than CUDA. Tooling differences, such as vendor-provided compilers, graph optimizers, and operator libraries, will require integration with CI pipelines. Benchmarking should compare latency, throughput, and cost-per-token between V4 on Ascend 950 and equivalent runs on Nvidia hardware.
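As a concrete starting point, here is a minimal benchmarking sketch showing one way to collect comparable latency, throughput, and cost-per-token numbers across the two targets. Everything in it is a placeholder assumption: `stub_generate` stands in for whatever client call your Ascend or Nvidia deployment actually exposes, and the hourly prices are invented, not real quotes.

```python
import statistics
import time

def benchmark(generate, prompts, price_per_hour, runs=3):
    """Time a generate(prompt) -> tokens_produced callable, then derive
    throughput and cost-per-token from the instance's hourly price."""
    latencies = []
    total_tokens = 0
    for _ in range(runs):
        for prompt in prompts:
            start = time.perf_counter()
            total_tokens += generate(prompt)
            latencies.append(time.perf_counter() - start)
    total_time = sum(latencies)
    tokens_per_second = total_tokens / total_time
    return {
        "p50_latency_s": statistics.median(latencies),
        "throughput_tok_s": tokens_per_second,
        "usd_per_1k_tokens": 1000 * price_per_hour / 3600 / tokens_per_second,
    }

# Stub standing in for a real V4 inference call on either platform.
def stub_generate(prompt):
    time.sleep(0.01)  # placeholder for network/inference time
    return 128        # placeholder token count

if __name__ == "__main__":
    # Hypothetical hourly prices; substitute real quotes per platform.
    print("ascend-950:", benchmark(stub_generate, ["test"] * 10, price_per_hour=12.0))
    print("nvidia-h100:", benchmark(stub_generate, ["test"] * 10, price_per_hour=10.0))
```

The design point is that the measurement harness stays identical across platforms, so only the backend varies and the numbers remain directly comparable.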
What to watch
The two immediate questions are when DeepSeek will release final V4 binaries and detailed benchmarks, and how broadly Huawei will make Ascend capacity available to cloud and enterprise customers for reproducible testing. Also monitor regulatory scrutiny and whether export-control and IP allegations affect cross-border partnerships or access to tooling and chips.
Why it matters for practitioners
Hardware-targeted model releases shift where optimization effort must go. If V4 gains traction on Ascend nodes, expect new performance baselines, different inference-cost profiles, and an expanded ecosystem of Ascend-optimized libraries and deployment recipes. Organizations deploying or benchmarking open models should add Ascend environments to test matrices and evaluate porting effort against the expected deployment benefits.
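One lightweight way to do that is to treat the hardware target as just another axis of the test matrix. The sketch below is illustrative only: the hardware names, runtime labels, and model identifiers are assumptions, not confirmed SKUs or product names.

```python
# Illustrative test matrix; hardware names, runtime labels, and model
# identifiers are placeholders, not confirmed product names.
TEST_MATRIX = [
    {"hardware": "nvidia-h100", "runtime": "cuda", "model": "deepseek-v4-pro"},
    {"hardware": "ascend-950",  "runtime": "cann", "model": "deepseek-v4-pro"},
    {"hardware": "ascend-950",  "runtime": "cann", "model": "deepseek-v4-flash"},
]

for cfg in TEST_MATRIX:
    # In a real pipeline this would dispatch a benchmark job (such as the
    # harness sketched above) to a runner with the matching hardware.
    print(f"schedule: {cfg['model']} on {cfg['hardware']} ({cfg['runtime']})")
```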
Scoring Rationale
This is a notable infrastructure-model alignment: it advances China's onshore AI stack and matters for practitioners optimizing deployment and benchmarking. It is not paradigm-shifting but affects vendor strategy and operational workstreams.