DeepSeek Undercuts Rivals with V4 Aggressive Pricing

According to reporting by The Decoder and Trending Topics, Chinese AI lab DeepSeek published preview releases of two new open-weight models, V4-Pro and V4-Flash, with up to 1.6 trillion parameters and a one-million-token context window. The Decoder reports the models use a new hybrid attention architecture that cuts long-context compute needs (to 27% FLOPs and 10% KV cache versus V3.2 for V4-Pro), and that training used up to 33 trillion tokens. Oplexa and Automios report DeepSeek is pricing inference far below competitors, with Oplexa citing roughly $3.48 per million output tokens versus $25 for a Claude comparison. Yahoo Finance notes DeepSeek's prior price cuts of up to 75% in 2025. Editorial analysis: this combination of open weights, long-context efficiency, and aggressive pricing shifts the inference economics practitioners must model when choosing stacks.
What happened
According to The Decoder and Trending Topics, Chinese lab DeepSeek released preview versions of two models, V4-Pro and V4-Flash, as open weights under an MIT-style license. The Decoder reports V4-Pro has 1.6 trillion total parameters with 49 billion active parameters, and V4-Flash has 284 billion total with 13 billion active. Both models are reported as mixture-of-experts designs supporting a one-million-token context window. The Decoder additionally reports that the technical paper claims training on up to 33 trillion tokens and shows V4-Pro leading open-weight models on the GDPval-AA benchmark it cites, with 1,554 Elo points.
Technical details
Editorial analysis - technical context: The Decoder describes a new hybrid attention architecture combining token compression and sparse attention; according to its summary, this reduces long-context compute to 27% of the FLOPs and 10% of the KV cache for V4-Pro relative to DeepSeek's V3.2, with V4-Flash reported even lower. Oplexa and other commentators frame the architecture and mixture-of-experts routing as enabling a much lower inference cost per token while retaining competitive performance on coding, STEM, and agentic benchmarks. These are industry-level technical claims drawn from DeepSeek's paper as summarized by multiple outlets; practitioners should treat the specific FLOPs and cache improvements as model-paper claims pending independent reproduction.
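As rough arithmetic on the reported figures, the two reductions can be blended into a single relative-cost estimate. This is a sketch, not DeepSeek's method: the `flops_weight` knob is a hypothetical assumption about how compute-bound versus KV-cache-bound a given deployment is, and the 27%/10% inputs are media-summarized paper claims.

```python
# Rough arithmetic on the reductions reported for V4-Pro vs V3.2.
# The 0.27 (FLOPs) and 0.10 (KV cache) ratios are model-paper claims
# as summarized by The Decoder, not independent measurements.

def relative_long_context_cost(flops_ratio: float, kv_cache_ratio: float,
                               flops_weight: float = 0.5) -> float:
    """Blend compute and memory ratios into one relative-cost figure.

    flops_weight is a hypothetical knob: the share of serving cost
    that is compute-bound rather than KV-cache (memory)-bound.
    """
    return flops_weight * flops_ratio + (1 - flops_weight) * kv_cache_ratio

# Reported V4-Pro ratios relative to V3.2, assuming a 50/50 cost split:
v4_pro = relative_long_context_cost(flops_ratio=0.27, kv_cache_ratio=0.10)
print(f"V4-Pro blended relative long-context cost vs V3.2: {v4_pro:.3f}")
# -> 0.185, i.e. roughly an 81% reduction under these assumptions
```

The blended figure shifts with the weighting: a memory-bound deployment (low `flops_weight`) would see savings closer to the 10% KV-cache ratio, a compute-bound one closer to the 27% FLOPs ratio.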
Market and pricing facts
Oplexa reports a large measured price gap between DeepSeek and premium closed models, citing an example of roughly $3.48 per million output tokens for V4 versus $25 for a Claude comparison. Automios and Trending Topics echo that DeepSeek's pricing sits well below offerings from OpenAI, Google, and Anthropic. Yahoo Finance recalls earlier DeepSeek API price cuts of up to 75% during off-peak hours in 2025. These published figures form the basis for media coverage that frames DeepSeek as compressing inference margins in the market.
Industry context
Editorial analysis: Public reporting frames this moment as a widening bifurcation in the inference market between premium, closed-source models and aggressively priced, open-weight alternatives. Multiple outlets characterize V4 performance as "good enough," questioning whether the small remaining benchmark gaps justify the premiums enterprise buyers often pay. For procurement teams, the primary trade-off discussed in coverage is price sensitivity versus the marginal benchmark advantages of closed models.
For practitioners
Editorial analysis: Teams evaluating model choices should explicitly model three variables emphasized across coverage: real-world throughput and latency for one-million-token contexts, end-to-end inference cost at production scale, and task-level performance delta on your benchmarks. Coverage suggests V4 variants may materially lower per-token cost, but independent validation of latency, memory behavior on your hardware, and instruction-following robustness remains necessary.
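The three variables above can be combined into a simple screening model: hard gates on quality and latency first, then comparison on cost. Everything here is a hypothetical sketch; the candidate names, scores, latencies, and thresholds are placeholders a team would replace with its own eval results and SLOs (only the $3.48 and $25 prices echo the figures cited in coverage).

```python
# Toy screening model for the three variables emphasized in coverage:
# task-level quality, long-context latency, and per-token cost.
# All candidate values and thresholds below are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    cost_per_m_output: float   # USD per million output tokens
    task_score: float          # score on your own eval suite, 0-1
    p95_latency_s: float       # p95 latency at long context, seconds

def acceptable(c: Candidate, min_score: float, max_latency_s: float) -> bool:
    """Apply hard quality/latency gates; only gated candidates compete on cost."""
    return c.task_score >= min_score and c.p95_latency_s <= max_latency_s

candidates = [
    Candidate("open-weight-v4-like", 3.48, 0.86, 9.0),   # hypothetical numbers
    Candidate("premium-closed",      25.00, 0.90, 6.5),  # hypothetical numbers
]

viable = [c for c in candidates if acceptable(c, min_score=0.85, max_latency_s=10.0)]
cheapest = min(viable, key=lambda c: c.cost_per_m_output)
print(f"cheapest viable candidate: {cheapest.name}")
```

The design choice worth keeping is the ordering: treating quality and latency as gates rather than weighted terms prevents a very cheap model from "winning" a blended score while failing your actual requirements.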
What to watch
Editorial analysis: Observers will watch:
- independent benchmark replications of the FLOPs/KV-cache claims
- third-party latency and memory measurements on both Nvidia GPUs and the reported Huawei Ascend hardware
- competitor pricing changes from OpenAI, Google, and Anthropic that outlets say could follow DeepSeek's move
- adoption signals cited by finance reporting, such as integration into downstream services and enterprise pilots, since these determine real-world margin pressure
Limitations on reporting
The reporting above is based on media coverage and on the DeepSeek technical paper as summarized by The Decoder, Oplexa, Trending Topics, Automios, and Yahoo Finance. If DeepSeek has issued separate formal statements or commercial pricing pages, consult those directly for contract terms and SLAs.
Scoring Rationale
A major open-weight model drop with claimed trillion-parameter scale, one-million-token contexts, and aggressive pricing is a significant market event that can reshape inference economics for enterprises. Independent validation and competitor responses will determine how disruptive it becomes.

