On Friday, April 24, 2026, the DeepSeek team posted two model checkpoints to Hugging Face and a 600-word announcement on its API docs. The post was titled "DeepSeek-V4 Preview Release." Inside the documentation were links to a 1.6-trillion-parameter mixture-of-experts model, a 284-billion-parameter sibling, and a tech report claiming both could process one million tokens of context.
By Saturday morning, every major benchmark site had posted preliminary scores. By Sunday, three of them ranked V4-Pro ahead of Google's Gemini 3.1 Pro on coding tasks and within a fraction of a point of Anthropic's Claude Opus 4.6 on real-world software engineering. (Opus 4.6's own release was covered in Claude Opus 4.6: Anthropic Just Dropped Its Most Intelligent Model.)
The bigger surprise sat in DeepSeek's pricing table.
V4-Pro charges $1.74 per million input tokens and $3.48 per million output tokens. Claude Opus 4.7 charges $5 input and $25 output. GPT-5.5 charges $5 input and $30 output. On a balanced input-and-output workload, V4-Pro runs at roughly one-sixth the price of Opus 4.7 and one-seventh the price of GPT-5.5, with the gap widening toward one-ninth on output-heavy workloads.
For data scientists running production inference at meaningful volume, that math is the story.
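The price gap can be made concrete with a small blended-cost calculation. The sketch below uses the list prices quoted above and assumes no caching or batch discounts; the input:output mixes are illustrative, not from DeepSeek's docs.

```python
# Blended $/M tokens for a given input:output token mix, using the
# list prices quoted in the article (no caching or batch discounts).
PRICES = {  # (input $/M, output $/M)
    "V4-Pro": (1.74, 3.48),
    "Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}

def blended(model: str, out_share: float) -> float:
    """Cost per million tokens when `out_share` of tokens are output."""
    inp, out = PRICES[model]
    return inp * (1 - out_share) + out * out_share

for mix in (0.5, 0.9):  # balanced vs output-heavy workloads
    v4 = blended("V4-Pro", mix)
    for rival in ("Opus 4.7", "GPT-5.5"):
        print(f"{rival} vs V4-Pro at {mix:.0%} output: "
              f"{blended(rival, mix) / v4:.1f}x")
```

At a 50/50 mix the rivals come out roughly 5.7x and 6.7x more expensive; at 90% output the GPT-5.5 gap approaches 8.3x.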
What V4 Actually Hit
DeepSeek's V4-Pro is a sparse mixture-of-experts model with 1.6 trillion total parameters and 49 billion active per token. V4-Flash is a smaller sibling with 284 billion total and 13 billion active. Both target a one million token context window, the largest publicly documented context for an open-weights model under permissive license.
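The total-versus-active distinction is what drives serving cost. A common rule of thumb is that a transformer spends about 2 FLOPs per active parameter per decoded token, so a sparse MoE routing 49B of 1.6T parameters prices out like a ~49B dense model. The sketch below applies that rule of thumb (an approximation that ignores attention's context-dependent term); the 600B dense comparison point is the one the tech report uses.

```python
# Rule-of-thumb decode cost: ~2 FLOPs per ACTIVE parameter per token.
# For a sparse MoE, only the routed experts count, so V4-Pro's 1.6T
# total parameters cost like a ~49B dense model per token.
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

sparse = flops_per_token(49e9)    # V4-Pro: 49B active of 1.6T total
dense = flops_per_token(600e9)    # 600B dense competitor from the report
print(f"V4-Pro decode FLOPs/token: {sparse:.2e}")
print(f"dense/sparse compute ratio: {dense / sparse:.1f}x")
```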
The headline benchmark scores, taken from DeepSeek's tech report and from Vals AI, Arena.ai, and SWE-bench's leaderboards through April 26:
| Benchmark | V4-Pro | Closest Frontier | Notes |
|---|---|---|---|
| LiveCodeBench | 93.5 | Gemini 3.1 Pro at 91.7 | V4-Pro leads open and closed |
| Codeforces (rating) | 3206 | GPT-5.4 at 3168 | Real-world competitive code |
| SWE-bench Verified | 80.6% | Claude Opus 4.6 at 80.8% | 0.2 pt behind |
| SWE-bench Pro | 55.4% | Claude Opus 4.7 at 64.3% | Still a gap on hard agent tasks |
| MMLU-Pro | 87.5% | GPT-5.5 at ~88% | Within margin of error |
| GPQA Diamond | 90.1% | Claude Opus 4.6 at ~90% | Near-frontier reasoning |
The benchmarks where V4-Pro leads are unusually concentrated in coding and reasoning. The benchmarks where it trails are mostly long-horizon agent tasks where Anthropic's Opus line is still the leader.
Vals AI summarized V4 as "overwhelmingly" topping the open-source field in its Vibe Code Benchmark, beating closed-source Gemini 3.1 Pro and producing what its analysts called "about a tenfold performance leap" over V3.2. Arena.ai placed V4-Pro in thinking mode third among open-source coding models and fourteenth overall in its code arena.
The Architecture Story
The V4 jump did not come from scaling parameters alone. DeepSeek's tech report describes three architectural changes that explain how a 1.6T parameter model runs cheaper per token than a 600B parameter dense competitor.
The first change is the attention layer. V4 uses a hybrid of two new mechanisms, Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). The pair targets the long-context regime: at one million tokens, V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache that V3.2 needed at the same context length.
For practitioners, that means long-context inference, the use case where most frontier models fall over, becomes economically viable on V4-Pro at prices an order of magnitude below the alternatives.
The second change is Manifold-Constrained Hyper-Connections (mHC), a residual-connection variant that DeepSeek says improves signal stability across very deep layers. The third is a switch to the Muon optimizer, which the team claims gave faster convergence and more stable training behavior than AdamW at this scale.
DeepSeek released the open weights for both V4-Pro and V4-Flash under the MIT License, with the full tech report at huggingface.co/deepseek-ai/DeepSeek-V4-Pro. The license permits commercial use without attribution.
How DeepSeek's API Pricing Compares
| Model | Input ($/M tokens) | Output ($/M tokens) | Open weights? |
|---|---|---|---|
| DeepSeek V4-Flash | 0.14 | 0.28 | Yes (MIT) |
| DeepSeek V4-Pro | 1.74 | 3.48 | Yes (MIT) |
| Claude Sonnet 4.5 | 3.00 | 15.00 | No |
| GPT-5.5 Mini | 0.40 | 1.60 | No |
| Gemini 3.1 Flash | 0.30 | 2.50 | No |
| Claude Opus 4.7 | 5.00 | 25.00 | No |
| GPT-5.5 | 5.00 | 30.00 | No |
V4-Pro is closer in price to a fast model like GPT-5.5 Mini than it is to its actual benchmark peers. V4-Flash undercuts every closed frontier-class flash variant on the table.
For teams running inference workloads in the millions of tokens per day, the implications are direct. A workload that costs $30,000 per month in Claude Opus 4.7 output tokens runs closer to $4,200 on V4-Pro and roughly $340 on V4-Flash, assuming similar output volumes.
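The worked example above follows directly from the pricing table: a $30,000 Opus 4.7 output bill implies about 1,200M output tokens per month, repriced below at each provider's list rate.

```python
# Monthly output-token spend at each list price, for the volume
# implied by a $30,000/month Opus 4.7 output bill.
OUTPUT_PRICE = {"Opus 4.7": 25.00, "V4-Pro": 3.48, "V4-Flash": 0.28}  # $/M

tokens_m = 30_000 / OUTPUT_PRICE["Opus 4.7"]  # 1,200M output tokens/month
for model, price in OUTPUT_PRICE.items():
    print(f"{model}: ${tokens_m * price:,.0f}/month")
```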
The Reception
VentureBeat called V4 "near state-of-the-art intelligence at one-sixth the cost." MIT Technology Review's Will Douglas Heaven argued the release "closes the gap" with frontier closed models on most measurable axes.
DeepSeek's internal team went further. The company said V4 has become its primary model for agentic coding among employees, with evaluation feedback claiming the experience exceeds Claude Sonnet 4.5 and approaches Opus 4.6 in non-thinking mode.
NVIDIA published a developer post the same Friday explaining how to deploy V4-Pro on Blackwell GPUs and through GPU-accelerated endpoints, a sign that the largest US chip vendor sees V4 as a serious enterprise inference target.
Not everyone is sold. Tech commentator Michael Anti, posting on X, said that in actual use V4-Flash did not surpass the well-developed V3.2 and described the upgrade as disappointing for established users. The launch also showed visible rough edges, including chat instances reportedly self-identifying as V3, confusion about model availability across endpoints, and incomplete third-party benchmark coverage.
There is also the gap on hardest tasks. SWE-bench Pro, the most challenging agent benchmark, still favors Claude Opus 4.7 at 64.3% over V4-Pro at 55.4%, a roughly nine-point spread that matters for production agentic workloads.
The Other Side: What the Cheap Tokens Hide
Pricing this aggressive raises three questions practitioners should consider before swapping providers.
The first is throughput. DeepSeek's API has historically rate-limited at much tighter thresholds than OpenAI or Anthropic, especially during peak hours in Asia-Pacific. Several analysts, including SemiAnalysis founder Dylan Patel, have flagged the V4 launch capacity as "constrained" in the first two weeks.
The second is data residency. DeepSeek's hosted API runs out of mainland China. For US enterprises with regulated data, that is a hard stop. The MIT-licensed weights solve this only if the buyer is willing to self-host, which means owning enough Blackwell or Hopper GPUs to run a 49B active-parameter MoE model at acceptable latency.
The third is provenance. A separate LDS investigation documented in DeepSeek Trained Its Trillion-Parameter Model on Smuggled Nvidia Chips reported that US officials concluded V4-class training runs were powered by Nvidia chips routed through third-country buyers in apparent violation of export controls. Several US government contractors have already informed staff that DeepSeek-hosted endpoints are off-limits while the issue is reviewed.
The benchmark scores do not change. The procurement implications might.
What It Means for Practitioners
The first practical question is: where does V4 actually save money?
For long-context retrieval, document analysis, and code search workloads, V4-Pro at $3.48 per million output tokens can replace Opus 4.7 at $25 with little measurable loss in accuracy. The savings are real and immediate.
For agentic coding workloads where the model has to plan, execute, and revise across long horizons, Claude Opus 4.7 still produces better outcomes per task. The price premium buys success rate on the hardest 15-20% of tasks where V4 falls short.
For self-hosted deployment, V4-Flash is the most interesting option. At 284 billion total parameters with 13 billion active, it can run on a single eight-GPU Blackwell node with 80GB memory per GPU, putting it within reach of any enterprise that already runs Llama 4 or Mistral Large at scale.
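The single-node claim checks out on weight memory alone. The sketch below compares V4-Flash's weight footprint at two precisions against an 8x80GB node; the precision choices are ours, and activations, KV cache, and runtime overhead add more on top.

```python
# Does V4-Flash fit on one 8-GPU node? Weight memory only; activations,
# KV cache, and runtime overhead are extra. Precision is our assumption.
def weights_gib(total_params: float, bytes_per_param: float) -> float:
    return total_params * bytes_per_param / 2**30

node_hbm = 8 * 80  # GiB, the article's 8x80GB Blackwell node
for name, bpp in [("FP16", 2), ("FP8", 1)]:
    need = weights_gib(284e9, bpp)
    fits = "fits" if need < node_hbm else "does not fit"
    print(f"{name}: {need:.0f} GiB of {node_hbm} GiB -> {fits}")
```

At FP16 the weights land around 530 GiB, leaving only ~110 GiB of headroom, which is why most single-node deployments of this class run FP8.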
For Hugging Face users, the open weights are available now under MIT, which means downstream fine-tunes will start appearing within days. By mid-May, expect domain-specialized variants from Together AI, Fireworks, DeepInfra, and several research labs.
The Bottom Line
DeepSeek shipped a model that in most public benchmarks lands somewhere between GPT-5.4 and Claude Opus 4.6. It did so on open weights, under an MIT License, at output prices that are 98% lower than GPT-5.5 Pro and roughly 86% lower than Claude Opus 4.7.
For frontier closed labs, the message is unambiguous. The price floor on inference just dropped by an order of magnitude on every workload that does not require the very last benchmark point. The same dynamic showed up two weeks ago when OpenAI shipped GPT-5.5 six weeks after 5.4 and doubled the price. DeepSeek's V4 walks straight into that gap. Anthropic, OpenAI, and Google can now charge premium prices only on the workloads where their models genuinely beat V4. Everywhere else, the price will follow DeepSeek down.
For ML engineers, the message is simpler. Run a side-by-side eval this week. The math has changed.
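A side-by-side eval does not need heavy tooling to start. The harness below is a minimal sketch: the model callables, graders, and word-count token proxy are all placeholders to swap for real API clients and a real tokenizer.

```python
# Minimal side-by-side eval harness. `model_fn`, the graders, and the
# word-count token proxy are placeholders for real clients/tokenizers.
def evaluate(model_fn, tasks, price_per_m_out):
    correct, out_tokens = 0, 0
    for prompt, grade in tasks:
        answer = model_fn(prompt)
        out_tokens += len(answer.split())  # crude stand-in for tokens
        correct += grade(answer)
    return {"accuracy": correct / len(tasks),
            "cost_usd": out_tokens / 1e6 * price_per_m_out}

# Stub tasks and a stub model standing in for a real endpoint:
tasks = [("2+2?", lambda a: "4" in a),
         ("capital of France?", lambda a: "Paris" in a)]
report = evaluate(lambda p: "4 Paris", tasks, price_per_m_out=3.48)
print(report)
```

Run the same task list through each candidate model, then compare accuracy per dollar rather than accuracy alone.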
For every workload that does not need the very last percentage point on a benchmark, that math is already settled.
Sources
- DeepSeek V4 Preview Release (DeepSeek API Docs, April 24, 2026)
- deepseek-ai/DeepSeek-V4-Pro (Hugging Face, April 24, 2026)
- DeepSeek-V4 Arrives With Near State-of-the-Art Intelligence at 1/6th the Cost of Opus 4.7, GPT-5.5 (VentureBeat, April 25, 2026)
- Three reasons why DeepSeek's new model matters (MIT Technology Review, April 24, 2026)
- DeepSeek previews new AI model that closes the gap with frontier models (TechCrunch, April 24, 2026)
- DeepSeek V4 Is Here. Its Pro Version Costs 98% Less Than GPT 5.5 Pro (Decrypt, April 24, 2026)
- DeepSeek V4 Returns to Challenge US AI Giants With Aggressive Pricing (Trending Topics, April 25, 2026)
- DeepSeek V4: Features, Benchmarks, and Comparisons (DataCamp, April 26, 2026)
- Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints (NVIDIA Developer Blog, April 24, 2026)
- LLM Coding Benchmark April 2026: GPT 5.5, DeepSeek v4, Kimi v2.6, MiMo, and the State of the Art (AkitaOnRails, April 24, 2026)
- Mapping the DeepSeek V4 Evaluation Suite (Redreamality, April 26, 2026)