Microsoft's MDASH Finds 16 Windows Vulnerabilities
According to the Microsoft Security Blog, Microsoft announced a multi-model agentic scanning harness, codenamed MDASH, that helped find 16 vulnerabilities across the Windows networking and authentication stack, including four critical remote code execution (RCE) flaws. Microsoft's post attributes the discoveries to an ensemble that orchestrates more than 100 specialized AI agents and reports strong benchmark results: 21 of 21 planted bugs found with zero false positives on a private test driver, 96% recall on five years of confirmed MSRC cases in clfs.sys, 100% recall in tcpip.sys, and an 88.45% score on the public CyberGym benchmark, which Microsoft says tops the leaderboard. Infosecurity reports that May Patch Tuesday fixed 120 CVEs overall, 16 of which Microsoft says were found by MDASH. CSO Online reports the system will enter a private enterprise preview in June.
What happened
Per the Microsoft Security Blog, Microsoft introduced a new multi-model agentic scanning harness codenamed MDASH, which the company reports helped researchers discover 16 vulnerabilities in the Windows networking and authentication stack, including four critical remote code execution (RCE) flaws. The blog post states the harness was built by Microsoft's Autonomous Code Security team and orchestrates an ensemble of frontier and distilled models using more than 100 specialized AI agents. Microsoft reports test results including 21 of 21 planted vulnerabilities found with zero false positives on a private test driver, 96% recall against five years of confirmed Microsoft Security Response Center (MSRC) cases in clfs.sys, 100% recall in tcpip.sys, and an 88.45% score on the public CyberGym benchmark, which the post describes as the top leaderboard entry.
Infosecurity reports that Microsoft's May Patch Tuesday fixed 120 CVEs in total and that 16 of those CVEs were discovered using the MDASH system. CSO Online reports the agentic tool will open to enterprise customers in a private preview in June.
Technical details
Per Microsoft's post, MDASH is presented as a model-agnostic, agentic system that combines many specialized agents to generate candidate vulnerabilities, debate and triage findings, and produce end-to-end exploit proofs. The Microsoft writeup quantifies the harness's performance on internal and public benchmarks (see the numbers reported above) and includes an architecture diagram showing agent orchestration across multiple model families. Microsoft also reports running the system against both synthetic test drivers and historical MSRC cases to measure recall and false-positive rates.
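Illustrative sketch: the three stages named in the post (generate, debate and triage, prove) can be pictured as a simple filter chain. The Python below is a toy under stated assumptions, not Microsoft's implementation. The `Finding` schema, the stub agents, the majority-vote rule, and the `min_votes` threshold are all hypothetical; real agents would call models and build working proofs rather than match strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability (hypothetical schema, for illustration only)."""
    location: str
    description: str


# Hypothetical agent signatures; real agents would wrap model calls.
Generator = Callable[[str], List[Finding]]
Reviewer = Callable[[Finding], bool]
Prover = Callable[[Finding], bool]


def run_pipeline(source: str,
                 generators: Sequence[Generator],
                 reviewers: Sequence[Reviewer],
                 prover: Prover,
                 min_votes: int) -> List[Finding]:
    # Stage 1: every generator proposes candidates; exact duplicates are dropped.
    candidates: List[Finding] = []
    seen = set()
    for generate in generators:
        for finding in generate(source):
            if finding not in seen:
                seen.add(finding)
                candidates.append(finding)
    # Stage 2: "debate" reduced to a vote; keep findings with enough approvals.
    triaged = [f for f in candidates
               if sum(review(f) for review in reviewers) >= min_votes]
    # Stage 3: keep only findings the prover can demonstrate.
    return [f for f in triaged if prover(f)]


# Stub agents (assumptions): simple string checks standing in for model output.
def memcpy_agent(source: str) -> List[Finding]:
    if "memcpy" in source:
        return [Finding("copy_buf", "unchecked memcpy length")]
    return []


def overflow_agent(source: str) -> List[Finding]:
    if "*" in source:
        return [Finding("copy_buf", "size multiplication may overflow")]
    return []


reviewers = [
    lambda f: True,                        # permissive reviewer
    lambda f: "memcpy" in f.description,   # specialist reviewer
    lambda f: False,                       # skeptical reviewer
]

confirmed = run_pipeline(
    "memcpy(dst, src, n * size);",
    generators=[memcpy_agent, overflow_agent, memcpy_agent],
    reviewers=reviewers,
    prover=lambda f: True,                 # stub: pretend every proof succeeds
    min_votes=2,
)
print(confirmed)
```

In this toy run the overflow candidate collects only one vote and is dropped at triage, which is the kind of noise reduction a multi-agent review stage is meant to provide.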
Industry context
Editorial analysis: Agentic ensembles combining multiple specialist agents and model families are increasingly used in security research to cover diverse tactics, techniques, and procedures (TTPs) and to reduce single-model blind spots. Observers who track AI-assisted offensive and defensive tooling have noted similar multi-agent approaches in red-team research and synthetic telemetry generation, where ensemble debate and structured verification improve signal-to-noise compared with single-model output.
Context and significance
Editorial analysis: The combination of reported high recall and low false-positive rates matters to defenders because vulnerability discovery workloads are sensitive to both missed findings and analyst time spent triaging false positives. If external verification reproduces Microsoft's claimed benchmark gains, defenders and tool vendors may accelerate experimentation with agentic orchestration patterns for fuzzing, static analysis augmentation, and exploit proof generation. From a research perspective, the use of historical MSRC cases as a recall baseline provides a stronger empirical signal than purely synthetic evaluations, though independent replication will be required to validate generalization to other codebases and classes of bugs.
What to watch
Editorial analysis: Observers should watch for:
- independent benchmark replications or third-party evaluations of MDASH or similar agentic systems
- the scope of the private preview and whether enterprise testers report similar recall/false-positive characteristics in their environments
- how vendor tooling and open-source projects adopt agent orchestration patterns for vulnerability discovery, fuzzing, and exploit generation
Also monitor Patch Tuesday disclosures and proof-of-concept activity for the specific CVEs Microsoft and other researchers disclosed, since rapid public exploit development can change enterprise remediation priorities.
Practical implications for practitioners
For practitioners: Security teams evaluating AI-assisted vulnerability research tools should treat vendor benchmark claims as a starting point and plan for instrumented pilots that measure both recall and triage burden in their own code and telemetry. For ML engineers exploring agentic systems, this announcement reinforces that multi-agent orchestration and verification pipelines are a practical research direction rather than a purely academic exercise, with measurable operational outcomes reported by an industry-scale team.
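Illustrative sketch: a pilot of the kind described above needs only a few numbers per run. The Python below assumes a team keeps a set of known bug identifiers and compares it with what a tool reports. The `PilotResult` type, the `score` and `triage_hours` helpers, and the 30 minutes of analyst time per finding are hypothetical choices for illustration, not figures from Microsoft's post. Only the 21-of-21, zero-false-positive case mirrors a number Microsoft reported.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class PilotResult:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def recall(self) -> float:
        # Share of known bugs the tool found; 0.0 by convention if none are known.
        total = self.true_positives + self.false_negatives
        return self.true_positives / total if total else 0.0

    @property
    def precision(self) -> float:
        # Share of reports that were real; 0.0 by convention if nothing was reported.
        total = self.true_positives + self.false_positives
        return self.true_positives / total if total else 0.0


def score(reported: Set[str], known: Set[str]) -> PilotResult:
    # Simplification: any report outside the known set counts as a false positive.
    # In a real pilot, such reports need manual review because some may be new bugs.
    return PilotResult(
        true_positives=len(reported & known),
        false_positives=len(reported - known),
        false_negatives=len(known - reported),
    )


def triage_hours(result: PilotResult, minutes_per_finding: float) -> float:
    # Analyst time spent reviewing every reported finding, real or not.
    reviewed = result.true_positives + result.false_positives
    return reviewed * minutes_per_finding / 60


# Mirrors the reported planted-bug test: 21 of 21 found, zero false positives.
planted = {f"bug-{i}" for i in range(21)}
full = score(reported=set(planted), known=planted)
print(full.recall, full.precision)

# A hypothetical weaker run: two of four known bugs found, plus one spurious report.
partial = score(reported={"a", "b", "x"}, known={"a", "b", "c", "d"})
print(partial.recall, round(partial.precision, 2), triage_hours(partial, 30))
```

Tracking triage hours alongside recall keeps the cost of false positives visible, which a recall figure alone would hide.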
Scoring Rationale
The story shows an industry-leading vendor reporting production use of an agentic multi-model system to find real, high-severity vulnerabilities. That is notable for defenders and ML practitioners building security tooling, but independent validation and broader adoption are still required before this becomes paradigm-shifting.


