Open-source Models Match Mythos in Bug Finding

At Black Hat Asia, OpenAI's first security hire, Ari Herbert-Voss, argued that open-source models can find software bugs as effectively as Anthropic's Mythos when combined into orchestration pipelines. He credited Mythos with strong performance on both shallow and complex vulnerabilities, attributing that capability to what he called "supralinear scaling." Herbert-Voss says defenders can replicate Mythos-grade results by building scaffolding that runs several open models in concert, producing defense in depth and covering each model's blind spots. Cost and access make open-source an attractive option for many organizations, but human experts remain essential to orchestrate models and triage the high volume of findings generated by fuzzing and AI-assisted testing. The net effect, he predicts, is improved security practices rather than large-scale job displacement.
What happened
At Black Hat Asia, Ari Herbert-Voss, OpenAI's first security hire and CEO of RunSybil, argued that ensembles of open-source models can match the bug-finding effectiveness of Anthropic's Mythos. He highlighted Mythos's ability to find both "shallow" bugs (well-described flaws that are easy to validate) and more complex vulnerabilities, and attributed part of its edge to a phenomenon he called "supralinear scaling." Herbert-Voss said that practical scaffolding running multiple open models in concert provides defense in depth and similar coverage while avoiding Mythos's access and cost constraints.
Technical details
Herbert-Voss emphasized that replication of Mythos performance is not a single-model drop-in but a systems problem requiring human orchestration. Practitioners need to combine models, routing logic, and validation layers so results complement each other rather than duplicate noise. Key practical levers include:
- model diversity and ensembling across architectures and checkpoints
- automated triage and prioritization to reduce false positives
- integration with fuzzing pipelines and runtime instrumentation
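The ensembling-plus-validation idea above can be sketched in a few lines. This is a minimal illustration, not any tool Herbert-Voss described: the `Finding` schema, the model-as-callable interface, and the `validate` hook are all hypothetical stand-ins for real scanner outputs and proof-of-concept replay.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass(frozen=True)
class Finding:
    """One candidate vulnerability report (hypothetical schema)."""
    file: str
    line: int
    description: str

def ensemble_scan(
    models: List[Callable[[str], Iterable[Finding]]],
    target: str,
    validate: Callable[[Finding], bool],
) -> List[Tuple[Finding, int]]:
    """Run several models over the same target, merge, validate, and rank.

    models: callables that return candidate Findings for a target
            (stand-ins for open-model scanners behind a common interface).
    validate: e.g. replays a proof-of-concept input to confirm the bug.
    Returns (finding, agreement_count) pairs, highest agreement first.
    """
    merged = {}
    for model in models:
        for f in model(target):
            key = (f.file, f.line)  # dedupe by code location
            merged.setdefault(key, []).append(f)
    results = []
    for group in merged.values():
        representative = group[0]
        if validate(representative):  # drop unvalidated noise
            results.append((representative, len(group)))
    # Findings flagged by more models float to the top of the queue.
    return sorted(results, key=lambda r: -r[1])
```

The point of the sketch is the shape of the system: models are interchangeable behind one interface, duplicates collapse by location, and cross-model agreement becomes a cheap prioritization signal.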
He also flagged that fuzzing and AI-generated test cases produce high volumes of warnings. Humans remain necessary to validate exploitability and prioritize actionable findings. He mentioned cost as a major differentiator: building and operating a proprietary Mythos-class model is expensive, whereas ensembles of open models are cheaper to iterate with but require staff who can manage orchestration and compute.
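One common way to tame that warning volume is to bucket raw reports by a crash-site signature and rank buckets by severity before any human looks at them. The sketch below assumes a hypothetical warning schema (`stack`, `kind` keys) rather than any specific fuzzer's output format.

```python
import hashlib
from collections import defaultdict

def triage(warnings, severity_weight):
    """Bucket raw fuzzer/AI warnings by crash signature and rank buckets.

    warnings: iterable of dicts with 'stack' (list of frame names) and
              'kind' keys -- a hypothetical schema for illustration.
    severity_weight: dict mapping warning kind -> numeric severity.
    Returns (signature, bucket) pairs, highest-severity buckets first.
    """
    buckets = defaultdict(list)
    for w in warnings:
        # Hash the top stack frames so duplicate crashes collapse together.
        sig = hashlib.sha256("|".join(w["stack"][:3]).encode()).hexdigest()[:12]
        buckets[sig].append(w)
    return sorted(
        buckets.items(),
        key=lambda kv: -max(severity_weight.get(w["kind"], 0) for w in kv[1]),
    )
```

Deduplicating before ranking matters because fuzzers routinely hit the same crash thousands of times; humans then validate one representative per bucket instead of every warning.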
Context and significance
This is a practical reframing in the ongoing debate over proprietary frontier models versus open-source alternatives. Mythos represents specialized, restricted-access tooling tailored for security; the broader lesson Herbert-Voss offered is that domain-specific capability often materializes through system design, tooling, and workflows as much as raw model scale. For security teams and ML engineers, that means investment in orchestration, automated validation, and developer workflows may deliver outsized returns compared with single-model procurement.
What to watch
Teams should prototype ensembles and invest in triage automation to measure real-world signal-to-noise. Watch for open-source toolkits that standardize scaffolding, and for commercial vendors packaging orchestration and validation layers around model ensembles. The practical constraint remains cost of compute and the human labor needed to validate and exploit findings; those will determine adoption speed.
Scoring Rationale
This is a notable, practitioner-facing observation: open-source ensembles can deliver competitive bug-finding capability, shifting emphasis to orchestration and triage. Not a paradigm shift, but important for security teams evaluating proprietary tooling versus build approaches.