Bio-Foundation Models Retain Harmful Knowledge Despite Filtering

Researchers from Scale AI, Princeton, University of Maryland, SecureBio, and the Center for AI Safety introduce BioRiskEval, a framework for evaluating dual-use risks in open-weight bio-foundation models. Using Evo2-7B, they show that fine-tuning can restore filtered viral capabilities within 50 steps (under one hour on a single H100) and that linear probing reveals persistent predictive signals in the model's hidden layers, although absolute accuracy remains modest (mutational effect correlation ≈0.2).
Key Points
- Demonstrates that fine-tuning reintroduces filtered viral knowledge within 50 steps (under one hour on a single H100).
- Shows that linear probing uncovers predictive signals in hidden layers despite prior dataset filtering.
- Implies that practitioners must adopt defense-in-depth and lifecycle governance rather than relying on data filtering alone.
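For readers unfamiliar with linear probing, the idea is to freeze a model, collect its hidden-layer activations, and fit only a small linear map from those activations to a label; if the map predicts held-out labels, the information is linearly readable from the representation. The following is a minimal, generic sketch of that idea on synthetic data, not the BioRiskEval code. Everything in it is an assumption for illustration: the function names are hypothetical, the "activations" are random vectors rather than Evo2-7B hidden states, Pearson correlation is used because the brief does not say which correlation the authors report, and the shuffled-label control is a common probing sanity check rather than something the brief describes.

```python
import random


def make_synthetic_data(n, dim, seed):
    """Hypothetical stand-in for frozen hidden states and a scalar label.

    The label is a noisy linear function of the activations, so a linear
    probe should be able to recover it.
    """
    rng = random.Random(seed)
    w_true = [((-1) ** j) * (j + 1) / dim for j in range(dim)]
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(w * v for w, v in zip(w_true, x)) + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit_linear_probe(xs, ys, lr=0.05, epochs=200):
    """Fit weights and a bias by full-batch gradient descent on mean squared error."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            for j in range(dim):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / (va * vb) ** 0.5


def probe_correlation(xs, ys, n_train):
    """Train the probe on the first n_train examples, score it on the rest."""
    w, b = fit_linear_probe(xs[:n_train], ys[:n_train])
    preds = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in xs[n_train:]]
    return pearson(preds, ys[n_train:])


def run_probe_demo(seed=0, n=300, dim=8, n_train=200):
    """Return (probe correlation, shuffled-label control correlation)."""
    xs, ys = make_synthetic_data(n, dim, seed)
    real = probe_correlation(xs, ys, n_train)
    shuffled = ys[:]
    random.Random(seed + 1).shuffle(shuffled)  # break the activation-label link
    control = probe_correlation(xs, shuffled, n_train)
    return real, control


if __name__ == "__main__":
    real, control = run_probe_demo()
    print(f"probe correlation:   {real:.2f}")
    print(f"control correlation: {control:.2f}")
```

Because the synthetic label is built to be strongly linear in the activations, the correlation this sketch prints is not comparable to the ≈0.2 figure reported for Evo2-7B. The useful comparison is the gap between the probe and its shuffled-label control.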
Scoring Rationale
A comprehensive empirical evaluation and reproducible attacks justify a top score; modest current model performance and limited dataset scope are the main caveats.
