Static Malware Detectors Fail Across Diverse Datasets

Researchers at the Polytechnic of Porto published a study on April 1, 2026 testing ML-based static Windows PE malware detectors across six public datasets and four external collections. They find that models score highly in-distribution (AUC/F1 in the high 90s) but generalize poorly to temporally shifted and obfuscated datasets such as SOREL-20M and ERMDS. The results imply that procurement and engineering teams should validate detectors on operational, diverse data at low false-positive rates rather than rely on in-distribution benchmarks.
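The "low false-positive rate" point is worth making concrete: a detector's headline AUC can look strong while its detection rate at an operationally tolerable false-positive budget is poor. A minimal sketch of that evaluation, using hypothetical scores (the function name and data are illustrative, not from the study):

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Detection rate (TPR) at the loosest threshold whose FPR <= max_fpr.

    scores: model confidence that a sample is malicious.
    labels: 1 = malware, 0 = benign.
    """
    # Benign scores, highest first: these are the candidate false positives.
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    k = int(max_fpr * len(neg))  # number of false positives tolerated
    # Pick the lowest threshold that still admits at most k false positives.
    thresh = neg[k] if k < len(neg) else float("-inf")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s > thresh for s in pos) / len(pos)


# Hypothetical example: one benign file scores 0.9, so at a 0% FPR budget
# the threshold must sit above it and most malware slips under.
scores = [0.1, 0.2, 0.3, 0.9, 0.5, 0.8, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1]
print(tpr_at_fpr(scores, labels, max_fpr=0.25))
print(tpr_at_fpr(scores, labels, max_fpr=0.0))
```

Running this kind of check on out-of-distribution collections, at the FPR budget the deployment actually tolerates, surfaces the generalization gap the study describes far better than a single aggregate AUC.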
Scoring Rationale
Solid, timely academic evaluation with industry-wide implications; it scores high on scope and relevance for endpoint security. Novelty and credibility are strong, though the work is not a paradigm shift, and the coverage lacks deep methodological detail, so the score is moderated slightly.
Sources
- Malware detectors trained on one dataset often stumble on another (helpnetsecurity.com)



