Researchers Publish Improved Mammal-Infection Virus Dataset
On March 27, 2026, Reddy et al. published an improved, openly available dataset that nearly doubles the number of curated host-virus records and adds primate and mammal infection labels for machine learning. They benchmark eight ML models and report that human-infection ROC AUC improves from 0.663 ± 0.070 to 0.784 ± 0.013 under reduced phylogenetic distance. Mammal-level prediction reaches 0.850 ± 0.020, while predictions across novel viral families perform at chance (≈0.50).
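To make the reported metrics concrete, here is a minimal Python sketch of how a ROC AUC can be computed and summarized across held-out viral families. It is not the authors' pipeline: the function names, the leave-one-family-out split, and the reading of "±" as a standard deviation across folds are all assumptions made for illustration.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores
    a random negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def leave_one_family_out(families):
    """Yield (held_out_family, train_indices, test_indices).

    Hypothetical split: every record of one viral family is held out,
    so the test fold contains only a family unseen during training.
    """
    for fam in sorted(set(families)):
        test = [i for i, f in enumerate(families) if f == fam]
        train = [i for i, f in enumerate(families) if f != fam]
        yield fam, train, test


def summarize(aucs):
    """Return (mean, sample standard deviation) over per-fold AUCs.

    Assumption: this is one plausible meaning of a 'mean ± spread' figure.
    """
    return mean(aucs), stdev(aucs)
```

A model that assigns the same score to every record gets `roc_auc` of 0.5, which is what "at chance" means for the novel-family result. Holding out whole families, rather than random records, is what exposes that failure: a random split lets close relatives of each test virus leak into training.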
Scoring Rationale
High practical value from an expanded, openly shared dataset; novelty is limited beyond dataset curation, and out-of-sample generalization remains a challenge.