Benchmarking Deep Learning Interpretability for Predictive Genomics
Reynolds and Pan (published Dec 5, 2025) present a benchmarking framework that quantifies the recall, precision, and stability of feature attributions from deep neural networks trained on UK Biobank genotypes to predict standing height. Using roughly 300,000 participants and over 500,000 autosomal variants, they evaluate Saliency, Gradient SHAP, DeepLIFT, and Integrated Gradients and assess the effect of combining each with SmoothGrad. SmoothGrad improves average recall by about 0.16 and precision by about 0.06 among the top 1% of attributed variants, and Saliency achieves the highest composite score.
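Recall and precision at the top 1% are computed over the variants ranked highest by attribution score. The following is a minimal NumPy sketch of such metrics, assuming a known set of causal variant indices (for example from a simulation) and ranking by absolute attribution. The top-k Jaccard overlap used for stability is an illustrative stand-in and not necessarily the authors' definition; all function names are hypothetical.

```python
import numpy as np

def top_fraction_indices(attributions, fraction=0.01):
    """Indices of the variants with the largest absolute attribution scores."""
    scores = np.abs(np.asarray(attributions, dtype=float))
    k = max(1, int(round(fraction * scores.size)))
    # Sorting the negated scores gives descending order of |attribution|.
    return set(np.argsort(-scores, kind="stable")[:k].tolist())

def recall_precision_at_top(attributions, causal_indices, fraction=0.01):
    """Recall and precision of the top-ranked variants against known causal variants."""
    selected = top_fraction_indices(attributions, fraction)
    causal = set(causal_indices)
    hits = len(selected & causal)
    recall = hits / len(causal) if causal else 0.0
    precision = hits / len(selected)
    return recall, precision

def jaccard_stability(attr_a, attr_b, fraction=0.01):
    """Overlap of the top-ranked sets from two runs (e.g. different random seeds)."""
    a = top_fraction_indices(attr_a, fraction)
    b = top_fraction_indices(attr_b, fraction)
    return len(a & b) / len(a | b)

# Toy example: 200 variants, 5 of them causal, top 5% = 10 variants.
attr = np.zeros(200)
attr[[0, 1, 2]] = 5.0   # three causal variants ranked highest
attr[10:17] = 3.0       # seven non-causal variants fill the rest of the top 10
print(recall_precision_at_top(attr, [0, 1, 2, 3, 4], fraction=0.05))  # (0.6, 0.3)

attr_rerun = attr.copy()
attr_rerun[16] = 0.0    # a second run swaps one top variant for another
attr_rerun[20] = 3.0
print(jaccard_stability(attr, attr_rerun, fraction=0.05))  # about 0.818
```

With over 500,000 variants, a 1% threshold selects roughly 5,000 variants, so precision is bounded by how many truly causal variants exist in the data.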
Key Points
- Demonstrates recall, precision, and stability metrics for attribution methods applied to UK Biobank height-prediction models.
- Shows that SmoothGrad increases average recall by about 0.16 and precision by about 0.06 at the top 1% variant threshold.
- Finds that Saliency attains the highest composite score, guiding practitioners to prioritize simple gradient attributions.
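SmoothGrad averages an attribution map over several noisy copies of the input. The sketch below contrasts plain gradient Saliency with SmoothGrad on a hypothetical two-layer toy model, f(x) = w2 · tanh(W1 x), whose input gradient is written out analytically; it stands in for the paper's networks. The noise level `sigma`, the sample count, and the 0/1/2 genotype dosage encoding are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def toy_model_grad(x, W1, w2):
    """Gradient of the toy model f(x) = w2 . tanh(W1 @ x) with respect to x."""
    h = np.tanh(W1 @ x)
    return W1.T @ (w2 * (1.0 - h ** 2))

def saliency(x, W1, w2):
    """Plain gradient saliency: absolute value of the input gradient."""
    return np.abs(toy_model_grad(x, W1, w2))

def smoothgrad(x, W1, w2, n_samples=50, sigma=0.1, seed=0):
    """SmoothGrad: mean saliency map over inputs perturbed with Gaussian noise."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        total += saliency(noisy, W1, w2)
    return total / n_samples

# Hypothetical toy setup: 100 variants coded as 0/1/2 allele dosages.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 100))
w2 = rng.normal(size=8)
x = rng.integers(0, 3, size=100).astype(float)

plain = saliency(x, W1, w2)
smoothed = smoothgrad(x, W1, w2, n_samples=50, sigma=0.5)
print(np.argsort(-plain)[:5], np.argsort(-smoothed)[:5])
```

Averaging over noisy inputs damps gradient fluctuations that are specific to a single input point, which is one plausible reason SmoothGrad raises recall and precision of the top-ranked variants.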
Scoring Rationale
Rigorous, large-scale benchmarking with open code; methodological novelty is limited relative to prior interpretability studies, hence a moderate novelty score.