LLMs Reveal Knowledge-Practice Gap in Medicine

A systematic review published in the Journal of Medical Internet Research (2025) analyzed 39 medical LLM benchmarks through Aug 31, 2025, covering over 2.3 million questions across 45 languages and 172 specialties. It found that models score 84%-90% on knowledge-based benchmarks, while scores on practice-based assessments lag at 45%-69% and on safety tasks at 40%-50%. The review concludes that exam scores are insufficient proxies for clinical readiness.
Key Points
- Identifies 39 benchmarks encompassing over 2.3 million questions across 45 languages and 172 specialties
- Shows that knowledge benchmark scores reach 84%-90%, yet practice-based performance falls to 45%-69%
- Warns that exam success is insufficient and calls for practice-oriented validation and human oversight
Scoring Rationale
Comprehensive systematic review with robust data and clear clinical implications, though limited by heterogeneous benchmark methodologies.