Scoping Review Maps LLM Applications and Limitations in Radiology

A scoping review published in JMIR Med Inform (2025) examined 67 empirical studies of large language models (LLMs) in radiology, covering January 2022 to December 2024. GPT-4 was the most frequently used model (28/67, 42%). The review identified three main use domains: decision support, report generation, and workflow optimization. It reported strong performance on structured-text tasks but widely variable diagnostic accuracy (16%–86%). The authors call for multicenter prospective validation of domain-adapted, multimodal models.
Key Points
- The review included 67 studies; GPT-4 was used in 28 (42%), and text corpora were the dominant data type (64%).
- LLMs performed strongly on structured-text tasks (>94% accuracy), but diagnostic accuracy varied widely (16%–86%).
- The authors recommend prospective multicenter validation and domain-adapted multimodal models before clinical integration.
Scoring Rationale
The comprehensive synthesis of 67 studies supports broad relevance, but limited prospective validation reduces its immediate clinical impact.
