Cardiology Study Measures Agreement with AI Diagnostics

Per the study published by Mahdavi et al. (London South Bank University), an AI-driven stress-echocardiography system, EchoGo Pro (EGP), and cardiologists agreed on diagnosis in 60.0% of 854 patients enrolled in the multicenter PROTEUS randomized controlled trial. The study reports lower concordance among patients with hypertension (OR=0.58, 95% CI 0.38-0.89, P=.01), diabetes (OR=0.56, 95% CI 0.35-0.90, P=.02), and pre-existing coronary artery disease (OR=0.48, 95% CI 0.30-0.77, P=.002). EGP rejected 26.1% of scans for insufficient image quality, with higher rejection odds in male patients (OR=1.38, P=.03) and those with a family history of CAD. The authors also surveyed 61 UK consultant cardiologists via Qualtrics to explore responses to AI-clinician disagreement and perceived risks.
What happened
Per the study published by Mahdavi et al. (London South Bank University), researchers compared diagnoses from the AI-driven stress-echocardiography system EchoGo Pro (EGP) with those of consultant cardiologists for 854 participants in the multicenter PROTEUS randomized controlled trial. EGP and cardiologists agreed in 60.0% of cases. The odds of agreement were significantly lower among patients with hypertension (OR=0.58, 95% CI 0.38-0.89, P=.01), diabetes (OR=0.56, 95% CI 0.35-0.90, P=.02), and pre-existing coronary artery disease (CAD; OR=0.48, 95% CI 0.30-0.77, P=.002). The system rejected 26.1% of scans for insufficient image quality; the odds of rejection were higher in male patients (OR=1.38, P=.03) and in patients with a family history of CAD. The authors supplemented the quantitative analysis with a qualitative survey, administered via Qualtrics, of 61 UK consultant cardiologists, covering perceptions of AI tools and clinician responses to discordant recommendations.
Technical details
The quantitative component used logistic regression adjusted for age, sex, smoking status, body mass index, hypertension, hypercholesterolemia, diabetes, family history of CAD, and prior CAD events to identify predictors of agreement, disagreement, and scan rejection. The qualitative component analysed free-text survey responses to explore how cardiologists perceive risks and act when AI outputs conflict with clinical judgement.
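As a rough illustration of how the reported figures relate to a logistic regression, the sketch below back-calculates the coefficient, standard error, and two-sided P value implied by each reported odds ratio and 95% CI. It assumes the intervals are Wald intervals, symmetric on the log-odds scale; the article does not state how they were computed, so this is a consistency check on rounded published numbers, not a reproduction of the authors' analysis. The function name `wald_p_from_ci` is hypothetical.

```python
import math

# Odds ratios, 95% CI bounds, and P values as reported in the article.
REPORTED = {
    "hypertension": (0.58, 0.38, 0.89, 0.01),
    "diabetes": (0.56, 0.35, 0.90, 0.02),
    "prior CAD": (0.48, 0.30, 0.77, 0.002),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_ci(odds_ratio, ci_low, ci_high):
    """Return (beta, se, z, p) implied by an OR and its 95% CI.

    Assumes a Wald interval: exp(beta +/- Z_95 * se).
    """
    beta = math.log(odds_ratio)  # logistic regression coefficient (log-odds)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


for name, (or_, low, high, p_reported) in REPORTED.items():
    beta, se, z, p = wald_p_from_ci(or_, low, high)
    print(f"{name}: beta={beta:.3f} se={se:.3f} z={z:.2f} "
          f"implied p={p:.4f} (reported P={p_reported})")
```

Under that assumption, the implied P values round to the reported ones, which suggests the published ORs, intervals, and P values are mutually consistent. An OR below 1 here means lower odds of AI-clinician agreement for patients with the condition, holding the other covariates fixed.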
Editorial analysis
Industry-pattern observations: Studies that combine trial data and clinician surveys commonly find that AI-clinician concordance varies across clinical subgroups and image-quality strata, creating practical friction points for deployment and triage workflows. Lower agreement in patients with established cardiometabolic disease mirrors broader findings that pathology and image noise reduce automated diagnostic concordance. High scan-rejection rates in real-world imaging underline the operational importance of image-quality thresholds and pre-scan protocols.
What to watch
For practitioners, monitor subgroup performance metrics (diabetes, hypertension, prior CAD), rejection and override rates in deployment logs, and clinician feedback on risk tolerance. Observers should also compare downstream clinical actions (referrals, invasive testing) in cases of discordance to assess patient-level impact.
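The subgroup monitoring described above could be sketched along the following lines. The records, field names (`diabetes`, `rejected`, `ai`, `clinician`), and the `subgroup_summary` helper are all hypothetical and do not come from the study or from any EGP log format. The sketch also assumes that rejected scans are excluded from the agreement denominator, which the article does not specify.

```python
from collections import defaultdict

# Hypothetical deployment-log records; field names are illustrative only.
RECORDS = [
    {"diabetes": True, "rejected": False, "ai": "positive", "clinician": "positive"},
    {"diabetes": True, "rejected": False, "ai": "negative", "clinician": "positive"},
    {"diabetes": True, "rejected": True, "ai": None, "clinician": "negative"},
    {"diabetes": False, "rejected": False, "ai": "negative", "clinician": "negative"},
    {"diabetes": False, "rejected": False, "ai": "positive", "clinician": "positive"},
    {"diabetes": False, "rejected": False, "ai": "negative", "clinician": "positive"},
]


def subgroup_summary(records, flag):
    """Rejection and agreement rates, split by a boolean patient flag."""
    counts = defaultdict(lambda: {"n": 0, "rejected": 0, "scored": 0, "agree": 0})
    for rec in records:
        c = counts[bool(rec[flag])]
        c["n"] += 1
        if rec["rejected"]:
            # Assumption: rejected scans have no AI result to compare.
            c["rejected"] += 1
            continue
        c["scored"] += 1
        c["agree"] += rec["ai"] == rec["clinician"]
    return {
        value: {
            "n": c["n"],
            "rejection_rate": c["rejected"] / c["n"],
            "agreement_rate": c["agree"] / c["scored"] if c["scored"] else None,
        }
        for value, c in counts.items()
    }


for value, stats in subgroup_summary(RECORDS, "diabetes").items():
    print(f"diabetes={value}: {stats}")
```

The same helper can be called with other flags (for example a hypothetical `hypertension` or `prior_cad` field) to track the subgroups the study highlights. With real logs, small subgroups would need confidence intervals before any gap is treated as meaningful.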
Scoring Rationale
The study provides clinically focused, trial-backed evidence about AI-clinician concordance in cardiology, relevant to deployments and validation practices. It is notable for practitioners but not a frontier-model breakthrough.


