DeepSeek R1 Outperforms Other LLMs On ASCVD Responses

Researchers conducted a cross-sectional evaluation from May 15 to 30, 2025, comparing DeepSeek R1, ChatGPT-4o, and Gemini on 25 atherosclerotic cardiovascular disease (ASCVD) patient questions in English and Chinese, generating 750 responses scored by three cardiologists. DeepSeek R1 achieved a 96% good-response rate (24/25) with higher accuracy and completeness, but all models failed to reliably provide guideline-concordant treatment regimens, indicating a need for expert oversight.
Key Points
- Demonstrates DeepSeek R1 achieved a 96% good-response rate in both languages (24/25).
- Highlights superior accuracy and completeness versus ChatGPT-4o and Gemini (P<.001).
- Warns that all models failed to reliably provide guideline-concordant ASCVD treatment regimens, requiring expert oversight.
Scoring Rationale
Strong comparative evaluation and robust methods, but limited by its narrow focus on ASCVD and by the models' weak guideline concordance.