ChatGPT Health Misses Critical Triage Recommendations

Researchers at Mount Sinai report in Nature Medicine on 23 February 2026 a stress test of OpenAI's ChatGPT Health using 60 clinician-authored vignettes across 21 domains and 16 conditions (960 responses). The system under-triaged 52% of gold-standard emergency cases and showed high failure rates at non-urgent (35%) and emergency (48%) extremes, sometimes directing diabetic ketoacidosis to 24–48-hour evaluation. Crisis-intervention messaging fired inconsistently for suicidal ideation, and authors call for prospective validation before consumer-scale deployment.
Scoring Rationale
High-quality Nature Medicine study revealing systemic triage failures; limited by simulated vignettes rather than prospective real-world evaluation.
Practice with real Health & Insurance data
90 SQL & Python problems · 15 industry datasets
250 free problems · No credit card
See all Health & Insurance problemsStep-by-step roadmaps from zero to job-ready — curated courses, salary data, and the exact learning order that gets you hired.
Sources
- Read OriginalChatGPT Health performance in a structured test of triage recommendationsnature.com
