Research | llm | digital health | openai | safety

ChatGPT Health Misses Critical Triage Recommendations

February 27, 2026 | By LDS Team

Relevance Score: 9.3

Researchers at Mount Sinai reported in Nature Medicine on 23 February 2026 the results of a stress test of OpenAI's ChatGPT Health. The test used 60 clinician-authored vignettes spanning 21 domains, run under 16 conditions, for 960 responses in total. The system under-triaged 52% of gold-standard emergency cases, sometimes directing diabetic ketoacidosis to evaluation within 24–48 hours. Failure rates were high at both extremes of acuity: 35% for non-urgent presentations and 48% for emergencies. Crisis-intervention messaging fired inconsistently for suicidal ideation, and the authors call for prospective validation before consumer-scale deployment.
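To make the headline figures concrete, here is a minimal sketch of how an under-triage rate can be computed from paired gold-standard and system recommendations. It is not the study's analysis code. The four-level triage scale, the `Response` record, and the toy data are all assumptions made for illustration; only the design size (60 vignettes, 16 conditions, 960 responses) comes from the article.

```python
from dataclasses import dataclass

# Hypothetical ordinal triage scale, least to most urgent. This is an
# assumption for illustration; the study's own labels are not given here.
LEVELS = {"self_care": 0, "routine": 1, "within_24_48h": 2, "emergency": 3}


@dataclass(frozen=True)
class Response:
    gold: str       # clinician gold-standard triage level
    predicted: str  # triage level recommended by the system


def under_triage_rate(responses, gold_level="emergency"):
    """Share of cases at `gold_level` that received a less urgent recommendation."""
    cases = [r for r in responses if r.gold == gold_level]
    if not cases:
        raise ValueError(f"no cases with gold level {gold_level!r}")
    missed = sum(LEVELS[r.predicted] < LEVELS[r.gold] for r in cases)
    return missed / len(cases)


# Design size reported in the article: 60 vignettes x 16 conditions.
total_responses = 60 * 16

# Made-up toy data, NOT study results.
toy = [
    Response("emergency", "emergency"),
    Response("emergency", "within_24_48h"),  # under-triaged
    Response("emergency", "routine"),        # under-triaged
    Response("emergency", "emergency"),
    Response("routine", "routine"),          # ignored: not an emergency case
]

print(total_responses)         # 960
print(under_triage_rate(toy))  # 0.5
```

The rate is computed only over cases whose gold standard is an emergency, which is why a statistic like "52% of gold-standard emergencies" has a smaller denominator than the full 960 responses.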

Key Points

1. Under-triaged 52% of gold-standard emergencies, including diabetic ketoacidosis and impending respiratory failure.
2. Highlights an inverted U-shaped performance pattern, with the most dangerous failures concentrated at the non-urgent and emergency extremes.
3. Urges prospective validation and caution before consumer-scale deployment of AI triage systems.

Scoring Rationale

High-quality Nature Medicine study revealing systemic triage failures; limited by simulated vignettes rather than prospective real-world evaluation.

Sources

Public references used for this report.

1. nature.com: ChatGPT Health performance in a structured test of triage recommendations
