GPT-4.0 Outperforms Other LLMs On Psychiatric Diagnosis

Researchers retrospectively evaluated GPT-4.0, GPT-3.5, and GLM-4-Plus on 9,923 inpatient EHRs from six Chinese psychiatric centers, comparing model output against physician-confirmed discharge diagnoses. GPT-4.0 achieved 71.7% strict diagnostic accuracy and a weighted F1 of 0.881, with its strongest performance on mood and schizophrenia disorders and in older adults (up to 79.5%). The authors conclude that LLMs are promising assistive tools that require further validation before clinical deployment.
Scoring Rationale
High-quality multicenter validation drives the score; limited generalizability to adolescents and the need for further clinical validation reduce applicability.

