Research | GPT-4 | psychiatry | electronic health records | multicenter

GPT-4.0 Outperforms Other LLMs On Psychiatric Diagnosis

January 13, 2026 | By LDS Team

Relevance Score: 8.2


Researchers retrospectively evaluated GPT-4.0, GPT-3.5, and GLM-4-Plus on 9,923 inpatient EHRs from six Chinese psychiatric centers, comparing each model's output against physician-confirmed discharge diagnoses. GPT-4.0 achieved 71.7% strict diagnostic accuracy and a weighted F1 of 0.881, with its strongest performance on mood disorders and schizophrenia and in older adults (up to 79.5%). The authors conclude that LLMs are promising assistive tools that require further validation before clinical deployment.
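For readers unfamiliar with the two headline metrics, the sketch below shows one common way to compute them. It rests on assumptions: "strict accuracy" is read as an exact match between the predicted and reference label, and "weighted F1" as per-class F1 averaged by class support. The summary does not give the study's exact definitions, so the paper's figures may be computed differently. The function names and toy labels are hypothetical and are not study data.

```python
from collections import Counter


def strict_accuracy(y_true, y_pred):
    """Share of records whose predicted label exactly matches the reference."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)  # number of reference records per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy labels for illustration only (assumed, not taken from the study).
y_true = ["mood", "mood", "mood", "schizophrenia", "schizophrenia", "anxiety"]
y_pred = ["mood", "mood", "schizophrenia", "schizophrenia", "schizophrenia", "mood"]

print(f"strict accuracy: {strict_accuracy(y_true, y_pred):.3f}")
print(f"weighted F1:     {weighted_f1(y_true, y_pred):.3f}")
```

Because the weights follow class support, high-prevalence categories dominate the weighted F1, so a model that does well on common diagnoses can score highly even when rare categories are handled poorly.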

Key Points

1. Demonstrates that GPT-4.0 reaches 71.7% strict accuracy on 9,923 multicenter psychiatric inpatient EHRs.
2. Highlights stronger performance on high-prevalence disorders such as mood disorders and schizophrenia.
3. Suggests that LLMs can assist clinicians but require further validation before routine clinical deployment.

Scoring Rationale

High-quality multicenter validation drives score; limited generalizability to adolescents and need for further clinical validation reduce applicability.