LLM Outperforms NLP and JEPA in Triage

Researchers at CHU Lille retrospectively developed and compared three AI models for predicting FRENCH triage levels: TRIAGEMASTER (NLP), URGENTIAPARSE (LLM), and EMERGINET (JEPA, joint embedding predictive architecture). The study used 657 triage encounters from June–December 2024. URGENTIAPARSE achieved F1 0.900, AUC-ROC 0.879, and weighted κ 0.800, but it exhibited severe overfitting, and the sample is subject to extreme selection bias (657 of 73,236, 0.90%). External multicenter validation, regularization, and prospective safety testing are required before clinical deployment.
Key Points
- Demonstrates that the URGENTIAPARSE LLM achieves F1 0.900, AUC-ROC 0.879, and weighted κ 0.800
- Highlights severe overfitting and extreme selection bias (657 of 73,236, 0.90%), which undermine validity
- Requires external multicenter validation, regularization, prospective testing, and uncertainty quantification before deployment
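
For readers unfamiliar with the agreement metric quoted above, the following is a minimal sketch of Cohen's weighted κ for ordinal triage levels, together with the inclusion fraction arithmetic. It is not the study's code: the function name `weighted_kappa`, the integer coding of triage levels, and the toy labels are all hypothetical, and the report does not say whether linear or quadratic weights were used, so quadratic weighting here is an assumption.

```python
from collections import Counter


def weighted_kappa(y_true, y_pred, n_levels, power=2):
    """Cohen's weighted kappa for ordinal labels coded 0..n_levels-1.

    power=1 gives linear weights, power=2 quadratic weights.
    (Assumption: the weighting used in the study is not stated.)
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    observed = Counter(zip(y_true, y_pred))  # joint counts of (true, predicted)
    true_counts = Counter(y_true)            # marginal counts of true levels
    pred_counts = Counter(y_pred)            # marginal counts of predicted levels
    max_dist = (n_levels - 1) ** power

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            # Disagreement weight grows with the distance between levels.
            w = abs(i - j) ** power / max_dist
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * true_counts[i] * pred_counts[j] / (n * n)

    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when only one level occurs")
    return 1.0 - obs_disagreement / exp_disagreement


# Toy labels for illustration only; these are NOT data from the study.
toy_true = [0, 0, 1, 1]
toy_pred = [0, 1, 1, 1]
toy_kappa = weighted_kappa(toy_true, toy_pred, n_levels=2)

# Inclusion fraction reported in the brief: 657 of 73,236.
inclusion_percent = 100 * 657 / 73236

print(f"toy weighted kappa: {toy_kappa:.3f}")
print(f"inclusion: {inclusion_percent:.2f}%")
```

Because the weights penalize a prediction more the further it lands from the true level, a model that confuses adjacent triage levels scores higher than one that confuses distant levels, which plain accuracy does not capture.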
Scoring Rationale
Strong LLM performance and peer-reviewed publication, limited by severe overfitting, extreme selection bias, and monocentric design.