Research · llm · flaubert · emergency department · selection bias

LLM Outperforms NLP and JEPA in Triage

March 10, 2026 | By LDS Team

Relevance Score: 7.1

Photo: asset.jmir.pub

Researchers at CHU Lille retrospectively developed and compared three AI models (the NLP model TRIAGEMASTER, the LLM URGENTIAPARSE, and the JEPA model EMERGINET) on 657 triage encounters from June–December 2024, with the task of predicting triage levels on the FRENCH scale. URGENTIAPARSE performed best, with an F1 score of 0.900, an AUC-ROC of 0.879, and a weighted κ of 0.800. However, the model showed severe overfitting, and the study suffers from selection bias: only 657 of 73,236 encounters (0.90%) were included. External multicenter validation, regularization, and prospective safety testing are required before clinical deployment.
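Weighted κ is worth unpacking because triage levels are ordinal. Unlike plain accuracy, it gives partial credit when a predicted level lands near the true one and penalizes distant misses more. The Python sketch below computes Cohen's weighted kappa from scratch. It is an illustration under stated assumptions, not the study's evaluation code. The summary does not say whether linear or quadratic weights were used, so both are supported. The function name `weighted_kappa`, the encoding of levels as integers 0..k-1, and the toy labels are all hypothetical.

```python
def weighted_kappa(y_true, y_pred, n_levels, weights="quadratic"):
    """Cohen's weighted kappa for ordinal labels encoded as 0..n_levels-1.

    Hypothetical helper for illustration. weights="linear" penalizes a
    disagreement by |i - j|; weights="quadratic" by (i - j) ** 2, so
    distant misses cost more.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(y_true)

    # Confusion matrix of counts: rows = true level, columns = predicted level.
    observed = [[0] * n_levels for _ in range(n_levels)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal totals, used to build the chance-expected matrix.
    row_tot = [sum(observed[i]) for i in range(n_levels)]
    col_tot = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            d = abs(i - j)
            w = d if weights == "linear" else d * d
            observed_disagreement += w * observed[i][j]
            # Expected count in cell (i, j) if true and predicted levels were independent.
            expected_disagreement += w * row_tot[i] * col_tot[j] / n

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all labels share one level")
    return 1.0 - observed_disagreement / expected_disagreement


# Toy example (hypothetical labels): one error in each case, but of different size.
truth = [0, 1, 2, 2]
near_miss = [0, 1, 2, 1]  # wrong by one level
far_miss = [0, 1, 2, 0]   # wrong by two levels
print(round(weighted_kappa(truth, near_miss, 3), 3))  # 0.8
print(round(weighted_kappa(truth, far_miss, 3), 3))   # 0.385
```

Both toy predictions contain exactly one error, yet the quadratic weighting scores the two-level miss far lower. This is the property that makes weighted κ a natural fit for ordinal triage scales. For real data, scikit-learn's `cohen_kappa_score(y_true, y_pred, labels=list(range(n_levels)), weights="quadratic")` computes the same quantity.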

Key Points

1. Demonstrates that the URGENTIAPARSE LLM achieves an F1 score of 0.900, an AUC-ROC of 0.879, and a weighted κ of 0.800
2. Highlights severe overfitting and extreme selection bias (only 657 of 73,236 encounters, 0.90%, were included), which undermine the validity of these results
3. Calls for external multicenter validation, regularization, prospective testing, and uncertainty quantification before deployment