Research | llm | breast cancer | clinical guidelines | chatgpt
LLMs Demonstrate Mixed Performance in Breast Cancer Diagnostics
Relevance Score: 7.1
A 2026 study in JMIR Medical Informatics evaluated nine large language models, including ChatGPT‑4o and Claude 3 Opus, on 50 breast‑cancer guideline questions, comparing the models' yes/no answers and analyses with those of radiologists (residents, fellows, and attendings). Measured against the 2024 NCCN and 2013 ACR BI‑RADS standards, ChatGPT‑4o and the Claude models scored highest and outperformed radiology fellows on some metrics (P<.05), but they could not fully replace clinical expertise.



