
LLMs Match Radiologists Using Scoring Model

December 23, 2025 | By LDS Team

Relevance Score: 8.0


Researchers at Sun Yat-sen University retrospectively evaluated ChatGPT-4o and Claude 3.5 Sonnet on ultrasound-detected gallbladder polyps ≥1.0 cm, using data collected from January 2011 to January 2022 from 223 patients (48 adenomas) and a 100-patient external test set. A text-based scoring strategy produced higher accuracy than the guideline (0.34–0.35 for radiologists and LLMs vs 0.22) and reduced unnecessary resections (82–83% vs 100%), while image-based LLM analysis showed lower sensitivity.
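The guideline figures above can be sanity-checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the summary: that the guideline flags every polyp ≥1.0 cm for resection (which the 100% figure suggests), that the 48 adenomas are the only neoplastic cases, and that the 0.22 accuracy refers to the 223-patient cohort. The function name `resection_metrics` is hypothetical.

```python
def resection_metrics(tp, fp, tn, fn):
    """Return (accuracy, unnecessary_resection_rate) from confusion counts.

    tp: neoplastic polyps flagged for resection
    fp: nonneoplastic polyps flagged for resection (unnecessary surgery)
    tn: nonneoplastic polyps correctly spared
    fn: neoplastic polyps missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    nonneoplastic = fp + tn
    # Share of nonneoplastic polyps that would still be resected.
    unnecessary_rate = fp / nonneoplastic if nonneoplastic else 0.0
    return accuracy, unnecessary_rate


# Assumption: under the guideline, all 223 polyps are flagged for resection,
# so every adenoma is a true positive and every other polyp a false positive.
adenomas, patients = 48, 223
acc, rate = resection_metrics(tp=adenomas, fp=patients - adenomas, tn=0, fn=0)
print(round(acc, 2), rate)  # 0.22 1.0
```

Under these assumptions, a resect-everything policy has accuracy equal to the adenoma prevalence (48/223 ≈ 0.22), which is consistent with the reported guideline accuracy and its 100% unnecessary resection rate.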

Key Points

1. LLMs given a scoring model reach accuracy similar to radiologists when classifying polyps ≥1.0 cm.
2. The scoring approach reduces unnecessary surgeries compared with the guideline, lowering the nonneoplastic resection rate from 100% to about 82–83%.
3. Clinics could adopt text-based scoring workflows; image-based LLM interpretation still requires improvement.

Scoring Rationale

Solid peer-reviewed evaluation with an actionable scoring workflow; limited novelty and a focus on a single medical domain reduce generalizability.


Sources

Public references used for this report.

1 source

01. medinform.jmir.org: Evaluating Multiple Input Strategies of Large Language Models for Gallbladder Polyps on Ultrasound: Comparative Study
