
LLaMA 3.1 Extracts Structured Information from Brain MRI Reports

June 9, 2026 | By LDS Team

Relevance Score: 5.6


Per the arXiv preprint 2606.07721, researchers evaluated the open-weight large language model LLaMA 3.1 on 947 Dutch brain MRI reports from a tertiary memory clinic (2016-2021). Medical-student annotators labeled thirty variables, and 100 reports were double-annotated to measure inter-rater reliability, according to the paper. The authors report strong zero-shot performance on visual rating scales, for example Medial Temporal Atrophy at 90% (95% CI 77-100%) on the left and 96% (95% CI 94-99%) on the right, along with 93% accuracy (95% CI 92-95%) for detecting microbleed mentions. Numerical counts were weaker in zero-shot mode but improved with few-shot prompting; the paper reports microbleed-count accuracy rising to 92% (95% CI 90-93%) with structural-similarity-based example selection. The results suggest that open-weight LLMs can perform robust clinical extraction on non-English radiology text, and that few-shot strategies materially help numeric extraction.

What happened

Per the arXiv preprint 2606.07721, the authors analyzed 947 brain MRI reports written by consultant neuroradiologists at a tertiary memory clinic between 2016 and 2021. Medical students annotated thirty target variables and double-annotated 100 reports to assess inter-rater reliability. The paper evaluates the open-weight model LLaMA 3.1 on the original Dutch reports and on English translations, using zero-shot and few-shot prompting with different example-selection strategies.
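The article does not name the agreement statistic used for the 100 double-annotated reports. As an illustration only, Cohen's kappa is a common choice when two annotators label the same categorical variable, because it discounts the agreement expected by chance. The sketch below computes it from scratch for one variable; the function name and the example grades are made up and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items that received the same label from both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the two raters' marginal counts.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is 0/0, so report 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up Fazekas grades (0-3) from two annotators on the same six reports.
annotator_1 = ["0", "1", "1", "2", "3", "1"]
annotator_2 = ["0", "1", "2", "2", "3", "1"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.769
```

Here the raters agree on 5 of 6 reports (p_o = 0.833), but kappa is lower (0.769) because some of that agreement would be expected by chance given how often each rater used each grade.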

Technical details

Per the preprint, evaluation metrics included balanced accuracy for categorical labels, accuracy and mean absolute error for counts, and text-similarity measures for free-text outputs. The team compared zero-shot performance with few-shot prompting in which the in-context examples were selected by structural similarity among candidate reports.
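The metrics named in the preprint are standard, but the article gives no implementation details. The sketch below shows one straightforward way to compute them in plain Python, assuming gold and predicted values are already aligned per report. The text-similarity function uses `difflib.SequenceMatcher` purely as a stand-in; the preprint's actual free-text similarity measure is not specified here. All example data are made up.

```python
from difflib import SequenceMatcher


def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        # Positions of reports whose gold label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def count_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of reports where the extracted count exactly matches the gold count."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: list[int], y_pred: list[int]) -> float:
    """Average absolute difference between extracted and gold counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]; a stand-in, not the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Made-up skewed labels: four reports with MTA grade 0, one rare grade 2 that is missed.
gold_mta = ["0", "0", "0", "0", "2"]
pred_mta = ["0", "0", "0", "0", "0"]
print(balanced_accuracy(gold_mta, pred_mta))  # 0.5, although plain accuracy is 0.8

# Made-up microbleed counts for four reports.
gold_counts = [0, 2, 5, 1]
pred_counts = [0, 2, 4, 1]
print(count_accuracy(gold_counts, pred_counts), mean_absolute_error(gold_counts, pred_counts))  # 0.75 0.25
```

The MTA example shows why balanced accuracy suits skewed categorical labels: always predicting the majority grade still earns 80% plain accuracy but only 50% balanced accuracy. Reporting both exact-match accuracy and MAE for counts separates "how often is the count exactly right" from "how far off is it when wrong".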

Results

The preprint reports high zero-shot performance on visual rating scales: Medial Temporal Atrophy left 90% (95% CI 77-100%) and right 96% (95% CI 94-99%), Global Cortical Atrophy 87% (95% CI 83-91%), and Fazekas 94% (95% CI 93-96%). Detection of microbleed mentions reached 93% accuracy (95% CI 92-95%) and detection of infarct mentions 82% (95% CI 80-84%). Text similarity for lesion location reached 0.95 (95% CI 0.95-0.96).

Numerical extraction was weaker in zero-shot mode: 80% (95% CI 78-82%) for the number of microbleeds and 66% (95% CI 63-68%) for infarct counts. The authors report that few-shot prompting with structural-similarity example selection raised numerical extraction to 92% (95% CI 90-93%) for microbleeds and 81% (95% CI 77-85%) for infarcts. English translations produced comparable results, per the paper.
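The article does not say how the 95% confidence intervals were computed. One common approach for accuracy-style metrics is a percentile bootstrap over reports, sketched below as an assumption rather than the authors' method; the function name, resample count, and seed are illustrative choices. Intervals from this procedure narrow as the number of evaluated reports grows, which is one reason different variables can carry very different interval widths.

```python
import random


def bootstrap_accuracy_ci(
    correct: list[bool], n_resamples: int = 2000, alpha: float = 0.05, seed: int = 0
) -> tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a percentile bootstrap over reports.

    `correct` holds one boolean per evaluated report: did the extraction match gold?
    """
    if not correct:
        raise ValueError("need at least one evaluated report")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(correct)
    point = sum(correct) / n
    resampled = []
    for _ in range(n_resamples):
        # Draw n reports with replacement and recompute accuracy on the resample.
        sample = rng.choices(correct, k=n)
        resampled.append(sum(sample) / n)
    resampled.sort()
    # Nearest-rank percentiles of the bootstrap distribution.
    lower = resampled[round((alpha / 2) * (n_resamples - 1))]
    upper = resampled[round((1 - alpha / 2) * (n_resamples - 1))]
    return point, lower, upper


# Made-up outcome: 90 of 100 reports extracted correctly.
correct = [True] * 90 + [False] * 10
point, lower, upper = bootstrap_accuracy_ci(correct)
print(f"{point:.0%} (95% CI {lower:.0%}-{upper:.0%})")
```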

Editorial analysis - technical context

Open-weight models give clinical-text studies practical reproducibility advantages over closed models: the weights can be pinned to a fixed version, run on local hardware so patient data need not leave the hospital, and re-evaluated by other groups. A common pattern in LLM-based structured extraction is that categorical labels and mention detection reach high accuracy more easily than exact numeric counts or fine-grained location details. The reported gain from targeted few-shot example selection is consistent with prior work showing that retrieval- or similarity-based example choice helps on low-frequency or numeric tasks.
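The article does not define the preprint's "structural similarity" for choosing few-shot examples. As a hypothetical sketch only, the code below fingerprints each report by the shape of its lines (section headers, lines containing numbers, other text), ranks annotated examples by `difflib.SequenceMatcher` similarity to the target report's fingerprint, and places the top k into a prompt. The function names, prompt wording, and English example reports are assumptions for illustration; the study itself used Dutch reports.

```python
import re
from difflib import SequenceMatcher


def structure_signature(report: str) -> list[str]:
    """Hypothetical structural fingerprint: one coarse tag per non-empty line."""
    tags = []
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            tags.append("HEADER:" + line.lower())  # keep the section name itself
        elif re.search(r"\d", line):
            tags.append("NUMERIC")  # line mentions a number, e.g. a lesion count
        else:
            tags.append("TEXT")
    return tags


def structural_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two reports' line-shape signatures."""
    return SequenceMatcher(None, structure_signature(a), structure_signature(b)).ratio()


def select_examples(target: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k annotated (report, answer) pairs most similar in structure to target."""
    ranked = sorted(pool, key=lambda ex: structural_similarity(target, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(target: str, examples: list[tuple[str, str]], question: str) -> str:
    """Assemble a few-shot prompt: question, worked examples, then the target report."""
    parts = [question]
    for report, answer in examples:
        parts.append(f"Report:\n{report}\nAnswer: {answer}")
    parts.append(f"Report:\n{target}\nAnswer:")
    return "\n\n".join(parts)


# Made-up annotated pool of (report, gold microbleed count) pairs.
pool = [
    ("Findings:\nTwo microbleeds.\nConclusion:\nNo infarct.", "2"),
    ("Findings:\n4 microbleeds in the left frontal lobe.\nConclusion:\nSmall vessel disease.", "4"),
    ("Clinical question: memory complaints. Normal MRI.", "0"),
]
target = "Findings:\n3 microbleeds.\nConclusion:\nAge-appropriate atrophy."
chosen = select_examples(target, pool, k=2)
print(build_prompt(target, chosen, "How many microbleeds are reported? Answer with a number."))
```

The design choice here is to compare layout rather than wording, so the selected examples show the model where counts tend to appear in a similarly organized report; a lexical or embedding-based retriever would be an equally plausible alternative.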

Context and significance

For practitioners curating research cohorts or building registries from radiology text, the results suggest that open-weight LLMs such as LLaMA 3.1 can automate many visual-rating and mention-detection tasks on non-English reports, while numeric extraction may need tuned prompting or additional post-processing.
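The article suggests numeric extraction may need additional post-processing but does not describe any. One common safeguard, sketched here as a hypothetical example rather than anything from the paper, is to normalize the model's free-text count into an integer and route ambiguous answers (ranges, vague quantifiers) to manual review instead of guessing. The word lists below are illustrative assumptions covering a few English and Dutch terms.

```python
import re
from typing import Optional

# Hypothetical vocabulary (a few English and Dutch number words); not from the paper.
NUMBER_WORDS = {
    "none": 0, "geen": 0,
    "one": 1, "een": 1, "één": 1,
    "two": 2, "twee": 2,
    "three": 3, "drie": 3,
    "four": 4, "vier": 4,
    "five": 5, "vijf": 5,
}
# Quantifiers that signal an inexact count ("een aantal" = "a number of"); send to review.
VAGUE_WORDS = {"multiple", "several", "numerous", "meerdere", "talrijke", "aantal"}


def parse_count(answer: str) -> Optional[int]:
    """Normalize a model's free-text count answer to an int, or None if unclear."""
    text = answer.strip().lower()
    digits = re.findall(r"\d+", text)
    if digits:
        # Exactly one number is usable; ranges such as "2-3" are ambiguous.
        return int(digits[0]) if len(digits) == 1 else None
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so "één" stays one token
    if any(w in VAGUE_WORDS for w in words):
        return None
    values = {NUMBER_WORDS[w] for w in words if w in NUMBER_WORDS}
    # Accept only an unambiguous word count; "one or two" yields two values and is rejected.
    return values.pop() if len(values) == 1 else None


for answer in ["2 microbleeds", "Geen microbloedingen", "meerdere microbloedingen", "1-2"]:
    print(repr(answer), "->", parse_count(answer))
```

Returning None rather than a best guess keeps uncertain counts out of a research cohort until someone checks them. Because "een" doubles as the Dutch indefinite article, a production rule set would need more context than this lookup table.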

What to watch

Follow-up indicators include replication on larger multi-center datasets, head-to-head comparisons with clinical-domain-tuned models, and evaluation of dataset bias and error propagation into downstream research cohorts. The authors note comparable performance on English translations, which suggests the approach may transfer across languages but calls for broader validation.

Key Points

1. The open-weight LLM LLaMA 3.1 achieves high zero-shot accuracy on categorical visual ratings in Dutch neuroradiology reports.
2. Few-shot prompting with structural-similarity example selection substantially improves numeric-count extraction.
3. Categorical labels and mention detection outperform precise location and count extraction, implying a need for targeted approaches to numeric tasks.

Scoring Rationale

A single early-stage arXiv preprint showing an open-weight LLaMA model can extract structured ratings from non-English (Dutch) neuroradiology reports. Solid applied clinical-NLP work but narrow in scope and unreplicated, placing it in the interesting-research tier rather than a major release.

