
Study Tests Patient Cognitive Bias in LLM Consultations

June 11, 2026 | By LDS Team

Relevance Score: 5.9


A simulation-based comparative study published in the Journal of Medical Internet Research (JMIR) finds that patient cognitive bias reduces LLM diagnostic accuracy by 10-40 percentage points (P < .001) across six models tested on 1,273 MedQA-USMLE cases. Researchers Yi Zuo, Qifeng Wan, and Shalong Wang developed a simulated patient agent that generated confirmation-biased and unbiased consultations, and found that errors frequently reflected user misconceptions: the bias-influenced error proportion (BIEP) exceeded 33%. Neither prompt engineering nor temperature adjustments provided consistent resilience. A dual-system framework pairing a foundation model (System 1) with o1-Mini as a deliberative reasoning layer (System 2) recovered 10-39 percentage points of lost accuracy (P < .001). The findings quantify user cognitive bias as a behavioral risk in patient-facing AI tools, with implications for clinical deployment standards and evaluation benchmarks.

What the study found

A simulation-based comparative study published in the Journal of Medical Internet Research (JMIR) reports that patient cognitive bias meaningfully degrades LLM diagnostic performance in health consultations. Researchers Yi Zuo, Qifeng Wan, and Shalong Wang developed a simulated patient agent to generate unbiased and confirmation-biased consultations using 1,273 MedQA-USMLE cases, then evaluated six LLMs of varying capacities through multi-turn dialogues. The primary finding: user cognitive bias reduced diagnostic accuracy by 10-40 percentage points (P < .001), with smaller models occasionally performing near chance level. A secondary metric, the bias-influenced error proportion (BIEP), exceeded 33%, meaning that a substantial fraction of model errors directly reflected the user's misconceptions rather than independent model reasoning.

Methods

The study used two consultation modes: unbiased consultations, and confirmation-biased consultations in which the simulated patient agent steered the dialogue toward a preconceived diagnosis. The authors measured three outcomes: diagnostic accuracy, bias-induced accuracy decline (BIAD, the accuracy lost under bias), and bias-influenced error proportion (BIEP, the fraction of errors aligned with user misconceptions). They then tested four prompt-based mitigation strategies, four temperature settings, and a dual-system framework inspired by dual-process cognitive theory, with a standard foundation model as System 1 and o1-Mini as System 2, a deliberative reasoning layer.
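The three outcome measures can be illustrated with a minimal Python sketch. It is not the authors' code: the `CaseResult` record, its field names, and the choice to compute BIEP over errors in the biased condition only are assumptions made for illustration, based on the definitions summarized above.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-case record; field names are illustrative assumptions."""
    correct_answer: str
    patient_belief: str    # the simulated patient's preconceived (incorrect) diagnosis
    unbiased_answer: str   # model answer after the unbiased consultation
    biased_answer: str     # model answer after the confirmation-biased consultation


def bias_metrics(results):
    """Return accuracy in both conditions, BIAD (percentage points), and BIEP."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    acc_unbiased = sum(r.unbiased_answer == r.correct_answer for r in results) / n
    acc_biased = sum(r.biased_answer == r.correct_answer for r in results) / n

    # BIAD: accuracy lost under bias, expressed in percentage points.
    biad = (acc_unbiased - acc_biased) * 100

    # BIEP (assumed definition): among errors made under bias, the share
    # whose answer matches the patient's mistaken belief.
    errors = [r for r in results if r.biased_answer != r.correct_answer]
    aligned = sum(r.biased_answer == r.patient_belief for r in errors)
    biep = aligned / len(errors) if errors else 0.0

    return {
        "accuracy_unbiased": acc_unbiased,
        "accuracy_biased": acc_biased,
        "biad_pp": biad,
        "biep": biep,
    }
```

Under this reading, a model that answers 75% of cases correctly without bias and 25% with bias has a BIAD of 50 percentage points, and a BIEP above 0.33 would mean more than a third of its biased-condition errors repeat the patient's own misconception.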

Key results

Prompt engineering and temperature adjustments produced limited or inconsistent improvements; neither reliably counteracted patient confirmation bias. In contrast, the dual-system framework increased accuracy by 10-39 percentage points and recovered most or all of the bias-driven performance gap (P < .001). This suggests that architectural interventions, rather than prompting alone, are needed for bias-resilient clinical AI.
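As a rough illustration of the dual-system idea, the sketch below chains a fast model and a deliberative reviewer. The article does not describe how the study wires the two systems together, so the draft-then-review flow, the review prompt wording, and the `Model` callable type are all assumptions; the stub functions stand in for real model calls and no vendor API is used.

```python
from typing import Callable

# Hypothetical interface: a model is any function from prompt text to answer text.
Model = Callable[[str], str]


def dual_system_answer(dialogue: str, system1: Model, system2: Model) -> str:
    """Draft with System 1, then let System 2 review the draft (assumed flow)."""
    draft = system1(dialogue)

    # Assumed review prompt: ask the deliberative model to weigh the evidence
    # independently of what the patient believes.
    review_prompt = (
        "Review the consultation below. The patient may hold a mistaken belief "
        "about their diagnosis. Judge the clinical evidence independently.\n\n"
        f"Consultation:\n{dialogue}\n\n"
        f"Draft answer: {draft}\n\n"
        "Final answer:"
    )
    return system2(review_prompt).strip()


def stub_system1(prompt: str) -> str:
    """Stand-in for a foundation model that echoes the patient's belief."""
    return "migraine"


def stub_system2(prompt: str) -> str:
    """Stand-in for a deliberative model that overrides the biased draft."""
    if "Draft answer: migraine" in prompt:
        return " tension headache "
    return "unknown"


if __name__ == "__main__":
    dialogue = "Patient: I am sure this is a migraine. My whole head feels tight."
    print(dual_system_answer(dialogue, stub_system1, stub_system2))
```

Keeping both systems behind the same callable type means the reviewer can be swapped or disabled per case, which makes it straightforward to compare accuracy with and without the second stage.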

Why it matters

For practitioners building or evaluating patient-facing AI tools, the study introduces a concrete and previously underspecified failure mode: users themselves are a source of reasoning error. Standard benchmarks such as MedQA do not capture this dimension; the study's BIAD and BIEP metrics provide a practical evaluation vocabulary. The dual-system result also points to a deployment path: pairing a fast response model with a slower deliberative reasoning model may be a scalable safeguard for higher-stakes medical applications.

Key Points

1. A JMIR simulation study by Yi Zuo et al. finds patient cognitive bias cuts LLM diagnostic accuracy by 10-40 percentage points across six models on 1,273 MedQA-USMLE cases (P < .001).
2. Prompt engineering and temperature tuning yielded limited improvement, while a dual-system framework pairing a foundation model with o1-Mini recovered most of the bias-driven accuracy loss.
3. User cognitive bias is newly quantified as a behavioral risk layer in patient-facing AI; BIAD and BIEP metrics from this study offer a concrete evaluation vocabulary for clinical deployment.

Scoring Rationale

Solid niche research with quantitatively significant findings: a drop of 10-40 percentage points in accuracy under patient bias is meaningful and is not captured by existing benchmarks. The dual-system mitigation result has practical deployment relevance. The score reflects a well-executed domain-specific study rather than a paradigm-shifting result.


Sources

Public references used for this report.

3 sources

jmir.org: Patient Cognitive Bias in Large Language Model-Supported Health Consultations: Simulation-Based Comparative Study

researchgate.net: Evaluating Patient Cognitive Bias in Large Language Model-Supported Health Consultations: A Simulation-Based Comparative Study (Preprint)

dspace.stir.ac.uk: Cognitive bias in clinical large language models (PDF, University of Stirling)
