Audit Finds AI-Fabricated Citations Across Biomedical Papers

May 30, 2026 | By LDS Team

Relevance Score: 7.1

A correspondence in The Lancet by Maxim Topaz and colleagues reports an audit of 2.5 million biomedical papers in PubMed Central Open Access covering January 1, 2023 to February 18, 2026. The authors identified 4,046 fabricated references across 2,810 papers from a set of 97.1 million verified references, per the correspondence (Topaz et al., The Lancet, May 9, 2026). The research team developed an automated verification pipeline with an LLM review step that, according to Forbes and Columbia University, included Claude 3.5 Haiku to triage flagged items. The Lancet correspondence and Columbia press materials link the sharp rise in fabricated citations since mid-2024 to the increasing use of AI writing tools. The authors recommend that publishers verify references and that indexing services add metadata to track fake citations.

What happened

The correspondence in The Lancet by Maxim Topaz and colleagues presents a reference-integrity audit of 2.5 million biomedical papers in PubMed Central Open Access spanning January 1, 2023 to February 18, 2026, per the published correspondence (Topaz et al., The Lancet, May 9, 2026). The authors report that their automated verification found 4,046 fabricated references embedded in 2,810 papers out of 97.1 million verified references, according to the same correspondence and Columbia University reporting (Columbia School of Nursing news release, May 8, 2026). The audit shows a rise in fabrication rates from roughly 4 per 10,000 papers in 2023 to about 57 per 10,000 by early 2026, which the authors describe as a greater than 12-fold increase, as summarized by Nature and Columbia University.
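The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted above, so its outputs are approximate. The variable names are illustrative, and the authors' own calculations may rest on unrounded counts.

```python
# Figures as quoted in the article (rounded); nothing here comes from the raw audit data.
FABRICATED_REFS = 4_046
AFFECTED_PAPERS = 2_810
TOTAL_PAPERS = 2_500_000       # "2.5 million", rounded
VERIFIED_REFS = 97_100_000     # "97.1 million", rounded
RATE_2023_PER_10K = 4          # "roughly 4 per 10,000 papers"
RATE_2026_PER_10K = 57         # "about 57 per 10,000"


def per_10k(count: int, total: int) -> float:
    """Express count/total as a rate per 10,000."""
    return count / total * 10_000


# Share of audited papers containing at least one fabricated reference,
# averaged over the whole audit window.
paper_rate = per_10k(AFFECTED_PAPERS, TOTAL_PAPERS)

# Share of individual references classified as fabricated.
reference_rate = per_10k(FABRICATED_REFS, VERIFIED_REFS)

# Ratio of the two rounded yearly rates.
fold_change = RATE_2026_PER_10K / RATE_2023_PER_10K

# Average number of fabricated references in an affected paper.
refs_per_affected_paper = FABRICATED_REFS / AFFECTED_PAPERS

print(f"affected papers per 10,000 (whole window): {paper_rate:.2f}")
print(f"fabricated references per 10,000 references: {reference_rate:.2f}")
print(f"fold change, early 2026 vs 2023: {fold_change:.2f}")
print(f"fabricated references per affected paper: {refs_per_affected_paper:.2f}")
```

With these inputs, the whole-window rate is about 11.2 affected papers per 10,000, which sits between the 2023 and early-2026 rates. The ratio of the rounded yearly rates is 14.25, consistent with the reported greater than 12-fold increase.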

The research team used a multi-stage pipeline to detect suspect references. It compared cited items against bibliographic records in PubMed, Crossref, OpenAlex, and Google Scholar, then applied an LLM-assisted review step to separate likely honest errors from fabrications. Press coverage (Forbes) reports that Claude 3.5 Haiku was used for this triage. The Lancet correspondence notes that fabricated references can arise from paper-mill activity, intentional misconduct, or the uncritical use of AI writing tools, and cites prior work estimating that 30-69% of LLM-generated references in biomedical contexts may be fabricated.

Technical details

Per the correspondence and affiliated press reporting, the authors built an automated reference-verification system to scan metadata and bibliographic fields across a large corpus in PubMed Central Open Access. The pipeline flagged mismatches between in-text citations and external bibliographic databases, then used an LLM reviewer to adjudicate flagged cases at scale, as described in Forbes and Columbia University materials. The audit treated references that could not be located in major databases as fabricated for the purposes of the study, a methodological decision documented in the Lancet correspondence.
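The match-then-triage flow described above can be illustrated with a toy example. This is a minimal sketch, not the authors' pipeline. The in-memory `INDEX` stands in for lookups against PubMed, Crossref, OpenAlex, and Google Scholar, and its titles are placeholders. The `Reference` class, the `classify` function, and both similarity thresholds are assumptions chosen for illustration. The `needs_review` bucket marks where an LLM or human reviewer would separate honest errors from fabrications.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Reference:
    """Hypothetical minimal record: a title and a publication year."""
    title: str
    year: int


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify(cited: Reference, index: list[Reference],
             match_at: float = 0.9, review_at: float = 0.6) -> str:
    """Return 'verified', 'needs_review', or 'not_found'.

    Thresholds are illustrative assumptions, not values from the study.
    """
    best, best_score = None, 0.0
    for record in index:
        score = title_similarity(cited.title, record.title)
        if score > best_score:
            best, best_score = record, score

    # Close title match and same year: treat as a real, correctly cited item.
    if best is not None and best_score >= match_at and best.year == cited.year:
        return "verified"
    # Partial match (e.g. wrong year or garbled title): possible honest error,
    # so hand it to a reviewer instead of calling it fabricated.
    if best is not None and best_score >= review_at:
        return "needs_review"
    # Nothing resembling the citation exists in the index.
    return "not_found"


# Placeholder records standing in for an external bibliographic database.
INDEX = [
    Reference("Effects of alpha on beta in a toy cohort", 2021),
    Reference("A placeholder trial of gamma therapy in delta units", 2019),
]

if __name__ == "__main__":
    for cited in [
        Reference("Effects of alpha on beta in a toy cohort", 2021),
        Reference("Effects of alpha on beta in a toy cohort", 2020),
        Reference("Zinc and gout", 2024),
    ]:
        print(cited.title, cited.year, "->", classify(cited, INDEX))
```

In this toy run, the first citation is verified, the second lands in review because the year disagrees, and the third is not found. As the article notes, outcomes in a real audit depend on how complete the underlying databases are and on the rule used to adjudicate unmatched items.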

Context and significance

What to watch

For practitioners, indicators to monitor include:

• whether major publishers implement automated reference checks at submission
• whether indexing services such as PubMed, Crossref, and OpenAlex add metadata fields or flags for suspected fabricated references
• whether follow-up audits replicate the methodology across broader corpora beyond PubMed Central Open Access

Also watch for community standards around documenting AI assistance in manuscript preparation and for tooling that integrates reference verification into authoring environments.

Editorial analysis

This methodology combines scalable metadata matching with human-supervised LLM triage, a pragmatic approach to large-scale auditing. As an industry-pattern observation, similar hybrid pipelines are increasingly used to screen data quality where manual review would be infeasible, though results are sensitive to database coverage and the adjudication rules used.

The finding matters to ML practitioners and researchers for multiple reasons. First, fabricated citations degrade the reliability of the scientific record that practitioners rely on when building literature reviews, training domain-specific models, or curating datasets for biomedical NLP. Second, automated systems and search indices that ingest published literature can propagate fabricated items into downstream tools, from knowledge graphs to retrieval-augmented generation pipelines. Third, the reported timing of the rise, beginning mid-2024 and accelerating into 2025 and early 2026, aligns with increased adoption of generative writing tools in research workflows as reported by Columbia and coverage in Nature and STAT.

The authors also make operational recommendations, per the Columbia School of Nursing release and the Lancet correspondence: publishers should verify references at submission, indexing services should add metadata flags for fake references, and research-integrity databases should create a dedicated category for fabricated citations.

Researchers and data teams assembling training corpora or systematic reviews should be aware that a nontrivial set of published biomedical references may be fabricated, and that automated screening plus spot checks may be required to reduce contamination. The broader pattern in publishing shows a growing need for computational provenance and stronger publisher-side validation as AI-assisted writing becomes more common.
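One way to combine automated screening with spot checks during corpus assembly is sketched below. It is a hedged illustration under stated assumptions: `Paper`, its `unverified_refs` field (imagined as the output of some upstream reference checker), the `screen` and `spot_check_sample` functions, and the default thresholds are all hypothetical and do not come from the audit.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical corpus entry with a count from an upstream reference checker."""
    paper_id: str
    total_refs: int
    unverified_refs: int


def screen(papers: list[Paper], max_unverified: int = 0):
    """Split papers into (kept, quarantined) by their unverified-reference count."""
    kept, quarantined = [], []
    for paper in papers:
        if paper.unverified_refs > max_unverified:
            quarantined.append(paper)
        else:
            kept.append(paper)
    return kept, quarantined


def spot_check_sample(papers: list[Paper], fraction: float = 0.05, seed: int = 0):
    """Draw a reproducible random sample of papers for manual review."""
    if not papers:
        return []
    sample_size = min(len(papers), max(1, round(len(papers) * fraction)))
    return random.Random(seed).sample(papers, sample_size)


if __name__ == "__main__":
    # Synthetic corpus: every tenth paper carries one unverified reference.
    corpus = [Paper(f"P{i}", 30, 1 if i % 10 == 0 else 0) for i in range(100)]
    kept, quarantined = screen(corpus)
    to_review = spot_check_sample(kept, fraction=0.1, seed=1)
    print(len(kept), "kept,", len(quarantined), "quarantined,",
          len(to_review), "sampled for manual checks")
```

Quarantining instead of deleting keeps flagged papers available for later adjudication, which matters because an unverified reference may be legitimate but poorly indexed. The fixed seed makes the manual-review sample reproducible across runs.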

Direct quotes and commentary in coverage

The Columbia School of Nursing press materials quoted study authors warning about clinical implications, stating in part that fabricated citations "directly impact patients" when non-existent evidence is used in clinical guidance. STAT quoted sociologist Misha Teplitskiy calling the study "one of the first papers that's telling us something about the quality of what's being produced with LLMs," reflecting external concern about the quality implications of LLM use in science.

Limitations reported by the authors

The correspondence documents that the audit covered only PubMed Central Open Access papers, not the entire PubMed corpus, a point corrected in subsequent coverage (Nature). The authors also note that the decision rule classifying a reference as fabricated relied on failure to locate the cited item in major bibliographic resources, which may misclassify some legitimate but poorly indexed material as fabricated.

Key Points

1. A Lancet correspondence reports 4,046 fabricated references across 2,810 papers in a 2.5 million-paper audit, signaling large-scale integrity issues.
2. Industry-pattern observation: hybrid pipelines combining metadata matching and LLM-assisted triage scale audits, but depend on database coverage and adjudication rules.
3. For practitioners: fabricated citations can contaminate literature-derived datasets and retrieval systems, increasing the need for reference verification during data curation.

Scoring Rationale

The audit documents a measurable, rapid rise in fabricated citations that affects literature reliability, dataset curation, and downstream ML applications. The story is notable for practitioners who rely on published biomedical corpora, but it is not an immediate paradigm shift for the broader AI frontier.

Sources

Public references used for this report.

11 sources

nursing.columbia.edu | Nearly 3000 peer-reviewed medical papers have fake citations, a ...

thelancet.com | Fabricated citations: an audit across 2·5 million biomedical papers

nature.com | Surge in fake citations uncovered by audit of 2.5 million biomedical ...
