Security & Riskhallucinationmedical misinformationretrieval augmentationprovenance

AI Chatbots Propagate Fake Disease 'Bixonimania' Claims

|April 13, 2026|By LDS Team

7.3

Relevance Score

AI Chatbots Propagate Fake Disease 'Bixonimania' Claims — Photo: img1-azrcdn.newser.com · rights & takedowns

A Swedish researcher deliberately created a fake eye condition called "bixonimania" and uploaded obviously bogus preprints in 2024 to test how AI systems handle medical misinformation. Within weeks, major conversational models including `ChatGPT`, `Gemini`, `Copilot`, and `Perplexity` began treating the fictitious disorder as real, offering prevalence estimates and clinical-style advice. At least one peer-reviewed paper cited the fabricated work before retracting it, exposing researchers and clinicians to citation and patient-safety risks. The episode highlights brittle retrieval, weak source-filtering, and a production gap between model outputs and medical validation pipelines. Immediate fixes are available at system and operational levels, but the incident is a strong reminder that widely deployed LLMs remain unsafe for autonomous health guidance without provenance, stricter ingestion controls, and human oversight.

What happened

A Swedish medical researcher, Almira Osmanovic Thunstrom, invented a fictional eye condition called bixonimania and uploaded two deliberately ridiculous preprints in 2024 containing obvious jokes and admissions that the papers were fabricated. Within weeks, major conversational models, `ChatGPT`, `Gemini`, `Copilot`, and `Perplexity`, began presenting bixonimania as a real diagnosis, offering prevalence numbers and telling users to see an ophthalmologist. One peer-reviewed article even cited the fake work before retracting the citation, underscoring how AI-produced or AI-amplified misinformation can bleed into formal scientific literature.

Technical details

The failure is a compound of two engineering issues. First, public preprints and indexed web pages were ingested or indexed by retrieval pipelines without robust provenance or semantic filters. Second, retrieval-augmented generation pipelines prioritized fluency and confidence over source skepticism, allowing LLMs to synthesize authoritative-sounding answers from low-quality or explicitly false inputs. Key attack surface elements include:

•the use of preprint servers and scraped academic metadata as retrieval targets
•lack of automated checks for meta-evidence such as author legitimacy, funding source sanity, and explicit disclaimers inside documents
•RAG systems that do not propagate verifiable citations or allow model uncertainty to surface when sources are weak

Context and significance

This is not an isolated hallucination. The incident maps directly to known problems in model training and deployment: models replicate and amplify patterns in their training corpora; retrieval systems can elevate fringe or fabricated artifacts; and downstream UI/UX choices disguise uncertainty behind confident natural language. The result matters for health care because diagnostic language changes patient behavior. ECRI naming AI chatbot misuse a top health technology hazard for 2026 is consistent with the downstream risk shown here. Glenn Cohen of Harvard Law School summed the ethical dimension: "we and our health shouldn't be the beta testers for companies." The episode also shows a feedback loop where fabricated web content influences models, which then create additional citations and legitimacy for the false content.

Practical takeaways for practitioners

Operational fixes are available and should be prioritized by teams responsible for clinical AI or public-facing assistants. Recommended mitigations include:

•enforce strict ingestion whitelists and blacklists for medical corpora and preprint servers
•add automated provenance scoring that flags documents with AI-generated authors, fictional institutions, or internal contradictions
•require source-level citations surfaced to users with clickable DOIs and confidence bands, not paraphrased summaries alone
•integrate human-in-the-loop verification for any diagnostic claim and throttle advice that recommends clinical action
•maintain adversarial test suites that include planted fakes to validate pipelines before deployment

What to watch

Vendors will likely patch models and retrieval filters, and some systems are already self-correcting on bixonimania. The bigger question is governance: will journals, preprint servers, and platform providers adopt stronger metadata standards and provenance labels to prevent fabricated content from entering model training and retrieval indexes?

Key Points

1LLMs repeated a fabricated disease because retrieval pipelines indexed deliberately bogus preprints without provenance checks, enabling confident hallucinations.
2Medical and scientific workflows are vulnerable: at least one peer-reviewed paper cited the fake work, showing citation and downstream contamination risks.
3Mitigations are practical: stricter ingestion controls, provenance scoring, human-in-the-loop verification, and adversarial testing must be operationalized.

Scoring Rationale

The incident exposes a high-risk, practical weakness in deployed LLMs that directly affects patient safety and research integrity. It is notable for showing cross-contamination between web content, models, and peer-reviewed literature, meriting attention from practitioners and platform operators.

Sources

Public references used for this report.

6 sources

01nature.comScientists invented a fake disease. AI told people it was real - Nature

02news18.comBixonimania: How AI Turned A Fake Illness Into 'Real' Medical ...

03inc.comScientists Invented a Fake Disease Caused by Blue Light—Now It's ...

View 3 more sources

Practice with real Health & Insurance data

90 SQL & Python problems · 15 industry datasets

Used by DS/ML engineers at top companies

Active PPO Plans with Rx CoverageEasy

Approved High-Value ClaimsMedium

Denial Rate by Plan TypeHard

250 free problems · No credit card

See all Health & Insurance problems