AI Chatbots Propagate Fake Disease 'Bixonimania' Claims

A Swedish researcher deliberately created a fake eye condition called "bixonimania" and uploaded obviously bogus preprints in 2024 to test how AI systems handle medical misinformation. Within weeks, major conversational models including `ChatGPT`, `Gemini`, `Copilot`, and `Perplexity` began treating the fictitious disorder as real, offering prevalence estimates and clinical-style advice. At least one peer-reviewed paper cited the fabricated work before retracting it, exposing researchers and clinicians to citation and patient-safety risks. The episode highlights brittle retrieval, weak source-filtering, and a production gap between model outputs and medical validation pipelines. Immediate fixes are available at system and operational levels, but the incident is a strong reminder that widely deployed LLMs remain unsafe for autonomous health guidance without provenance, stricter ingestion controls, and human oversight.
What happened
A Swedish medical researcher, Almira Osmanovic Thunstrom, invented a fictional eye condition called bixonimania and uploaded two deliberately ridiculous preprints in 2024 containing obvious jokes and admissions that the papers were fabricated. Within weeks, major conversational models, `ChatGPT`, `Gemini`, `Copilot`, and `Perplexity`, began presenting bixonimania as a real diagnosis, offering prevalence numbers and telling users to see an ophthalmologist. One peer-reviewed article even cited the fake work before retracting the citation, underscoring how AI-produced or AI-amplified misinformation can bleed into formal scientific literature.
Technical details
The failure is a compound of two engineering issues. First, public preprints and indexed web pages were ingested or indexed by retrieval pipelines without robust provenance or semantic filters. Second, retrieval-augmented generation pipelines prioritized fluency and confidence over source skepticism, allowing LLMs to synthesize authoritative-sounding answers from low-quality or explicitly false inputs. Key attack surface elements include:
- •the use of preprint servers and scraped academic metadata as retrieval targets
- •lack of automated checks for meta-evidence such as author legitimacy, funding source sanity, and explicit disclaimers inside documents
- •RAG systems that do not propagate verifiable citations or allow model uncertainty to surface when sources are weak
Context and significance
This is not an isolated hallucination. The incident maps directly to known problems in model training and deployment: models replicate and amplify patterns in their training corpora; retrieval systems can elevate fringe or fabricated artifacts; and downstream UI/UX choices disguise uncertainty behind confident natural language. The result matters for health care because diagnostic language changes patient behavior. ECRI naming AI chatbot misuse a top health technology hazard for 2026 is consistent with the downstream risk shown here. Glenn Cohen of Harvard Law School summed the ethical dimension: "we and our health shouldn't be the beta testers for companies." The episode also shows a feedback loop where fabricated web content influences models, which then create additional citations and legitimacy for the false content.
Practical takeaways for practitioners: Operational fixes are available and should be prioritized by teams responsible for clinical AI or public-facing assistants. Recommended mitigations include:
- •enforce strict ingestion whitelists and blacklists for medical corpora and preprint servers
- •add automated provenance scoring that flags documents with AI-generated authors, fictional institutions, or internal contradictions
- •require source-level citations surfaced to users with clickable DOIs and confidence bands, not paraphrased summaries alone
- •integrate human-in-the-loop verification for any diagnostic claim and throttle advice that recommends clinical action
- •maintain adversarial test suites that include planted fakes to validate pipelines before deployment
What to watch
Vendors will likely patch models and retrieval filters, and some systems are already self-correcting on bixonimania. The bigger question is governance: will journals, preprint servers, and platform providers adopt stronger metadata standards and provenance labels to prevent fabricated content from entering model training and retrieval indexes?
Scoring Rationale
The incident exposes a high-risk, practical weakness in deployed LLMs that directly affects patient safety and research integrity. It is notable for showing cross-contamination between web content, models, and peer-reviewed literature, meriting attention from practitioners and platform operators.
Practice with real Health & Insurance data
90 SQL & Python problems · 15 industry datasets
250 free problems · No credit card
See all Health & Insurance problemsStep-by-step roadmaps from zero to job-ready — curated courses, salary data, and the exact learning order that gets you hired.


