Study finds limited AI chatbot presence in online surveys

A new preprint on PsyArXiv, reported by Retraction Watch, finds that AI chatbots currently account for a small share of online survey responses. The study used an authenticity checker developed by Prolific and examined roughly 4,800 responses gathered across 12 survey suppliers, finding fewer than 1% contained text likely not written by a human; a separate sample of 400 responses from a 13th supplier had about 16% of replies flagged. The checker was validated against ChatGPT, Gemini, Claude, Perplexity and an internal Prolific agent; Retraction Watch quotes the lead author saying, "That gives us a lot of confidence in that measure." Retraction Watch notes Prolific funded the project and that the authenticity checker has not been made available to external researchers.
What happened
Retraction Watch reports a new preprint on PsyArXiv, funded by Prolific, that examined the prevalence of AI-generated replies in online research surveys. The study analysed roughly 4,800 responses collected across 12 survey suppliers and found fewer than 1% of responses contained text likely not written by a human, per Retraction Watch. In an additional sample of 400 responses from a 13th supplier, Retraction Watch reports about 16% of replies were flagged as possibly chatbot-generated.
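The 16% figure comes from a fairly small sample, so it carries meaningful statistical uncertainty. A minimal sketch, assuming "about 16% of 400" corresponds to roughly 64 flagged responses (the article reports only the percentage, so the count is an assumption), computes a 95% Wilson score interval for the flag rate:

```python
import math

def wilson_interval(k, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumed count: "about 16%" of 400 responses -> roughly 64 flagged.
lo, hi = wilson_interval(64, 400)
print(f"flag rate: 95% CI [{lo:.1%}, {hi:.1%}]")  # roughly 12.7% to 19.9%
```

Even at the interval's lower edge, the 13th supplier's flag rate remains an order of magnitude above the under-1% observed across the other 12 suppliers.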
Technical details
Retraction Watch reports the study used an authenticity checker developed by Prolific and that the tool was evaluated using synthetic tests: the survey instrument was completed 25 times each by ChatGPT, Gemini, Claude, Perplexity and an internal Prolific agent. Retraction Watch quotes the lead author saying, "That gives us a lot of confidence in that measure." The article states the checker correctly identified all 125 chatbot-completed surveys in the test set and did not misidentify any of 124 human-completed surveys, according to the lead author as reported by Retraction Watch. Retraction Watch also reports the study sample relied on participants with high approval ratings and notes the authenticity checker has not been released for external verification.
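A perfect score on a finite test set still leaves statistical room for undetected error. As a back-of-envelope check (not part of the study), the "rule of three" gives an approximate 95% upper bound of 3/n on an event's rate when zero events are observed in n independent trials:

```python
def rule_of_three_upper(n):
    """Approximate 95% upper bound on an event's true rate
    when 0 events were observed in n independent trials."""
    return 3 / n

# 0 missed detections among 125 chatbot-completed surveys
miss_bound = rule_of_three_upper(125)
# 0 false alarms among 124 human-completed surveys
false_alarm_bound = rule_of_three_upper(124)
print(f"miss rate <= {miss_bound:.1%}, false-alarm rate <= {false_alarm_bound:.1%}")
# both bounds are about 2.4%
```

In other words, the reported 125/125 and 124/124 results are consistent with true error rates anywhere from zero up to roughly 2.4%, one reason independent validation on larger, external test sets matters.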
Context and significance
Industry context: Observers have raised concerns that large language models could be used to spoof survey responses, degrading data quality for academic and market research. Retraction Watch cites a prior PNAS study that demonstrated technical feasibility for such infiltration; the PNAS author told Retraction Watch they declined to collaborate on the new study and flagged Prolific's financial stake and the inability to externally verify the unpublished checker.
What to watch
For practitioners, Retraction Watch highlights several open questions and indicators:
- Independent validation, including public release or third-party audit of detector tools
- Differences in supplier samples and participant screening that affect chatbot prevalence
- Evolution of detection-evasion techniques by generative models and corresponding updates to checks
Editorial analysis
Industry-pattern observations: Data-quality risk from automated respondents remains a dynamic, adversarial problem. Companies and researchers often face a trade-off between proprietary detection methods and reproducibility; independent benchmarks and transparent evaluation sets are typical mitigations in comparable domains.
Scoring Rationale
The study addresses a practical data-integrity concern for researchers and product teams that rely on online surveys. It is notable because it reports low prevalence overall but exposes supplier variability and reproducibility limits due to proprietary detection, making it relevant but not industry-shattering.