arXiv imposes one-year ban for unchecked LLM output

arXiv has clarified enforcement for submissions that include unverified large language model (LLM) output. Per a public thread by Thomas G. Dietterich, chair of arXiv's computer science moderators, a submission that contains "incontrovertible evidence that the authors did not check the results of LLM generation" can trigger a one-year ban and a requirement that future arXiv submissions first be accepted at a reputable peer-reviewed venue, as reported by The Verge and other outlets. Examples cited include hallucinated references and leftover LLM meta-comments. Dietterich's thread also cited the platform's Code of Conduct, noting that authors bear responsibility for content irrespective of how it was generated. Community reaction has been mixed, with some researchers supporting the move and others warning about selective enforcement and false-positive risks, according to The Decoder and social reporting aggregated by di.gg.
What happened
Per a public thread by Thomas G. Dietterich, who moderates arXiv's computer science section, arXiv clarified penalties for submissions that contain "incontrovertible evidence that the authors did not check the results of LLM generation," as reported by The Verge. Dietterich's post lists examples such as hallucinated references and leftover model "meta-comments." The stated penalty is a one-year ban from arXiv, followed by a requirement that subsequent arXiv submissions first be accepted at a reputable peer-reviewed venue, as summarized in coverage by The Verge, The Decoder, Startup Fortune, and social aggregation on di.gg.
Technical details
Per the examples Dietterich highlighted in his public thread and as relayed by The Verge and The Decoder, moderators are flagging clear, verifiable artifacts of unchecked LLM use. Examples cited in reporting include hallucinated citations that reference non-existent papers or journals, LLM meta-comment text left in the manuscript (for example, "Here is a 200-word summary"), and placeholder or fabricated results and tables that were not replaced with real data. The platform's Code of Conduct, also noted in Dietterich's thread, reiterates that authors take responsibility for all manuscript contents regardless of how they were produced.
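As an illustration of why such artifacts are easy to verify, the following Python sketch scans manuscript text for leftover meta-comments and unfilled placeholders. The patterns, the category names, and the scan_manuscript function are all hypothetical choices made for this example; they are not arXiv's criteria or tooling, and a pre-submission check like this can only supplement a human read-through.

```python
import re

# Hypothetical patterns chosen for illustration; they are not arXiv's criteria.
SUSPECT_PATTERNS = {
    "meta_comment": re.compile(
        r"\b(here is (a|an|the) [\w\s-]{0,40}?(summary|abstract|rewrite|version)"
        r"|as an ai language model|i hope this helps|certainly[!,] here)",
        re.IGNORECASE,
    ),
    "placeholder": re.compile(
        r"(\[insert [^\]]*\]|\[citation needed\]|\bTODO\b|lorem ipsum)",
        re.IGNORECASE,
    ),
}


def scan_manuscript(text):
    """Return (line_number, category, line) tuples for lines matching a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for category, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, category, line.strip()))
    return findings
```

Calling scan_manuscript on the plain-text or LaTeX source before upload returns the line numbers to review by hand. An empty result does not show that the text was checked, only that none of these particular patterns appear.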
Industry context
Editorial analysis: Platforms that host preprints are confronting a rise in automated text generation and associated integrity problems. Industry observers note that clear artifacts that lend themselves to automated detection, like phantom references or leftover prompts, have become a common trigger for moderation because they are straightforward to verify. At the same time, reviewers and moderators face tradeoffs between speed of access and quality control; public reporting indicates the arXiv move aims to raise the cost of leaving obviously unchecked LLM output in submissions.
Context and significance
Editorial analysis: For researchers and practitioners, the clarified enforcement changes the operational risk of using LLMs in manuscript preparation. The policy makes human verification of LLM outputs an explicit compliance checkpoint for access to a central distribution channel for preprints. Reporting also shows divided community sentiment: The Decoder notes support from some researchers who view the step as protecting scientific quality, while social reporting aggregated on di.gg records concerns about draconian penalties, selective enforcement, and the possibility of abuse via falsely listed co-authors.
What to watch
For practitioners: monitor how arXiv defines and documents enforcement procedures and appeals, and watch for early test cases where moderators apply the one-year ban. Also watch for tooling and lab workflows that teams adopt to reduce hallucinations and to produce verifiable provenance (for example, automated citation checks and LLM-output audit trails). Finally, observers should track publisher and community reaction, including whether journals or preprint endorsement systems adjust gating or verification requirements in response.
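One form an automated citation check could take is a triage pass over the reference list. The Python sketch below is a minimal illustration under stated assumptions: the triage_references function and the report fields are hypothetical, the regular expressions are simplified approximations of DOI and new-style arXiv identifier formats, and nothing here contacts a registry. An identifier that matches a pattern is not proof that the reference exists; it only gives a reviewer something concrete to resolve, while entries with no identifier are flagged for manual lookup.

```python
import re

# Simplified identifier patterns (assumptions for this sketch, not a full spec).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
ARXIV_RE = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def triage_references(entries):
    """Extract a DOI or arXiv ID from each entry; flag entries that have neither."""
    report = []
    for index, entry in enumerate(entries, start=1):
        doi = DOI_RE.search(entry)
        arxiv = ARXIV_RE.search(entry)
        if doi:
            # Trailing punctuation usually belongs to the sentence, not the DOI.
            identifier = ("doi", doi.group(0).rstrip(".,;"))
        elif arxiv:
            identifier = ("arxiv", arxiv.group(1))
        else:
            identifier = None
        report.append(
            {
                "index": index,
                "identifier": identifier,
                "needs_manual_check": identifier is None,
            }
        )
    return report
```

Each extracted identifier still has to be resolved and compared against the cited title and authors by a person or a separate lookup step, which this sketch deliberately leaves out.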
Scoring Rationale
This policy clarification affects a central distribution channel for ML research and raises compliance and workflow implications for researchers. It is notable for practitioners but not a frontier technical breakthrough.


