OpenAI restricts models from mentioning goblins and similar creatures
OpenAI documented a surprising lexical quirk in its recent models, reporting in an April 29, 2026 blog post that use of the word "goblin" in ChatGPT rose by 175% after the GPT-5.1 launch and that "gremlin" rose by 52%. The post, titled "Where the goblins came from," attributes the spike to an outsized reward signal in the training setup for the "Nerdy" personality, which amplified creature metaphors, and to later training stages that spread the tic to other contexts. Separately, the Codex CLI code release contains a system prompt, reported by Ars Technica and Wired, that explicitly instructs the model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." Editorial analysis: this episode highlights how narrow reward signals and personality conditioning can create persistent stylistic tics that leak across models and toolchains.
What happened
In an April 29, 2026 blog post titled "Where the goblins came from," OpenAI reported that references to goblins and similar creatures increased after the launch of GPT-5.1, with use of the word "goblin" in ChatGPT rising by 175% and "gremlin" by 52%, according to the post. The company traced the root cause to reinforcement signals used when training a "Nerdy" personality, which rewarded metaphors involving creatures and allowed the stylistic tic to generalize across later training stages, per the blog. OpenAI said the Nerdy personality was discontinued in March, but some creature language persisted in other systems that had been trained before the root cause was identified. Separately, the Codex CLI code released by OpenAI contains a base system prompt that, as reported by Ars Technica and Wired, repeats an explicit instruction: "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." Several news outlets, including The Verge, Wired, Ars Technica, and the Wall Street Journal, covered the story as the discovery spread publicly into memes and developer tests.
Editorial analysis - technical context
Reinforcement learning and preference-tuning steps can amplify small lexical patterns when those patterns receive above-average reward. A recurring industry pattern: when a behavior is rewarded in a narrowly scoped training condition, later supervised fine-tuning or reuse of preference data can allow the behavior to leak into other modes and products. For practitioners, this is a concrete example of how reward shaping in personality- or role-conditioned training can create persistent, hard-to-localize output distributions. The Codex example also shows that operational mitigations frequently take the form of guardrail system prompts; those prompts can be explicit and repetitive when the unwanted behavior is lexical and easy to enumerate, as in the sketch below.
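To make the guardrail pattern concrete, here is a minimal sketch of how a lexical prohibition like the one quoted from the Codex CLI prompt might be layered into a chat-style request and backstopped with a post-hoc output check. Only the quoted instruction text comes from the reporting; the function names, message layout, and the regex backstop are illustrative assumptions, not OpenAI's actual implementation.

```python
import re

# The prohibition quoted by Ars Technica and Wired from the Codex CLI prompt.
CREATURE_GUARDRAIL = (
    "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and unambiguously "
    "relevant to the user's query"
)

# Hypothetical lexeme list backing a post-hoc filter; not from OpenAI's code.
BANNED_LEXEMES = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
BANNED_PATTERN = re.compile(
    r"\b(" + "|".join(BANNED_LEXEMES) + r")s?\b", re.IGNORECASE
)

def build_messages(user_query: str) -> list[dict]:
    """Layer the guardrail into the system turn ahead of the user turn.

    The message shape mirrors the common chat-completions format; where the
    instruction actually sits inside OpenAI's Codex prompt is not public
    beyond the quoted sentence.
    """
    return [
        {"role": "system",
         "content": f"You are a coding assistant. {CREATURE_GUARDRAIL}."},
        {"role": "user", "content": user_query},
    ]

def violates_guardrail(model_output: str) -> bool:
    """Deterministic backstop: flag outputs that slip past the prompt.

    Word-boundary matching avoids false positives on substrings, so
    "trolley" does not trip the "troll" rule.
    """
    return BANNED_PATTERN.search(model_output) is not None

if __name__ == "__main__":
    print(build_messages("Refactor this loop into a comprehension.")[0]["content"])
    print(violates_guardrail("Gremlins keep chewing through this cache."))  # True
    print(violates_guardrail("Route the trolley past the depot."))          # False
```

The post-hoc check matters because prompt-level prohibitions are advisory: a model that has internalized a lexical tic through reward shaping can still emit the banned terms, and a cheap word-boundary filter gives a deterministic last line of defense.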
Context and significance
Industry context
The incident is notable because it combines three common production factors: multi-personality instruction tuning, reinforcement-based preference signals, and model reuse across toolchains (for example, conversational ChatGPT and the Codex coding agent). Public reporting frames the event as a teachable case about emergent style tics rather than a safety exploit or data leak. For teams deploying LLMs, the story underscores the limits of treating personalities as fully isolated training conditions; models can internalize and propagate stylistic quirks when training artifacts are reused.
Operational details from reporting
- Per OpenAI's blog post, the initial spike was detected after GPT-5.1 and worsened across subsequent model generations, prompting internal analysis. The figures of 175% for "goblin" and 52% for "gremlin" come from OpenAI's own reporting in the post.
- Ars Technica and Wired reported that the Codex CLI's base instructions include the quoted prohibition against creature language, and that the prohibition appears only in the latest model instructions, not in earlier versions.
- Coverage from The Verge, Wired, Ars Technica, and the Wall Street Journal documented community reactions, meme creation, and developer attempts to override or probe the anti-goblin clause.
For practitioners
Editorial analysis: Teams building or fine-tuning models should monitor low-frequency lexical shifts and stylistic modes as part of validation, not just aggregate metrics. A pattern observed in similar transitions: small, repeated reward biases can amplify idiosyncratic phrasing, and targeted testing for metaphors, jokes, or domain-specific jargon often reveals leakage earlier than standard benchmarks. Operational mitigations observed here include removing the problematic personality, applying specific system-prompt guardrails, and instrumenting production telemetry for sudden word-frequency changes; a minimal monitoring sketch follows.
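As one way to operationalize that telemetry, the sketch below compares per-window word frequencies against a trailing baseline and flags large relative jumps; a 175% rise like the one OpenAI reported for "goblin" would clear the threshold immediately. The watchlist, window construction, and the 1.5x spike ratio are all assumptions for illustration, not anything OpenAI has described.

```python
from collections import Counter
import re

# Illustrative watchlist; in practice this could scan the full vocabulary.
WATCHLIST = {"goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"}
SPIKE_RATIO = 1.5    # flag a >= +50% jump over baseline; threshold assumed
MIN_BASELINE = 1e-6  # floor so words unseen in the baseline still get flagged

def word_rates(texts: list[str]) -> dict[str, float]:
    """Occurrences per token for watchlist words over a batch of outputs."""
    counts: Counter = Counter()
    total_tokens = 0
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        total_tokens += len(tokens)
        # rstrip("s") folds simple plurals ("gremlins") into the watchlist.
        counts.update(t for t in tokens if t.rstrip("s") in WATCHLIST)
    return {word: n / max(total_tokens, 1) for word, n in counts.items()}

def lexical_spikes(baseline_texts, current_texts) -> dict[str, float]:
    """Return watchlist words whose rate rose by SPIKE_RATIO or more."""
    base = word_rates(baseline_texts)
    curr = word_rates(current_texts)
    spikes = {}
    for word, rate in curr.items():
        ratio = rate / max(base.get(word, 0.0), MIN_BASELINE)
        if ratio >= SPIKE_RATIO:
            spikes[word] = round(ratio, 1)
    return spikes

if __name__ == "__main__":
    # Baseline window: one creature mention amid ordinary output.
    before = ["the function returns a sorted list"] * 80 + ["a goblin of a bug"]
    # Current window: creature metaphors now appear in most outputs.
    after = ["this goblin of a loop needs taming",
             "gremlins are hiding in the cache layer"] * 20
    print(lexical_spikes(before, after))  # e.g. {'goblin': ..., 'gremlins': ...}
```

In production, a monitor like this would run over sampled model outputs per release or per day, with the baseline window drawn from the previous model generation so that personality-induced shifts show up as release-to-release deltas rather than getting averaged away.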
What to watch
- Whether the explicit system prompts used as a stopgap are maintained, hardened, or replaced with data-level corrections in future fine-tuning cycles, as tracked in future OpenAI blog posts or code releases.
- Signals in telemetry for other lexical spikes following personality-based tuning, which would suggest the phenomenon is broader than this incident.
- How other vendors and open-source projects handle similar leakage between personality-conditioned training and general-purpose deployments; coverage in trade press or engineering blogs will indicate whether this becomes a wider operational concern.
This account synthesizes OpenAI's April 29, 2026 blog post "Where the goblins came from" and reporting from Ars Technica, Wired, The Verge, the Wall Street Journal, and other outlets. The write-ups collectively document the observed lexical spike, the attribution to reinforcement signals in a "Nerdy" personality, the Codex CLI guardrail prompts, and the developer community's rapid, playful engagement with the artifact.
Scoring Rationale
The story is a notable operational case study for ML practitioners: it shows how reinforcement signals and personality tuning can create persistent, surprising model behaviors that require engineering mitigations. It is not a frontier research breakthrough, but it matters for deployment and monitoring practices.