OpenAI Explains Why ChatGPT Mentioned Goblins

According to OpenAI's blog post, starting with GPT-5.1 the models began referencing goblins, gremlins and similar creatures much more often in responses: use of the word "goblin" rose 175% after the GPT-5.1 launch and "gremlin" rose 52% (OpenAI blog). The company traced the cause to a since-retired Nerdy personality, whose internal prompt and training rewarded creature-based metaphors, and to reinforcement signals that propagated the tic across training data (OpenAI; reporting in Android Authority, CNET, Gizmodo). OpenAI says it removed the reward signal, retired the Nerdy personality in March with GPT-5.4, and added an explicit instruction in GPT-5.5 and Codex to avoid mentioning goblins, gremlins, raccoons, pigeons, trolls or similar creatures unless absolutely relevant (OpenAI blog; GitHub/Codex prompt).
What happened
According to OpenAI's blog post, beginning with GPT-5.1 the company observed an unusual uptick in references to "goblins," "gremlins," and related creature words in ChatGPT outputs. OpenAI reports that use of the word "goblin" rose by 175% after the GPT-5.1 launch and that "gremlin" increased by 52% (OpenAI blog). Internal analysis linked the pattern to responses generated under a preconfigured Nerdy personality, which included a system prompt instructing the model to be "an unapologetically nerdy, playful and wise AI mentor" and to "undercut pretension through playful use of language" (OpenAI blog). OpenAI says reinforcement signals unintentionally gave higher rewards to metaphors that used creature language, and that those rewarded outputs were reused during later supervised and preference training, allowing the tic to spread across generations and contexts (OpenAI blog; reporting in CNET and Android Authority). OpenAI retired the Nerdy personality around the GPT-5.4 release and, per the Codex GitHub prompt, added a specific instruction in GPT-5.5 and Codex to avoid mentioning goblins, gremlins, raccoons, pigeons, trolls, ogres or similar animals unless "absolutely and unambiguously relevant" (OpenAI blog; Codex prompt on GitHub).
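OpenAI has not said how it measured these shifts, but the underlying arithmetic is simple to sketch. The following Python is a minimal illustration, assuming two samples of model outputs and an invented creature lexicon; none of the names, word lists, or functions here come from OpenAI's tooling.

```python
from collections import Counter
import re

# Hypothetical creature lexicon; OpenAI's actual watch list is not public.
CREATURE_WORDS = {"goblin", "gremlin", "raccoon", "pigeon", "troll", "ogre"}

def creature_rate(texts):
    """Occurrences of each creature word per million tokens across a sample of outputs."""
    counts = Counter()
    total_tokens = 0
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        total_tokens += len(tokens)
        for tok in tokens:
            # Fold simple plurals so "goblins" counts toward "goblin".
            word = tok[:-1] if tok.endswith("s") else tok
            if word in CREATURE_WORDS:
                counts[word] += 1
    return {w: c / max(total_tokens, 1) * 1_000_000 for w, c in counts.items()}

def percent_change(before, after, word):
    """Relative change in per-million rate between two samples."""
    b, a = before.get(word, 0.0), after.get(word, 0.0)
    return float("inf") if b == 0 else (a - b) / b * 100.0
```

On pre- and post-launch output samples, `percent_change(before, after, "goblin")` returning roughly 175 would correspond to the jump OpenAI reported.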
Technical details
According to OpenAI's writeup, the behavior arose from the interaction of human preference signals and system prompts used during reinforcement learning and supervised fine-tuning. OpenAI characterizes the root cause as "many small incentives," adding that "we unknowingly gave particularly high rewards for metaphors with creatures," which amplified creature references in subsequent training steps (OpenAI blog). The company used its coding agent, Codex, to search training outputs for creature-language examples and adjusted filtering and reward weights to reduce prevalence (OpenAI blog; Gizmodo reporting). The explicit negative instruction that appeared in GPT-5.5's Codex-related prompts enumerates creature words and prohibits their use unless strictly relevant, reflecting an operational mitigation embedded in system-level prompts (GitHub/Codex prompt as reported by Wired, Gizmodo, and others).
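Neither OpenAI nor the secondary reporting describes this pipeline in code, but the shape of the mitigation (find creature-language examples, then filter or downweight them) can be sketched. The record schema, field names, and downweight factor below are assumptions for illustration only, not OpenAI's actual pipeline.

```python
import re

CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|raccoons?|pigeons?|trolls?|ogres?)\b", re.I)

def audit_preference_records(records, downweight=0.1):
    """Flag and downweight preference records whose chosen response uses creature language.

    `records` is a hypothetical schema: dicts with a 'chosen' response string and
    a 'weight' that scales the example's contribution to reward-model training.
    """
    flagged = []
    for rec in records:
        if CREATURE_RE.search(rec["chosen"]):
            flagged.append(rec)
            # Shrink the reward signal this example contributes rather than
            # deleting it outright, so the stylistic incentive fades gradually.
            rec["weight"] = rec.get("weight", 1.0) * downweight
    return flagged
```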
Industry context
Editorial analysis: This episode illustrates a recurrent pattern in large-model development where localized reward shaping or style prompts produce emergent linguistic tics that can propagate when their outputs enter downstream training corpora. Companies building production models often mix supervised examples, preference data, and style-conditioning, and industry observers note that rewarded stylistic signals can be reintroduced across training stages, causing non-obvious behavior to amplify (reporting across CNET, Android Authority, Gizmodo; industry-pattern observations). For practitioners, the case underscores how system prompts and human feedback interact with data reuse to create brittle behavioral artifacts.
Context and significance
Editorial analysis: The goblin incident is low risk in user safety terms but high signal for model engineering practices. It is a concrete example showing how reward-function design and prompt engineering can have outsized lexical effects. The rapid detection and public writeup by OpenAI provide a rare, transparent postmortem that data scientists and ML engineers can study to see how evaluation, human feedback pipelines, and example reuse can introduce and propagate unwanted tics.
What to watch
Editorial analysis: Observers should track whether other vendors publish similar postmortems and whether teams adopt tooling to detect and quarantine high-reward style tics before they enter supervised or preference datasets. Key indicators include spikes in token-level usage for narrow lexical sets during canary releases, audit tooling that links specific reward signals to output patterns, and the emergence of best-practice guidance for separating stylistic system prompts from training-feedback signals. Researchers and engineers can also watch for wider adoption of automated corpus auditing (searching for overrepresented metaphors or lexicons) and for frameworks that version-control reward models and style prompts separately from base-model weights.
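As a concrete sketch of the canary-release indicator above, a lexicon-spike alert could be as small as the function below; the baseline rates, threshold, and example numbers are invented, chosen so the output mirrors the percentages OpenAI reported.

```python
def lexicon_spike_alert(baseline_rates, canary_rates, threshold=0.5):
    """Alert when a watched word's per-million rate jumps more than `threshold`
    (here 50%) between a baseline model and a canary release.

    Both arguments map word -> occurrences per million tokens; how those rates
    are gathered (sampled prompts, live traffic) is left to the deployment.
    """
    alerts = {}
    for word, base in baseline_rates.items():
        canary = canary_rates.get(word, 0.0)
        if base > 0 and (canary - base) / base > threshold:
            alerts[word] = round((canary - base) / base * 100.0, 1)
    return alerts

# Example: GPT-5.1-sized jumps would trip this alert.
print(lexicon_spike_alert({"goblin": 4.0, "gremlin": 5.0},
                          {"goblin": 11.0, "gremlin": 7.6}))
# -> {'goblin': 175.0, 'gremlin': 52.0}
```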
Bottom line
According to OpenAI, the goblin problem was an emergent artifact of reward signals tied to the now-retired Nerdy personality, propagated through reuse of rewarded outputs in downstream training. OpenAI removed the reward signal, retired the personality, filtered training data, and added explicit prohibitions to the GPT-5.5/Codex prompts to limit recurrence (OpenAI blog; reporting in Android Authority, CNET, Gizmodo, NBC).
Scoring rationale
The story is notable because it reveals a concrete failure mode in RLHF and prompt-conditioned training that matters to practitioners. It is not a paradigm shift, but the public postmortem and concrete mitigation steps make it a useful case study for ML teams.