Study finds LLMs reproduce antisemitic stereotypes
A peer-reviewed paper, "From Myth to Model: Representation of 'the Jew' in Generative AI," by Israeli psychologists Michael Gilead (Tel Aviv University) and Gal Gutman (Ben-Gurion University), reports that major generative models reproduce centuries-old antisemitic tropes. Published in American Psychologist (May-June 2026), the study used a chain-of-association methodology: it generated 252 fictional biographies via ChatGPT-4 Turbo, stripped names and religious markers, and had both AI models and 378 human raters assess the anonymized characters on personality and social traits. The Jerusalem Post and The Times of Israel report that model outputs associated Jewish names with higher competence, privilege, dominance, and obsessiveness but lower warmth, likability, and collectivism, a profile consistent with historical antisemitic stereotypes. Results were replicated on DeepSeek and Mistral. The findings have direct implications for LLM auditing in high-stakes domains.
What happened
A peer-reviewed paper titled "From Myth to Model: Representation of 'the Jew' in Generative AI," authored by Michael Gilead (Tel Aviv University) and Gal Gutman (Ben-Gurion University) and published in American Psychologist (May-June 2026, Vol. 81, No. 4), reports that widely used generative models reproduce longstanding antisemitic stereotypes (PubMed; The Times of Israel). The paper appears in a special issue of American Psychologist focused on antisemitism, which the journal described as a "long-overdue reengagement" between psychology research and prejudice against Jews (The Times of Israel).
Methodology
The study focused on ChatGPT-4 Turbo and used a multi-step indirect approach to expose latent group representations without triggering the model's explicit bias controls. The researchers instructed the model to generate 252 names for Jewish and non-Jewish Americans (men and women aged 18 to 80), 126 names in each category. For each name, the model, prompted to act as a novelist who chooses names to match specific character traits, produced a 100-word fictional biography (The Jerusalem Post; The Times of Israel).
Researchers then stripped names and religious identifiers from the biographies, and had both AI models and 378 human raters evaluate the anonymized characters on dozens of personality and social traits. The ratings focused on two central dimensions from prior stereotype research: warmth (perceived intent, friendliness, likability) and competence (perceived capability, intelligence, success) (The Times of Israel).
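The anonymization step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the reporting does not describe how the researchers actually removed identifiers, and the function name `anonymize_biography`, the marker list, and the placeholder strings are all hypothetical.

```python
import re

# Hypothetical marker list for illustration only; the study's actual
# anonymization procedure is not described in the reporting.
RELIGIOUS_MARKERS = ["synagogue", "church", "rabbi", "pastor", "Jewish", "Christian"]


def anonymize_biography(text, full_name, placeholder="the character"):
    """Replace the character's name and the listed markers with neutral tokens."""
    # Replace the full name first, then each individual name part,
    # so "Jane Doe" and a later bare "Jane" are both caught.
    for part in [full_name] + full_name.split():
        text = re.sub(rf"\b{re.escape(part)}\b", placeholder, text)
    # Remove the listed religious markers, ignoring case.
    for marker in RELIGIOUS_MARKERS:
        text = re.sub(rf"\b{re.escape(marker)}\b", "[removed]", text,
                      flags=re.IGNORECASE)
    return text
```

A fixed word list like this would miss indirect cues such as holidays, neighborhoods, or family names, so a real audit would likely need manual review of the anonymized texts before they go to raters.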
Core findings
Model-generated characters associated with Jewish names were consistently rated as more competent, privileged, dominant, assertive, efficient, and obsessive-compulsive, and lower on warmth, friendliness, likability, and collectivism (The Jerusalem Post; The Times of Israel). This places Jewish-named characters in the high-competence, low-warmth quadrant, a profile that prior social psychology research associates with envy, perceived threat, and dehumanization rather than belonging.
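One way to express the quadrant placement numerically is a standardized mean difference per rating dimension. The sketch below uses Cohen's d with a pooled standard deviation; that choice is an assumption, since the reporting does not say which statistics the paper used, and the function names and input layout are hypothetical.

```python
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD).

    Assumes each sample has at least two values and the pooled
    variance is non-zero.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def dimension_gaps(group_a, group_b):
    """Map each dimension to d(group_a vs group_b).

    group_a and group_b are dicts of the form {dimension: [scores]}.
    """
    return {dim: cohens_d(group_a[dim], group_b[dim]) for dim in group_a}


def is_high_competence_low_warmth(gaps):
    """True when group_a scores higher on competence and lower on warmth."""
    return gaps["competence"] > 0 and gaps["warmth"] < 0
```

Given per-character ratings for two name groups, `dimension_gaps` returns one signed effect size per dimension, and the quadrant check simply looks at the two signs. Any threshold for what counts as a meaningful gap would be a separate decision for the auditing team.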
In a further step, the researchers converted the trait profiles into narrative descriptions and asked AI models to identify fictional characters matching those profiles. ChatGPT named Tyrion Lannister, Walter White, and Michael Corleone - archetypes the researchers described as "master manipulators," embodying an isolated, powerful, morally ambiguous "puppet master" trope historically central to antisemitic propaganda (The Times of Israel).
Replication
The findings were replicated on DeepSeek and Mistral, suggesting the effect is not specific to OpenAI's model family. Both of these models, along with the human raters, identified the same trait asymmetry (The Jerusalem Post; The Times of Israel).
Researchers' framing
The authors wrote: "LLMs, trained on massive corpora of human-generated content, may have identified and encoded such cultural templates... Traits that appear benign, or even admirable, in isolation can, through combination and context, reconstitute historical prejudices in subtler, more insidious forms" (The Times of Israel). They further observed that historical antisemitic discourse has frequently portrayed Jews as agents of disruption undermining social cohesion - and that this association "persists and may now be encoded in LLMs" (The Jerusalem Post).
For practitioners
Editorial analysis: The study introduces a methodology - generating identity-linked outputs, removing explicit markers, and using independent raters to assess trait patterns - that teams can adapt for subgroup bias audits. The authors suggest aggregate fairness metrics are insufficient for detecting stereotype encoding that survives anonymization. The implications extend to any high-stakes domain where LLMs influence decisions, including hiring, lending, and educational assessment, where latent trait associations can have real discriminatory consequences.
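For teams adapting this kind of audit, one possible check is whether a rating gap between two groups of anonymized outputs is larger than chance would produce. The sketch below uses a two-sided permutation test on the difference in means. It is an assumption about how such an audit might be run, not the authors' procedure, and the function name and defaults are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    a, b: rating lists for two groups of anonymized characters.
    Returns the share of random relabelings whose absolute mean gap
    is at least as large as the observed one (with a +1 correction).
    """
    rng = random.Random(seed)  # fixed seed so the audit is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

Running this once per trait dimension would require a correction for multiple comparisons, and a small p-value only indicates that a gap exists, not that it matches a particular stereotype.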
What to watch
Editorial analysis: Follow-on questions include whether this effect replicates across additional model families, whether model providers publish subgroup-specific bias evaluations in response, and whether emerging regulatory frameworks begin requiring targeted stereotype audits alongside standard fairness assessments. The special-issue framing in American Psychologist also suggests renewed academic attention to this class of bias in the near term.
Scoring Rationale
A peer-reviewed study in a major psychology journal reports that ChatGPT-4 Turbo, DeepSeek, and Mistral encode antisemitic stereotypes through latent trait associations, with replication across models and 378 human raters. That makes it a methodologically significant contribution to LLM fairness and auditing practice. The finding that explicit bias controls do not suppress this class of stereotype encoding is directly actionable for practitioners in high-stakes deployment contexts.
