ADL Rates LLMs On Antisemitic Moderation

By LDS Team
Relevance Score: 8.2
Photo: The Verge

The Anti-Defamation League published a study Wednesday evaluating six large language models (Anthropic Claude, OpenAI ChatGPT, Meta Llama, Google Gemini, DeepSeek, and xAI Grok) on how they handle anti-Jewish, anti-Zionist, and extremist prompts. Testing ran between August and October 2025 and covered 4,181 chats per model, over 25,000 in total. Claude scored highest (80) and Grok lowest (21); the results point to substantial moderation gaps and multimodal weaknesses, especially in image and document analysis.

Key Points

  • Ranked six LLMs; Claude scored 80 and Grok scored 21, a 59-point gap
  • Found consistent weaknesses in multi-turn dialogue and in image and document analysis, both of which reduce moderation effectiveness
  • Indicates that developers and vendors must improve multimodal safety, context retention, and bias detection for deployment
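
The headline figures can be checked with simple arithmetic. The sketch below is only a sanity check on the numbers quoted in this summary; the variable names are illustrative, and only the two scores the summary states (Claude and Grok) are included.

```python
# Figures as quoted in this summary; nothing here comes from the study's raw data.
CHATS_PER_MODEL = 4_181
models = ["Claude", "ChatGPT", "Llama", "Gemini", "DeepSeek", "Grok"]
scores = {"Claude": 80, "Grok": 21}  # the only two scores the summary reports

# Total chats across all six models, and the gap between top and bottom scores.
total_chats = CHATS_PER_MODEL * len(models)
gap = scores["Claude"] - scores["Grok"]

print(total_chats)  # 25086, consistent with "over 25,000"
print(gap)          # 59
```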

Scoring Rationale

Robust empirical evaluation across six major LLMs provides strong evidence, limited by lack of novel mitigation guidance.
