Top AI Models Receive Barely Passing Safety Grades

In 2025, a safety evaluation reported by Mashable finds that leading models (Google's Gemini, Anthropic's Claude, and OpenAI's ChatGPT) only marginally meet basic safety standards, each earning roughly a C average. Adversarial red-teaming exposed vulnerabilities to coded prompts, bias, and multimodal inconsistencies, prompting companies to accelerate safety investments as these models are increasingly used in high-stakes sectors such as finance and healthcare.
Key Points
- Assigns C-average safety grades to Gemini, Claude, and ChatGPT in 2025 assessment
- Reveals vulnerabilities to coded prompts, multimodal inconsistency, and bias under adversarial red-teaming
- Signals need for stronger guardrails, continuous monitoring, and industry collaboration for high-stakes deployments
Scoring Rationale
Industry-wide safety weaknesses increase urgency; constrained novelty and reliance on secondary reporting limit definitive conclusions.
