OpenAI Releases GPT-5.5, Revives GPT-4o Nostalgia
OpenAI unveiled GPT-5.5 in a video briefing, with Greg Brockman saying the model "can accomplish much more with fewer instructions," according to reporting by Asiae. Per OpenAI's internal benchmarks as reported by Asiae, GPT-5.5 scored 82.7% on Terminal Bench 2.0 versus 75.1% for GPT-5.4, and 84.9% on GDPval versus 80.3% for Anthropic's Opus 4.7; the same report shows GPT-5.5 trailing Opus 4.7 on the SWE-Bench Pro coding benchmark (58.6% vs. 64.3%) and alleges signs of memorization in Anthropic's model. Separately, Business Insider reports that some former GPT-4o users, including one named Martina Wanis, say they are hopeful GPT-5.5 recaptures elements of GPT-4o's personality.
What happened
Asiae reports that OpenAI introduced GPT-5.5 during a video briefing on April 23, quoting Greg Brockman: "This model can accomplish much more with fewer instructions." Sam Altman also spoke during the briefing, according to Asiae. OpenAI described GPT-5.5 as improving performance across coding, document writing, retrieval, data analysis, and scientific research, Asiae reports.
Technical details
Per the performance report cited by Asiae, GPT-5.5 scored 82.7% on Terminal Bench 2.0, compared with 75.1% for GPT-5.4, and 84.9% on GDPval, versus 80.3% for Anthropic's Opus 4.7. On the coding-focused SWE-Bench Pro benchmark, GPT-5.5 scored 58.6%, trailing Opus 4.7 at 64.3%. The same report, Asiae notes, argues that Anthropic's model shows "signs of data memorization" in its output; that characterization is OpenAI's, made in its own performance report.
Industry context
Editorial analysis: Asiae's reporting frames GPT-5.5 as part of a rapid iteration cycle at the top AI labs, with benchmark head-to-heads used to shape public comparisons. Industry observers increasingly rely on task suites such as Terminal Bench and SWE-Bench Pro for apples-to-apples scoring, but these internal metrics vary by dataset, warranting caution when mapping reported percentages to real-world application performance.
User reaction and product personality
Business Insider reports that a vocal group of former GPT-4o users mourned the model after its February retirement, and that some now say GPT-5.5 restores aspects of its conversational tone. Business Insider cites one user, Martina Wanis, who described GPT-4o as "this digital thing that helped you with work, while simultaneously acting like an intelligent partner in crime."
What to watch
For practitioners: validate GPT-5.5 on your own benchmarks, especially for coding and domain-specific tasks where reported scores diverge. Monitor independent evaluations for memorization artifacts and long-form code correctness, and track how models behave in agentic, multi-step workflows where vendors claim improved autonomy.
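The validation advice above can be sketched as a minimal do-it-yourself eval harness. This is an illustrative sketch, not a real SDK integration: `call_model` is a hypothetical stand-in for whatever model API you use, and the tasks and pass criteria are placeholders you would replace with your own domain-specific suite.

```python
# Minimal sketch of a self-run benchmark harness.
# NOTE: `call_model` is a hypothetical placeholder -- swap in your
# provider's actual client before drawing any conclusions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """One benchmark item: a prompt plus a pass/fail check on the output."""
    prompt: str
    check: Callable[[str], bool]


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    return prompt.upper()  # trivial echo so the sketch runs end to end


def pass_rate(tasks: List[Task]) -> float:
    """Fraction of tasks whose model output satisfies its check."""
    passed = sum(1 for t in tasks if t.check(call_model(t.prompt)))
    return passed / len(tasks)


# Placeholder tasks; real suites would use your own coding or domain prompts.
tasks = [
    Task("echo hello", lambda out: "HELLO" in out),
    Task("echo world", lambda out: "WORLD" in out),
]

print(f"pass rate: {pass_rate(tasks):.1%}")
```

Running the same task list against two models (e.g., GPT-5.5 and an alternative) and comparing the resulting pass rates is the apples-to-apples check the vendor benchmarks cannot give you, since their datasets and grading criteria differ from yours.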
Scoring Rationale
A notable incremental model release from a major lab with competitive benchmark comparisons to Anthropic. Practitioners should care but this is not a paradigm shift.

