OpenAI releases GPT-5.5 as ChatGPT default

OpenAI announced GPT-5.5 in an April 23, 2026 blog post and is deploying GPT-5.5 Instant as ChatGPT's new default, according to OpenAI's post and reporting by The Verge. The Verge reports that OpenAI's internal evaluations showed GPT-5.5 Instant produced 52.5% fewer hallucinated claims than its GPT-5.3 Instant baseline on "high-stakes" prompts covering medicine, law, and finance, and 37.3% fewer inaccurate claims on user-flagged challenging conversations. OpenAI's announcement also highlights stronger benchmark results, with Terminal-Bench 2.0 at 82.7% for GPT-5.5 versus 75.1% for GPT-5.4, and says the model improves everyday tasks like coding, data analysis, and tool use. OpenAI wrote that GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users and that API availability will follow pending additional safeguards.
What happened
OpenAI announced GPT-5.5 on April 23, 2026, in a company blog post describing the model as the next step toward "a new way of getting work done on a computer," and later updated availability notes on April 24, 2026. According to OpenAI's blog, GPT-5.5 delivers higher performance on benchmarks and practical tasks such as coding, research, data analysis, and multi-step tool use. The Verge reports that OpenAI's internal evaluations found GPT-5.5 Instant produced 52.5% fewer hallucinated claims than GPT-5.3 Instant on "high-stakes" prompts in medicine, law, and finance, and 37.3% fewer inaccurate claims on conversations users had flagged for factual errors. OpenAI's blog lists Terminal-Bench 2.0 at 82.7% for GPT-5.5 versus 75.1% for GPT-5.4, and the company wrote that GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT, with API deployments to follow under additional safeguards.
Technical details
Editorial analysis - technical context: Public materials from OpenAI emphasize capability improvements in "agentic" workflows, meaning multi-step tasks that require planning, tool use, and iterative checking. OpenAI's blog positions GPT-5.5 as matching GPT-5.4's per-token latency while increasing efficiency on coding tasks by using fewer tokens, claims presented alongside benchmark score jumps. Independent public benchmarks and replication tests have not been published in the scraped coverage; the reported performance metrics come either from OpenAI's published benchmark table or from internal evaluations cited by The Verge.
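The efficiency claim is worth unpacking: if per-token latency is unchanged but a response uses fewer tokens, end-to-end generation time falls proportionally. A minimal sketch of that arithmetic, with all numbers hypothetical rather than from OpenAI's materials:

```python
def end_to_end_latency_s(output_tokens: int, per_token_latency_ms: float) -> float:
    """Wall-clock generation time, assuming constant per-token latency."""
    return output_tokens * per_token_latency_ms / 1000.0

# Hypothetical: identical per-token latency, 20% fewer tokens on a coding task.
per_token_ms = 25.0
old = end_to_end_latency_s(1000, per_token_ms)   # 25.0 s
new = end_to_end_latency_s(800, per_token_ms)    # 20.0 s
print(f"latency drops {(old - new) / old:.0%}")  # latency drops 20%
```

The point is that "same per-token latency" and "faster responses" are compatible claims whenever token counts shrink.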
Context and significance
Industry context
A model upgrade that becomes the default for ChatGPT matters because it changes the baseline behavior that millions of users, and many products built on ChatGPT, experience. Reported reductions in hallucinations on "high-stakes" prompts would be particularly relevant for workflows touching medicine, law, and finance, where factual accuracy is essential. However, the hallucination reductions cited by The Verge are described as coming from OpenAI's internal evaluations, and the scraped sources do not include independent third-party validation or detailed methodology for those internal tests.
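Note that "52.5% fewer hallucinated claims" is a relative reduction against the GPT-5.3 Instant baseline, not an absolute error rate. A short sketch of what an independent replication would compute, using illustrative counts (not figures from either source):

```python
def relative_reduction(baseline_errors: int, new_errors: int) -> float:
    """Fraction by which the error count fell versus the baseline."""
    return (baseline_errors - new_errors) / baseline_errors

# Illustrative only: 200 hallucinated claims from the baseline model and
# 95 from the new model on the same prompt set yield a 52.5% reduction.
print(f"{relative_reduction(200, 95):.1%}")  # 52.5%
```

The same relative reduction is consistent with very different absolute rates, which is why the underlying counts and prompt sets matter for validation.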
What to watch
For practitioners: track independent benchmark runs, red-team reports, and API release notes that describe the additional safeguards OpenAI references for server-side deployments. Indicators to watch include independent replication of the 52.5% and 37.3% figures, changes in latency and token efficiency under real workloads, and how the "memory sources" feature (which OpenAI says will show the context used for personalized responses) affects traceability of outputs. Also watch the three-month transition window during which, The Verge reports, GPT-5.3 Instant will remain available, which will let product teams test migration paths and fallbacks.
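During that transition window, one common migration pattern is to route requests to the new default and fall back to the outgoing model on failure. A minimal, model-agnostic sketch; the model names and call signature here are assumptions for illustration, not OpenAI's documented API:

```python
from typing import Callable

def complete_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    primary: str = "gpt-5.5-instant",    # assumed model name
    fallback: str = "gpt-5.3-instant",   # assumed model name
) -> tuple[str, str]:
    """Try the primary model; on any error, retry once on the fallback."""
    try:
        return primary, call_model(primary, prompt)
    except Exception:
        return fallback, call_model(fallback, prompt)

# Usage with a stub standing in for a real API client:
def stub(model: str, prompt: str) -> str:
    if model == "gpt-5.5-instant":
        raise RuntimeError("simulated outage")
    return f"{model} answered"

model, text = complete_with_fallback("hello", stub)
print(model, text)  # gpt-5.3-instant gpt-5.3-instant answered
```

A production version would also log which model served each request, so teams can compare quality and latency across the two paths before the window closes.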
Editorial analysis: If the reported reductions in hallucination hold up under external testing, this release could lower the immediate friction for adopting large models in higher-stakes knowledge work. At the same time, reported gains based on internal evaluations are common in early product announcements, so independent validation and transparency about evaluation methodology will determine how broadly practitioners adjust risk models and deployment plans.
Scoring rationale
A new default model for ChatGPT with reported large reductions in hallucinations has meaningful operational impact for practitioners and enterprises. The score reflects the model-release importance balanced by reliance on internal evaluations and pending independent validation.
