ChatGPT Corrects 'strawberry' R-count but Errors Persist

9to5Google reports that OpenAI posted that, "at long last," ChatGPT can correctly count the number of letter "R" occurrences in the word "strawberry." The same coverage highlights other persistent confident mistakes, including examples where the model reportedly miscounts the letter "R" in words like "cranberry" and where it previously recommended walking to a car wash 50 meters away despite the illogical prompt. 9to5Google frames the fix as likely hardcoded and notes many replies to OpenAI's post pointing out additional failures. The piece presents the update as incremental: a narrow correction rather than a resolution of the broader problem of confident LLM errors.
What happened
9to5Google reports that OpenAI posted that "at long last," ChatGPT can correctly answer how many times the letter "R" appears in the word "strawberry." 9to5Google says the same coverage documents other examples of so-called "confident mistakes," including instances where ChatGPT reportedly replies that "cranberry" has one "R" and where the model previously recommended walking to a car wash 50 meters away after a logically malformed prompt.
Editorial analysis - technical context
Companies working with large language models commonly face persistent "confident mistake" behavior, also called hallucination, where the model asserts incorrect facts with high confidence. A common industry pattern: teams deploy targeted, hardcoded patches or prompt-layer rules to fix high-visibility failures. These fixes can close individual test cases, but they do not always generalize to similar inputs or to the underlying failure mode.
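One reason letter-counting failures draw attention is that the deterministic version of the task is trivial in code, which is why teams often route such queries to a tool call rather than relying on the model's token-level reasoning. A minimal sketch (the function name and routing approach are illustrative, not OpenAI's actual implementation):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

# The cases from the coverage: "strawberry" and "cranberry" each contain
# three "R"s, which a deterministic count gets right every time.
print(count_letter("strawberry", "r"))  # 3
print(count_letter("cranberry", "r"))   # 3
```

The contrast illustrates why a hardcoded patch for one word is brittle: a model fix that only memorizes "strawberry" leaves every other word exposed, whereas tool routing handles the whole input class.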
Context and significance
Industry context: for practitioners, the ChatGPT example underlines two predictable patterns. First, surface-level correctness improvements can be implemented quickly and publicized; second, a constellation of related edge cases typically remains. That combination matters for product teams and evaluators because it affects trust metrics, test-coverage design, and escalation criteria for model outputs in user-facing flows.
What to watch
Observers should track whether OpenAI publishes technical notes, a changelog, or test coverage for the fix, and whether community reports of related miscounts decline on developer and social channels. As an industry pattern, durable remediation usually shows up when a fix survives diverse paraphrases and adversarial rephrasings, not just the canonical prompt.
Scoring Rationale
The story is notable for practitioners because it highlights model reliability and the limits of narrow fixes, but it is not a platform-changing release. It signals ongoing operational challenges rather than a major technical breakthrough.