Meta's Watermelon Matches GPT-5.5 Benchmarks
Meta's superintelligence chief Alexandr Wang told employees in a town hall that the company's upcoming model, codenamed Watermelon, has "caught up" with OpenAI's GPT-5.5 on closely followed AI benchmarks, according to Business Insider, which cited two people familiar with the matter. Wang reportedly said Watermelon is still in training and uses "an order of magnitude more compute" than Muse Spark (Meta's April model, internally codenamed Avocado), which had trailed rival models despite solid benchmark scores. Business Insider notes it was not clear which benchmarks Wang cited, and neither Meta nor OpenAI has confirmed the claim. For practitioners, an internal, single-sourced benchmark claim is not equivalent to a published, reproducible evaluation and should be treated as an early signal, not a verified result, until Meta releases the model publicly.
An unconfirmed internal benchmark claim from Meta's AI leadership is a reminder that town-hall statements are not evaluation artifacts: until Meta publishes reproducible results or a model card for Watermelon, "caught up with GPT-5.5" is a single-sourced assertion, not verified parity. For practitioners tracking the frontier-model race, the more concrete signal here is the compute trajectory Wang described, not the benchmark claim itself.
What happened
According to Business Insider, Alexandr Wang told Meta employees in a town hall that the company's upcoming model, codenamed Watermelon, "has caught up" with OpenAI's GPT-5.5 based on closely followed AI benchmarks, citing two people familiar with the matter. Business Insider reports Wang said Watermelon, the successor to Avocado (Meta's internal codename for Muse Spark), is "currently in training" and "uses an order of magnitude more compute than Avocado." OpenAI released GPT-5.5 in April and introduced GPT-5.6 late last month, per Business Insider. Meta declined to comment and OpenAI did not respond to a request for comment. Investing.com, redistributing the Business Insider report, added that it was not immediately clear which benchmarks Wang was citing.
Technical context
Meta released Muse Spark in April 2026, its first major model since hiring Wang; it performed well on some benchmarks but still trailed leading rivals overall. Wang's description of Watermelon as using "an order of magnitude more compute" than Muse Spark (roughly ten times as much) points to continued aggressive scaling as Meta's primary lever. That is consistent with the company's reported multibillion-dollar spending on chips and data centers, with Mark Zuckerberg directly overseeing its AI development.
For practitioners
Treat this as a leading indicator, not a procurement signal. Internal benchmark claims announced without published methodology, evaluation datasets, or third-party replication carry a real risk of optimistic framing. Wait for a public model card, an official benchmark table, or independent evaluations before factoring Watermelon into model-selection or capacity-planning decisions.
What to watch
Meta has not given a release timeline for Watermelon. Watch for a public launch announcement, published benchmark results, and whether the model narrows the gap with GPT-5.5 and GPT-5.6 on independently run evaluations rather than internally cited ones.
Key Points
- Meta's AI chief told staff that Watermelon has caught up with GPT-5.5 on closely followed benchmarks, per a single Business Insider report citing two unnamed sources.
- Wang described Watermelon as using an order of magnitude more compute than April's Muse Spark, underscoring compute scaling as Meta's core strategy.
- Practitioners should wait for published benchmarks or independent evaluations before treating the parity claim as verified for deployment decisions.
Scoring Rationale
Notable signal in the Meta-OpenAI frontier-model race given Meta's competitive stakes. However, the claim rests on a single town-hall statement relayed by anonymous sources, with no published benchmark data; Meta declined to comment and OpenAI did not respond, so the claim stays provisional pending independent verification.