Skip to content

Google's Flash Model Just Beat Its Own Flagship. The Real Target Is Your Agents.

DS
LDS Team
Let's Data Science
7 min
At Google I/O on May 19, Gemini 3.5 Flash launched outperforming Gemini 3.1 Pro on nearly every benchmark while running four times faster. Sundar Pichai pitched it at less than half the price of frontier models, and the pitch was aimed at engineers whose token bills are already out of control.

On stage at Google I/O on Tuesday, a DeepMind engineering director named Varun Mohan did not show off a chatbot. He showed off a swarm. Inside Antigravity, Google's coding platform, agents spun off to work on separate components of a software project, then came back together. By the end of the demo, they had built a full operating system from scratch.

That demo was the entire point of the day. Google used its biggest developer event of the year to argue that the next wave of useful AI is not the thing you chat with. It is the thing you hand a task to and walk away from.

The model powering that argument is Gemini 3.5 Flash, and the surprise is buried in the name. Flash models are supposed to be the cheap, fast, slightly-dumber tier. This one launched beating Google's own flagship.

The Cheap Model Outscored the Expensive One

"3.5 Flash offers an incredible combination of quality and low latency," Koray Kavukcuoglu, DeepMind's chief technologist, told reporters ahead of the launch. "It outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks," including coding, agentic tasks, and multimodal reasoning.

Speed is the headline number. Kavukcuoglu said 3.5 Flash runs four times faster than other frontier models, and that Google built an optimized version that hits 12 times faster at the same quality. For a single question typed into a chat box, that speed is a nicety. For an agent firing thousands of model calls across a long-running job, it is the difference between a workflow that finishes and one that times out.

That is why Google co-developed the model alongside Antigravity, so agents would have what Kavukcuoglu called a "native environment where they can live, work, and execute." The model can run autonomously for hours. Tulsee Doshi, Google's senior director and head of product, said it will still pause and ask for input when it hits a decision that needs human judgment, such as a permissions issue.

Gemini 3.5 Flash is already the default model in the Gemini app and in AI Mode in Google Search worldwide. It also runs Gemini Spark, a new 24/7 personal agent that works in the background, plugs into Gmail and Calendar through MCP connectors, and briefs users each morning on what is waiting for them.

Google Is Selling Price as Much as Intelligence

Alphabet CEO Sundar Pichai spent his keynote time on a number that matters more to finance departments than to researchers. "Flash 3.5 delivers frontier level capabilities at less than half the price, and in some cases a third of the price," he said.

The framing was deliberate. Enterprises have spent 2026 discovering that agents are expensive to run, because every reasoning step and every tool call burns tokens. "We've heard that many companies are already blowing through their annual token budgets, and it's only May," Pichai said. His pitch: route the heavy reasoning to a big model and the grunt work to Flash, and the bill drops.

Google backed the message with a new pricing tier. The AI Ultra subscription now starts at $100 a month, with five times the Antigravity usage limit of the cheaper AI Pro plan. New and existing subscribers also get matching bonus Antigravity credits that kick in after they hit their quota, an offer that expires May 25.

Under the hood, Google says the new models were trained on its latest TPUs using a distributed setup that can train larger models in weeks instead of months. The speed of the model and the speed of building it are the same strategic story.

Antigravity Becomes the Place Agents Live

The model launch came wrapped inside a platform launch. Google released Antigravity 2.0, a standalone desktop application built around orchestrating teams of agents rather than editing files one at a time, and surrounded it with new ways to plug in.

  • Antigravity 2.0 desktop app runs multiple agents in parallel, spins up dynamic subagents, and schedules tasks to run in the background.
  • Antigravity CLI brings the same agents to the terminal, and Google is asking Gemini CLI users to migrate to it.
  • Antigravity SDK exposes the same agent harness that powers Google's own products, so teams can define custom agents and host them on their own infrastructure.
  • Managed Agents in the Gemini API let a single API call spin up an agent that reasons, uses tools, and runs code inside an isolated Linux environment, with state that persists across calls.

For practitioners, the Managed Agents piece is the quiet standout. It means the agent loop you would normally build yourself with function calling and tool use now ships as a hosted primitive. The harness is the same one behind Mohan's operating-system demo, co-optimized with Gemini 3.5 Flash.

The competitive read is obvious. Google is lining Antigravity up directly against Anthropic's Claude Code and OpenAI's rebuilt Codex, the two tools that have defined agentic coding so far. Google only launched Antigravity six months ago, which makes the all-in pivot more striking.

The Risks Google Acknowledged, and the Ones It Did Not

Putting autonomous agents in front of hundreds of millions of consumers carries obvious hazards, and Google was not entirely quiet about them. The company says Gemini 3.5 strengthened its cyber and CBRN (chemical, biological, radiological, and nuclear) safeguards, and is better calibrated to engage with sensitive questions rather than refuse them outright. That posture is shadowed by a pending lawsuit Google faces after a man died by suicide last year following weeks of conversations with Gemini.

There is also the bill. The same token math that makes Flash attractive cuts the other way for Google's flashier launches. Gemini Omni, the new multimodal world model that generates video, is the kind of feature that, as one Constellation Research analyst put it, "will kill your token budget." Always-on agents like Spark raise the same question at a different scale: what does it cost, in compute and in dollars, to run an assistant that never sleeps? Google has not said.

And the benchmark win comes with an asterisk Google did not dwell on. A faster, cheaper model that tops the previous flagship is a real achievement, but Gemini 3.5 Pro, the model meant to actually push the reasoning frontier, does not arrive until next month. Until it does, the most capable Gemini is a model optimized for throughput, not depth.

The Bottom Line

Google did not come to I/O to win a benchmark fight over the smartest model. It came to change the unit of value from intelligence to useful work completed per dollar. Gemini 3.5 Flash beating Gemini 3.1 Pro is not an accident or an embarrassment. It is the thesis: most real tasks do not need the biggest brain, they need a fast one that can run a thousand times in a loop without bankrupting you.

For engineers, the practical takeaway is concrete. The default model in your Google stack just got faster and cheaper, the agent harness behind Google's own demos is now an API call away, and the coding tool wars just gained a third serious entrant. The open question is the one Pichai raised himself and then declined to fully answer. If agents are cheap per token but never stop running, does the budget actually go down, or does it just move to a bigger spreadsheet?

When Gemini 3.5 Pro ships next month, Doshi said, the two models are designed to run together: Pro as the planner, Flash as the fleet of subagents doing the work. The chatbot era was about one model answering one person. Google is betting the next era is about one person commanding many models. The demo built an operating system. The bet is that you will build the next one.

Sources

Practice with real Ad Tech data

90 SQL & Python problems · 15 industry datasets

250 free problems · No credit card

See all Ad Tech problems
Free Career Roadmaps8 PATHS

Step-by-step roadmaps from zero to job-ready — curated courses, salary data, and the exact learning order that gets you hired.

Explore all career paths