Google launches Gemini 3.5 Flash Low variant

May 25, 2026 | By LDS Team

Relevance Score: 6.8

Android Authority reports that Google introduced a new low-effort variant, Gemini 3.5 Flash (Low), to reduce token consumption for simple tasks in Google Antigravity. Android Authority quotes Google as saying the Low variant generates about 45% fewer tokens than the existing Flash variant (renamed Flash (Medium)), and that the company reset Gemini quotas across paid and free plans to ease developer pain. Google's public documentation and blog posts describe Gemini 3.5 Flash (gemini-3.5-flash) as generally available and optimized for agentic execution and coding, with a 1M-token context window and 65k max output tokens (per Google's developer docs). 9to5Google previously reported sharp usage limits in Antigravity that prompted Google to raise quotas multiple times. For practitioners this is an operational change that affects cost and rate-limit planning for agentic coding workflows.

What happened

Android Authority reports that Google introduced Gemini 3.5 Flash (Low) as a lower-token variant intended to reduce token usage on simple tasks in Google Antigravity. According to the outlet, Google says the Low variant generates around 45% fewer tokens than the previous Flash release, which has been renamed Flash (Medium). Android Authority also reports that Google reset Gemini quotas across both paid and free plans to address developer complaints about tight usage limits.

What Google has published

Per Google's developer documentation, Gemini 3.5 Flash is listed as generally available and exposed as model ID gemini-3.5-flash for the generateContent API. The docs state the Flash family supports a 1,000,000-token context window and 65,000 max output tokens and discuss agentic and coding optimizations. Google's product and DeepMind blog posts describe Gemini 3.5 Flash as targeted at agentic execution and high-throughput coding workloads and note co-optimization with the Antigravity harness.
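To make the documented limits concrete, here is a minimal pre-flight check a team might run before sending a request. It is a sketch under stated assumptions: the two constants are the figures quoted above, the function name `check_request_budget` is hypothetical, and the prompt and output caps are treated as independent limits, which the article does not confirm.

```python
# Limits as quoted from Google's developer docs in this article.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_000


def check_request_budget(prompt_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the request fits.

    Assumption: the prompt is checked against the context window and the
    requested output against the max-output cap, independently.
    """
    problems = []
    if prompt_tokens > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"prompt uses {prompt_tokens} tokens, over the {CONTEXT_WINDOW_TOKENS} context window"
        )
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested {requested_output_tokens} output tokens, over the {MAX_OUTPUT_TOKENS} cap"
        )
    return problems


if __name__ == "__main__":
    print(check_request_budget(900_000, 8_000))       # fits both limits
    print(check_request_budget(1_200_000, 70_000))    # exceeds both limits
```

A check like this only guards the per-request limits; it says nothing about plan-level quotas, which are a separate constraint.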

Editorial analysis - technical context

Industry-pattern observation: introducing a lower-output variant to reduce token volume is a common product response when customers hit rate or quota friction, because shorter outputs reduce both cost and quota consumption without requiring a different underlying model. For developers running iterative coding loops or agents that spawn subagents, output length often dominates token consumption, so a variant tuned for terser outputs can materially extend usable quota during long workflows.
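The quota effect can be worked through with simple arithmetic. The sketch below assumes a quota expressed as a single token budget that counts both input and output tokens; Google's actual quota accounting is not described in the reporting, and the per-task token counts are illustrative placeholders, not measurements. Only the 45% figure comes from the article.

```python
def tasks_per_quota(quota_tokens: int, input_tokens: int, output_tokens: int,
                    output_reduction_pct: int = 0) -> int:
    """Number of whole tasks that fit in a token budget.

    output_reduction_pct shrinks only the output side, mirroring a variant
    that generates fewer tokens for the same prompt. Integer math avoids
    floating-point rounding surprises.
    """
    reduced_output = output_tokens * (100 - output_reduction_pct) // 100
    return quota_tokens // (input_tokens + reduced_output)


# Hypothetical workload: 2,000 input and 8,000 output tokens per task,
# against a hypothetical 1,000,000-token budget.
medium_tasks = tasks_per_quota(1_000_000, 2_000, 8_000)
low_tasks = tasks_per_quota(1_000_000, 2_000, 8_000, output_reduction_pct=45)

print(medium_tasks, low_tasks)
```

Under these assumptions the same budget covers 100 tasks at the original output length and 156 tasks with 45% fewer output tokens. The gain is smaller than 45% would suggest at first glance because input tokens are unchanged; the more output-heavy the workload, the closer the gain gets to the 1/0.55 (about 1.8x) ceiling.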

Context and significance

The reported quota reset and the new Low variant matter because agentic developer platforms, such as Google Antigravity, amplify token use through repeated plan-and-execute cycles. Public documentation showing a 1M-token context window and large max-output capability positions Flash as a high-capacity model, but real-world usage patterns still create operational constraints (rate limits, cost). The reported 45% token reduction for the Low variant is a practical lever for teams that need longer interactive sessions from the same quota.

What to watch

Editorial analysis

Observers and practitioners should monitor three things. First, whether Google publishes objective token-per-task benchmarks comparing Gemini 3.5 Flash (Low) with Flash (Medium) on common software-engineering prompts. Second, whether downstream SDKs and Antigravity tooling expose explicit brevity/verbosity knobs so teams can trade output richness for token cost. Third, whether other provider ecosystems respond with low-output variants or per-call brevity features to address similar quota friction.
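Until such benchmarks exist, teams can estimate the reduction on their own prompts. The following is a minimal sketch of that comparison: it takes output-token counts recorded for the same tasks under two variants and reports the fractional drop in the mean. The function name is hypothetical, and the sample numbers are placeholders chosen to match the reported 45% figure, not real measurements.

```python
from statistics import mean


def relative_reduction(baseline_tokens: list[int], candidate_tokens: list[int]) -> float:
    """Fractional drop in mean output tokens per task, candidate vs. baseline."""
    if not baseline_tokens or len(baseline_tokens) != len(candidate_tokens):
        raise ValueError("need one token count per task for both variants")
    baseline_mean = mean(baseline_tokens)
    candidate_mean = mean(candidate_tokens)
    return (baseline_mean - candidate_mean) / baseline_mean


# Placeholder data: output tokens per task for the same three prompts.
medium_counts = [1200, 800, 2000]  # hypothetical Flash (Medium) runs
low_counts = [660, 440, 1100]      # hypothetical Flash (Low) runs

reduction = relative_reduction(medium_counts, low_counts)
print(f"{reduction:.0%} fewer output tokens on average")
```

A comparison like this is only meaningful if both variants also complete the tasks to an acceptable standard, so token counts should be read alongside a pass/fail or quality measure.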

Bottom line

Reporting shows Google deployed a lower-output Flash variant and adjusted quotas after developer complaints; the move reduces token usage at the model-output level and is immediately relevant to teams optimizing cost and rate-limit planning for agentic coding workflows.

Key Points

1. Industry observation: Adding a lower-output model variant reduces token consumption and extends practical quota for iterative agentic workflows.
2. Industry observation: Token usage often spikes from iterative loops and agent subagents, so model-level brevity can be more effective than per-call limits.
3. Industry observation: Providers publishing clear token-per-task benchmarks and verbosity controls make quota planning easier for engineering teams.

Scoring Rationale

This is a notable product change that directly affects developers using agentic coding tools: a lower-token variant and quota resets change cost and rate-limit planning. It is not a frontier model breakthrough, but it has immediate operational impact for teams using Antigravity and the Gemini API.

Sources

Primary source and supporting public references used for this report.

7 sources

Primary source: androidauthority.com, "Google's latest attempt to fix token quotas is here: Say hello to Gemini 3.5 Flash Low"
