Google Brings AI Edge Gallery To macOS

Google released the Google AI Edge Gallery for macOS, letting Mac users run its Gemma models locally, alongside a new Gemma 4 12B model and the on-device Google AI Edge Eloquent dictation app, per Google's developer blog and 9to5mac. Google describes Gemma 4 12B as designed to bring agentic, multimodal intelligence to laptops, running on machines with about 16GB of RAM and handling text, vision, and audio. 9to5mac reports the macOS Gallery currently exposes five Gemma builds, including Gemma-4-12B-it and several Gemma-3n variants, and contrasts it with runtimes like Ollama and LM Studio that allow installing a wider set of third-party models. Google also extended its LiteRT-LM CLI with a serve command that creates a local, OpenAI-compatible endpoint for fully on-device agents and tools.
What happened
Google released the Google AI Edge Gallery for macOS, enabling local execution of its Gemma models on Macs, and introduced a new Gemma 4 12B model plus the on-device Google AI Edge Eloquent dictation app, per Google's developer blog and 9to5mac. Google describes Gemma-4-12B-it as designed to bring agentic, multimodal intelligence to laptops, running on machines with about 16GB of RAM and handling text, vision, and audio. 9to5mac reports the macOS Gallery exposes five Gemma builds: Gemma-4-12B-it, Gemma-4-E2B-it, Gemma-4-E4B-it, Gemma-3n-E2B-it, and Gemma-3n-E4B-it.
Technical details
Per Google's blog, the Gallery can generate and run Python locally for tasks such as data analysis and charting, and Eloquent adds on-device voice editing powered by Gemma 4 12B. Google also extended its LiteRT-LM CLI with a serve command that creates a local, OpenAI-compatible endpoint, letting standard tools and SDKs point at an on-device model.
Editorial analysis
Class B analysis: local models trade raw scale for on-device availability, lower latency, and reduced cloud dependency. Running a 12B-class model on a laptop typically depends on sufficient memory and on accelerator support such as Apple silicon, so practical performance varies by machine and quantization. A curated, vendor-supplied catalog differs from open runtimes like Ollama and LM Studio, which let users install a wider range of third-party models.
What to watch
- •Whether the Gallery expands beyond the initial five Gemma builds.
- •Real-world throughput and quality of Gemma-4-12B-it on Macs versus cloud-hosted models.
- •Interoperability between LiteRT-LM endpoints and existing local runtimes and agent frameworks.
Key Points
- 1Google's AI Edge Gallery now runs Gemma models locally on macOS, with a new Gemma 4 12B multimodal model targeting laptops with about 16GB of RAM.
- 2The macOS Gallery ships a curated set of five Gemma builds, unlike open runtimes such as Ollama and LM Studio that allow third-party models.
- 3A new LiteRT-LM serve command exposes a local OpenAI-compatible endpoint, easing fully on-device, privacy-preserving agent workflows.
Scoring Rationale
A notable product release that makes Google's Gemma family, including a new multimodal 12B model, runnable on-device on macOS, which matters for offline, low-latency, and privacy-preserving workflows. It is an incremental local-AI advance rather than a frontier-model milestone, placing it in the notable tier.
Sources
Public references used for this report.
Practice with real Ad Tech data
90 SQL & Python problems · 15 industry datasets
250 free problems · No credit card
See all Ad Tech problems
