Apple Reframes Siri with AFM 3 at WWDC 2026

Apple published a technical overview of its third-generation foundation-model family (AFM 3) at WWDC 2026, announcing five models across on-device and cloud tiers, per Apple Machine Learning Research. The family includes two on-device models: AFM 3 Core, a 3-billion-parameter dense model, and AFM 3 Core Advanced, a 20-billion-parameter sparse model that activates just 1 to 4 billion parameters at a time via Instruction-Following Pruning. Three server-based models run on Private Cloud Compute: AFM 3 Cloud, ADM 3 Cloud for image generation, and AFM 3 Cloud Pro, which runs on NVIDIA GPUs within Google Cloud while retaining Private Cloud Compute privacy guarantees (Apple ML Research). In side-by-side human evaluations, Apple reports AFM 3 Cloud was preferred on 64.7% of prompts versus 8.7% for the 2025 server baseline. Bank of America characterised the announcements as a 'material positive reset' for Apple's AI ambitions, per Seeking Alpha.
What happened
Apple Machine Learning Research published "Introducing the Third Generation of Apple's Foundation Models" on June 8, 2026, outlining the architecture and capabilities of the AFM 3 family unveiled at WWDC 2026. The family consists of five models spanning on-device and server-based inference, all designed to power the next generation of Apple Intelligence across iOS, iPadOS, and macOS.
Model architecture
The on-device tier includes two models (Apple ML Research). AFM 3 Core is a next-generation 3-billion-parameter dense model. AFM 3 Core Advanced is Apple's most capable on-device model: a 20-billion-parameter sparse model built on Instruction-Following Pruning (IFP). Rather than loading the entire model into DRAM, AFM 3 Core Advanced keeps its parameters in NAND flash and loads a subset into DRAM for each prompt, activating just 1 to 4 billion parameters at a time depending on task complexity. Apple says this allows the model to run on consumer hardware while scaling well beyond traditional DRAM limits (Apple ML Research).
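To put the sparse-activation figures in proportion, the short Python sketch below works out what share of the 20-billion-parameter model is active per prompt. The parameter counts come from the figures above; the 4-bit storage precision used for the footprint estimate is purely a hypothetical assumption for illustration, since the bit width is not stated here, and the function names are illustrative only.

```python
# Figures reported in the article (billions of parameters).
TOTAL_PARAMS_B = 20.0
ACTIVE_MIN_B = 1.0
ACTIVE_MAX_B = 4.0

# ASSUMPTION: hypothetical storage precision chosen for illustration only.
# The article does not state the bit width Apple uses.
ASSUMED_BITS_PER_WEIGHT = 4


def active_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Share of the full model that is active for a single prompt."""
    return active_b / total_b


def weight_footprint_gb(params_b: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Counts weights only; activations, caches, and runtime overhead are ignored.
    """
    return params_b * bits_per_weight / 8


if __name__ == "__main__":
    for active_b in (ACTIVE_MIN_B, ACTIVE_MAX_B):
        share = active_fraction(active_b) * 100
        dram_gb = weight_footprint_gb(active_b, ASSUMED_BITS_PER_WEIGHT)
        print(f"{active_b:.0f}B active: {share:.0f}% of the model, "
              f"about {dram_gb:.1f} GB of weights at the assumed precision")
    full_gb = weight_footprint_gb(TOTAL_PARAMS_B, ASSUMED_BITS_PER_WEIGHT)
    print(f"Full model: about {full_gb:.1f} GB of weights at the assumed precision")
```

The active share works out to 5% to 20% of the model regardless of precision. The gigabyte figures depend entirely on the assumed bit width and should be read as an order-of-magnitude illustration, not as a reported specification.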
Server-based models and cloud architecture
The server tier runs on Private Cloud Compute. AFM 3 Cloud is the server-side generalist, optimized for speed and multimodal reasoning. ADM 3 Cloud (Image) handles image generation and editing, including Image Playground and Spatial Reframing in Photos. AFM 3 Cloud Pro is Apple's most capable server model for complex reasoning and agentic tool use; it runs on NVIDIA GPUs within Google Cloud, extending Private Cloud Compute to GPU infrastructure (Apple ML Research). Per coverage citing Craig Federighi, Apple's models are trained by Apple and refined using outputs from Gemini frontier models, but they do not use Gemini's infrastructure for inference.
Evaluation results
Apple reported side-by-side human evaluation results comparing AFM 3 models to 2025 baselines (Apple ML Research). AFM 3 Core was preferred over its predecessor on 45.6% of prompts versus 23.3% for the prior generation. AFM 3 Cloud was preferred on 64.7% of prompts versus 8.7% for the 2025 server baseline. For text-to-speech, AFM 3 Core Advanced at its 1-billion-parameter activation size scored 4.15 on a 5-point Mean Opinion Score versus 3.87 for the current production TTS system, with a wider gap on conversational text (4.24 vs. 3.82) (Apple ML Research).
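The preference percentages above do not sum to 100, so a little arithmetic helps when comparing them. The sketch below is a reader-side calculation, not Apple's methodology: it assumes the unreported remainder represents ties or no-preference ratings, and the function names are illustrative.

```python
def summarize_preference(new_pct: float, old_pct: float) -> dict:
    """Derive simple comparison figures from a side-by-side preference split.

    ASSUMPTION: whatever is left after both preference shares is treated as
    ties or no-preference ratings; the article does not break this out.
    """
    remainder = 100.0 - new_pct - old_pct
    # Share of prompts won by the new model among prompts with a stated preference.
    decided_share = new_pct / (new_pct + old_pct) * 100.0
    # How many times more often the new model was preferred than the old one.
    ratio = new_pct / old_pct
    return {"remainder": remainder, "decided_share": decided_share, "ratio": ratio}


def mos_gap(new_score: float, old_score: float) -> float:
    """Difference between two Mean Opinion Scores on the same 5-point scale."""
    return new_score - old_score


if __name__ == "__main__":
    # Percentages and scores as reported in the article.
    for name, new_pct, old_pct in [("AFM 3 Core", 45.6, 23.3),
                                   ("AFM 3 Cloud", 64.7, 8.7)]:
        s = summarize_preference(new_pct, old_pct)
        print(f"{name}: remainder {s['remainder']:.1f}%, "
              f"share of decided prompts {s['decided_share']:.1f}%, "
              f"preference ratio {s['ratio']:.2f}x")
    print(f"TTS MOS gap, overall: {mos_gap(4.15, 3.87):.2f}")
    print(f"TTS MOS gap, conversational: {mos_gap(4.24, 3.82):.2f}")
```

Under that assumption, AFM 3 Core wins roughly two thirds of the prompts where raters expressed a preference, and AFM 3 Cloud wins close to nine in ten. The text-to-speech gaps come to 0.28 and 0.42 MOS points respectively.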
Training approach
Apple said it scaled pre-training on TPU accelerators, used supervised fine-tuning with multi-stage reinforcement learning, and applied quantization-aware training. Apple stated it does not use users' private personal data or user interactions in training (Apple ML Research). Bank of America characterised the WWDC announcements as a 'material positive reset' for Apple's AI strategy, per Seeking Alpha.
For practitioners, what to watch
Key indicators include developer uptake of new AFM 3 SDKs and APIs; independent benchmarks for AFM 3 Cloud Pro; how the NVIDIA GPU / Google Cloud PCC extension performs under real workloads; and whether Apple's forthcoming technical report (promised later in 2026) includes third-party reproducible evaluations. The split between on-device and cloud inference will affect battery life, latency, and privacy guarantees across device tiers.
Scoring Rationale
Apple's detailed technical disclosure of AFM 3 confirms a significant architectural advance: a 20B-parameter sparse on-device model, five-model hybrid inference stack, and Private Cloud Compute extension to NVIDIA GPUs in Google Cloud. This is a major platform AI release with direct implications for on-device ML practitioners and tooling vendors. Score raised from 7.6 to 7.8 after primary-source verification confirmed the architectural depth and evaluation results.
