
Apple Debuts Third-Generation Foundation Models and AFM Core Advanced


June 9, 2026 | By LDS Team

Relevance Score: 8.1


Apple introduced the third generation of Apple Foundation Models (AFM), a family of five models spanning on-device and server deployments, in a June 8, 2026 post on its machine learning research site. The set includes two on-device models, AFM 3 Core and AFM 3 Core Advanced, and three server models that run on Private Cloud Compute: AFM 3 Cloud, ADM 3 Cloud (an image model), and AFM 3 Cloud Pro. Apple describes AFM 3 Core Advanced as a 20-billion-parameter, natively multimodal on-device model that uses a sparse architecture, activating only 1 to 4 billion parameters per request so it can run on Apple silicon. Apple worked with Google and NVIDIA to extend Private Cloud Compute for AFM 3 Cloud Pro to NVIDIA GPUs in Google Cloud while, Apple says, preserving its privacy guarantees. A January 12, 2026 joint statement from Apple and Google framed the next-generation AFM family as built with Google and its Gemini technology, though Apple's June 8 post emphasizes its own architecture and Apple silicon optimization.

What happened

Apple announced the third generation of Apple Foundation Models (AFM) in a June 8, 2026 post on its machine learning research site, describing a family of five models that run across devices and Apple's Private Cloud Compute. The family includes two on-device models, AFM 3 Core (the successor to Apple's roughly 3-billion-parameter dense model) and AFM 3 Core Advanced, plus three server models: AFM 3 Cloud, ADM 3 Cloud (a dedicated image model for creation, editing, and Genmoji), and AFM 3 Cloud Pro. Apple says AFM 3 Core Advanced is its most powerful on-device model, a 20-billion-parameter, natively multimodal system that uses a sparse architecture to activate only 1 to 4 billion parameters at a time depending on the request.

Technical details

Apple presents the sparse design as what lets a 20-billion-parameter model fit on consumer hardware. The technique, which Apple calls Instruction-Following Pruning (IFP), keeps the full parameter set in flash (NAND) storage rather than in active DRAM. Because NAND-to-DRAM bandwidth is too low to swap weights token by token, AFM 3 Core Advanced makes its routing decision once per prompt: a lightweight dense block selects a fixed subset of parameters during initial processing, so only 1 to 4 billion parameters enter active memory for inference. AFM 3 Core, AFM 3 Core Advanced, AFM 3 Cloud, and ADM 3 Cloud are optimized for Apple silicon. AFM 3 Core Advanced requires A19 Pro (iPhone 17 Pro) or M3/M4 silicon and does not support devices with 8 GB of RAM. AFM 3 Cloud Pro, positioned for the most demanding agentic tool use and complex reasoning, is optimized for NVIDIA GPUs.
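The per-prompt routing described in this section can be sketched in a few lines of Python. This is a toy illustration and not Apple's implementation: the block count, the hash-based scoring function, the names `select_blocks` and `run_prompt`, and the dictionary standing in for flash storage are all assumptions made for the example. It shows only the control flow: a cheap router scores parameter blocks once from the prompt, the chosen blocks are read from flash once, and every generated token reuses that same subset.

```python
import hashlib

NUM_BLOCKS = 20     # hypothetical: full model split into 20 parameter blocks
ACTIVE_BLOCKS = 3   # hypothetical: blocks kept in DRAM for one prompt

# Stand-in for flash storage: block id -> weights (one float per block here).
FLASH = {i: [0.1 * (i + 1)] for i in range(NUM_BLOCKS)}


def score_block(prompt: str, block_id: int) -> int:
    """Toy router score: a deterministic hash of (prompt, block).

    A real router would be a small learned network; this is a placeholder.
    """
    digest = hashlib.sha256(f"{prompt}|{block_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def select_blocks(prompt: str, k: int = ACTIVE_BLOCKS) -> list:
    """Pick the k highest-scoring blocks, once per prompt."""
    ranked = sorted(range(NUM_BLOCKS),
                    key=lambda b: score_block(prompt, b),
                    reverse=True)
    return sorted(ranked[:k])


def run_prompt(prompt: str, num_tokens: int):
    """Load the selected subset once, then reuse it for every token."""
    active_ids = select_blocks(prompt)

    dram = {}
    flash_reads = 0
    for b in active_ids:          # one-time flash -> DRAM copy per prompt
        dram[b] = FLASH[b]
        flash_reads += 1

    outputs = []
    for t in range(num_tokens):   # per-token work touches only DRAM
        active_sum = sum(w for b in active_ids for w in dram[b])
        outputs.append(active_sum * (t + 1))
    return active_ids, flash_reads, outputs


if __name__ == "__main__":
    ids, reads, outs = run_prompt("summarize this email", num_tokens=8)
    print("active blocks:", ids)
    print("flash reads:", reads, "for", len(outs), "tokens")
```

In this sketch the number of flash reads depends only on `ACTIVE_BLOCKS`, not on how many tokens are generated, which is the property the article attributes to per-prompt selection.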

The Google and NVIDIA partnership

Apple says it worked with Google and NVIDIA to extend Private Cloud Compute so that AFM 3 Cloud Pro can run on NVIDIA GPUs in Google Cloud. According to Apple, the extension preserves the same privacy guarantees it describes for on-device and Apple-silicon server inference: user data is not stored or shared, including with Apple. A January 12, 2026 joint statement from Apple and Google characterized the next-generation AFM family as built in collaboration with Google and based on its Gemini technology and cloud infrastructure. Apple's June 8 technical post, by contrast, emphasizes its own model architecture and Apple-silicon optimization, and some independent reporting describes the on-device models as distilled from Gemini rather than running Gemini directly.

Why it matters

For practitioners, the release illustrates two converging trends. First, sparse activation with flash-resident weights is becoming a practical tool for pushing larger, multimodal models onto constrained consumer silicon: IFP's approach of storing all parameters in flash and routing a subset into DRAM per prompt is a concrete example of the memory-budget tradeoffs the field is navigating. Second, even a vendor with deep in-house silicon and model capability is leaning on external frontier-model and cloud partners for its most demanding server workloads, a hybrid device-plus-cloud pattern that blends local inference with privacy-scoped cloud compute.
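A back-of-the-envelope calculation makes the memory-budget tradeoff concrete. The sketch below is illustrative only: the 4-bit weight width and the 2 GB/s flash read rate are assumptions chosen for the example, not figures from Apple's post. Only the 20-billion total and the 1-to-4-billion active range come from the article.

```python
GB = 1e9

# Assumptions for illustration (not from Apple's post):
BITS_PER_WEIGHT = 4        # hypothetical quantization width
FLASH_READ_BW = 2 * GB     # hypothetical flash read rate, bytes per second


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Storage needed for `params` weights at the given bit width."""
    return params * bits_per_weight / 8


def load_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Time to copy `num_bytes` from flash into DRAM at a fixed rate."""
    return num_bytes / bytes_per_second


total = weight_bytes(20e9, BITS_PER_WEIGHT)      # full model, kept in flash
active_low = weight_bytes(1e9, BITS_PER_WEIGHT)  # smallest active subset
active_high = weight_bytes(4e9, BITS_PER_WEIGHT) # largest active subset

if __name__ == "__main__":
    print(f"full model in flash:   {total / GB:.1f} GB")
    print(f"active subset in DRAM: {active_low / GB:.1f} to {active_high / GB:.1f} GB")
    print(f"one-time load:         {load_seconds(active_low, FLASH_READ_BW):.2f} to "
          f"{load_seconds(active_high, FLASH_READ_BW):.2f} s")
```

Under these assumed numbers the full model occupies 10 GB of flash, while the active subset needs 0.5 to 2 GB of DRAM and takes roughly 0.25 to 1 second to load. Paying that cost once per prompt is tolerable; paying it for every token would not be, which is consistent with the article's description of per-prompt rather than per-token selection.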

What to watch

Open questions include developer API access for on-device versus server calls, benchmarks comparing AFM 3 Core Advanced against dense and other sparse on-device models across Apple silicon generations, how the NVIDIA-GPU-in-Google-Cloud path performs and scales under Private Cloud Compute, and the real memory and latency tradeoffs for multimodal workloads that will determine how widely AFM 3 Core Advanced can be deployed.

Key Points

1. Apple unveiled a five-model AFM 3 family: two on-device models plus three Private Cloud Compute server models, per Apple's June 8 research post.
2. AFM 3 Core Advanced packs 20 billion parameters but activates only 1 to 4 billion per request, enabling multimodal AI within Apple silicon limits.
3. The most powerful server model runs on NVIDIA GPUs in Google Cloud, showing deepening device-vendor reliance on external model and infrastructure partners.

Scoring Rationale

Verified: Apple's third-generation AFM family is a flagship release spanning a novel 20-billion-parameter sparse on-device model and Private Cloud Compute server models, with the most capable server model running on NVIDIA GPUs in Google Cloud. It is a major, deployment-defining model release for a billion-device ecosystem and highly relevant to on-device and hybrid-inference practitioners, though it is scoped to Apple's own platform rather than representing a field-wide frontier shift.


Sources

Public references used for this report.


machinelearning.apple.com: Introducing the Third Generation of Apple's Foundation Models

cnbc.com: Apple partnering with Google and Nvidia for most advanced AI model

9to5mac.com: Apple's third-generation Foundation Models explained

