Ollama Boosts Mac Performance With MLX
On March 31, 2026, Ollama released a preview update (Ollama 0.19) that uses Apple's MLX framework to accelerate local AI inference on Macs with Apple silicon. The company reports about 1.6× faster prefill (prompt processing) and nearly 2× faster decode (token generation). The largest improvements appear on M5-series chips, and the update also brings smarter memory management. The preview requires a Mac with more than 32GB of unified memory and currently supports Alibaba's Qwen3.5.
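Prefill and decode speeds are throughput rates in tokens per second, so you can check them on your own Mac. The sketch below is a minimal example, not Ollama's benchmark method. It assumes a local Ollama server on its default port 11434 and reads the documented non-streaming `/api/generate` response fields `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration`, with durations in nanoseconds. The helper names and the `qwen3.5` model tag used below are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Optional

NS_PER_S = 1_000_000_000  # Ollama's API reports durations in nanoseconds


def _rate(count: Optional[int], duration_ns: Optional[int]) -> Optional[float]:
    """Tokens per second, or None if the field is missing or zero."""
    if not count or not duration_ns:
        return None
    return count / (duration_ns / NS_PER_S)


def throughput(resp: dict) -> dict:
    """Prefill (prompt) and decode (generation) speeds from one response."""
    return {
        "prefill_tok_s": _rate(resp.get("prompt_eval_count"), resp.get("prompt_eval_duration")),
        "decode_tok_s": _rate(resp.get("eval_count"), resp.get("eval_duration")),
    }


def speedup(before: dict, after: dict) -> dict:
    """Ratio after/before for each rate; values above 1 mean the newer run is faster."""
    return {k: after[k] / before[k] for k in before if before[k] and after.get(k)}


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to a local Ollama server's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)
```

For example, call `throughput(generate("qwen3.5", prompt))` with the same prompt before and after updating, then pass both dictionaries to `speedup`. These rates exclude model load time, which Ollama reports separately. A prompt the server has already cached may skip much of its prefill work, so a fresh prompt gives a fairer prefill reading.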
Key Points
- Increases prefill speed about 1.6× and nearly doubles decode speed on Apple silicon Macs.
- Uses Apple's MLX framework and the GPU Neural Accelerators in M5 chips, where the performance gains are largest.
- Improves responsiveness for coding assistants and long sessions; the preview supports Qwen3.5 and needs more than 32GB of unified memory.
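One way to see how the two speedups combine is a simple latency model. This is an assumption for illustration, not Ollama's methodology: a request with $N_p$ prompt tokens and $N_d$ generated tokens, at prefill rate $r_p$ and decode rate $r_d$, takes roughly prefill time plus decode time (ignoring model load time). Applying the reported 1.6× prefill gain and treating "nearly 2×" decode as 2×:

```latex
T = \frac{N_p}{r_p} + \frac{N_d}{r_d}
\qquad\Longrightarrow\qquad
T' = \frac{N_p}{1.6\,r_p} + \frac{N_d}{2\,r_d},
\qquad
\frac{1}{2} \le \frac{T'}{T} \le \frac{1}{1.6} = 0.625
```

Under this model a prefill-heavy request (a long code context with a short answer) ends up near $0.625\,T$, while a decode-heavy request approaches $0.5\,T$.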
Scoring Rationale
This official product preview from Ollama offers measurable performance gains (1.6× prefill, ~2× decode) and is directly actionable for Mac users, so it scores high for actionability, credibility, and relevance. Its scope is limited to Apple silicon Macs and current model support is narrow, which slightly lowers its novelty and breadth scores.
