Researcher Runs Qwen 397B Locally Using Flash
On March 18, 2026, Dan Woods demonstrated a custom Qwen3.5-397B-A17B mixture-of-experts (MoE) model running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max, using techniques from Apple's 2023 "LLM in a flash" paper. The 209GB model (120GB after quantization) keeps 5.5GB of non-expert state resident in RAM and streams 2-bit quantized experts from the SSD. Details on output-quality evaluation remain thin.
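To make "2-bit quantized experts" concrete, here is a minimal sketch of group-wise affine 2-bit quantization in plain Python. It assumes a group size of 64 and one float scale and minimum per group; the function names and the scheme itself are hypothetical illustrations, not the format used in the demonstration.

```python
# Hypothetical sketch of group-wise 2-bit affine quantization.
# Not the demonstration's actual format; group size and layout are assumptions.
from typing import List, Tuple

GROUP_SIZE = 64  # assumed group size; real schemes vary


def quantize_2bit(
    weights: List[float], group_size: int = GROUP_SIZE
) -> Tuple[bytes, List[Tuple[float, float]]]:
    """Map each weight to a 2-bit code (0..3) relative to its group's range.

    Returns the codes packed four per byte plus a (scale, minimum) pair per group.
    """
    codes: List[int] = []
    params: List[Tuple[float, float]] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 2 bits give 4 levels, i.e. 3 steps between the group minimum and maximum
        scale = (hi - lo) / 3 if hi > lo else 1.0
        params.append((scale, lo))
        for w in group:
            q = round((w - lo) / scale)
            codes.append(max(0, min(3, q)))  # clamp against rounding drift
    # Pack four 2-bit codes into each byte, lowest bits first
    packed = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        packed[i // 4] |= c << (2 * (i % 4))
    return bytes(packed), params


def dequantize_2bit(
    packed: bytes,
    params: List[Tuple[float, float]],
    n: int,
    group_size: int = GROUP_SIZE,
) -> List[float]:
    """Unpack n codes and map them back to approximate float weights."""
    out: List[float] = []
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        scale, lo = params[i // group_size]
        out.append(lo + code * scale)
    return out


if __name__ == "__main__":
    weights = [0.0, 0.1, 0.2, 0.3] * 32  # 128 weights -> two groups of 64
    packed, params = quantize_2bit(weights)
    print(len(packed))  # 32 bytes: 128 weights * 2 bits / 8
```

Each weight costs 2 bits, so 128 weights pack into 32 bytes; the per-group scale and minimum add some overhead, which pushes the effective bits per weight slightly above 2.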
Key Points
- Demonstrates running Qwen3.5-397B-A17B at 5.5+ tokens/sec on a 48GB MacBook Pro M3 Max
- Uses MoE expert streaming and the cost model from Apple's "LLM in a flash" paper to minimize flash-to-DRAM transfers
- Enables large MoE inference on consumer devices, requiring 5.5GB of resident memory and 2-bit expert quantization
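The expert streaming described in the key points can be sketched as a small least-recently-used (LRU) cache in DRAM sitting in front of an on-disk expert file, counting how many bytes are read from flash. Everything here is a hypothetical illustration: `ExpertStore`, `ExpertCache`, `run_tokens`, the fixed-size-block file layout, and the LRU policy are assumptions, not the demonstration's code. The "LLM in a flash" paper describes its own techniques for cutting flash reads, such as windowing and row-column bundling.

```python
# Hypothetical sketch: stream MoE experts from an on-disk file through an LRU cache.
# Names, file layout, and eviction policy are assumptions, not the demo's code.
import os
import tempfile
from collections import OrderedDict
from typing import List


class ExpertStore:
    """Assumed layout: each expert's packed weights are one contiguous,
    fixed-size block, so loading an expert is a single sequential read."""

    def __init__(self, path: str, expert_bytes: int):
        self.expert_bytes = expert_bytes
        self.f = open(path, "rb")
        self.bytes_read = 0  # total flash-to-DRAM traffic
        self.reads = 0

    def load(self, expert_id: int) -> bytes:
        self.f.seek(expert_id * self.expert_bytes)
        data = self.f.read(self.expert_bytes)
        self.bytes_read += len(data)
        self.reads += 1
        return data

    def close(self) -> None:
        self.f.close()


class ExpertCache:
    """Keeps up to `capacity` experts in memory; misses are read from the store."""

    def __init__(self, store: ExpertStore, capacity: int):
        self.store = store
        self.capacity = capacity
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: int) -> bytes:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
            return self.cache[expert_id]
        self.misses += 1
        data = self.store.load(expert_id)
        self.cache[expert_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used expert
        return data


def run_tokens(cache: ExpertCache, routed: List[List[int]]) -> None:
    """For each token, fetch only the experts the router selected."""
    for experts in routed:
        for e in experts:
            cache.get(e)  # a real kernel would run the expert's FFN on this data


if __name__ == "__main__":
    num_experts, expert_bytes = 16, 4096  # toy sizes, not the real model's
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "experts.bin")
        with open(path, "wb") as f:
            for e in range(num_experts):
                f.write(bytes([e]) * expert_bytes)
        store = ExpertStore(path, expert_bytes)
        cache = ExpertCache(store, capacity=4)
        routed = [[1, 5], [1, 7], [5, 7], [2, 1]]  # hypothetical top-2 routing
        run_tokens(cache, routed)
        print(cache.hits, cache.misses, store.bytes_read)  # prints: 4 4 16384
        store.close()
```

Only experts that miss the cache cost a flash read, so overlap in which experts consecutive tokens select directly reduces transfer volume; a larger cache or a smarter policy trades DRAM for fewer reads.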
Scoring Rationale
Practical, high-impact demonstration with reusable code, limited by single-source reporting and thin, unverified quality evaluations.

