Research · MoE · Qwen 3.5 · model quantization · flash storage

Researcher Runs Qwen 397B Locally Using Flash

March 19, 2026 | By LDS Team

Relevance Score: 7.9


On March 18, 2026, Dan Woods demonstrated a custom Qwen3.5-397B-A17B mixture-of-experts (MoE) model running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max, using techniques from Apple's 2023 "LLM in a flash" paper. The model is 209GB (120GB quantized), far more than the laptop's memory, so the setup keeps 5.5GB of non-expert state resident in RAM and streams 2-bit quantized experts from the SSD as they are needed. Details of the quality evaluation remain thin.
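A toy simulation can make the streaming idea concrete. The Python below is a minimal sketch under stated assumptions, not Dan Woods's implementation: a hypothetical `ExpertCache` keeps a fixed number of experts in DRAM with least-recently-used (LRU) eviction and counts how many lookups must fall through to flash. The expert count, top-k, cache size, and skewed routing pattern are invented illustration values.

```python
from collections import OrderedDict
import random


class ExpertCache:
    """Hypothetical LRU cache of MoE experts held in DRAM.

    Assumption: non-expert weights (attention, embeddings, router) stay
    resident elsewhere; only expert weights pass through this cache.
    """

    def __init__(self, capacity, load_fn):
        self.capacity = capacity      # max experts held in DRAM at once
        self.load_fn = load_fn        # reads one expert's weights from flash
        self.cache = OrderedDict()    # expert_id -> weights, oldest first
        self.flash_reads = 0          # lookups that had to go to flash
        self.hits = 0                 # lookups served from DRAM

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as most recently used
            self.hits += 1
            return self.cache[expert_id]
        weights = self.load_fn(expert_id)       # miss: stream from flash
        self.flash_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return weights


def fake_flash_load(expert_id):
    # Stand-in for reading one quantized expert blob from the SSD.
    return bytes(16)


def simulate(num_experts=64, top_k=4, capacity=16, tokens=200, seed=0):
    rng = random.Random(seed)
    cache = ExpertCache(capacity, fake_flash_load)
    hot = list(range(8))                    # toy "popular" experts
    everyone = list(range(num_experts))
    for _ in range(tokens):
        chosen = set()
        while len(chosen) < top_k:          # pick top_k distinct experts
            pool = hot if rng.random() < 0.7 else everyone
            chosen.add(rng.choice(pool))
        for expert_id in sorted(chosen):
            cache.get(expert_id)
    return cache


if __name__ == "__main__":
    c = simulate()
    total = c.hits + c.flash_reads
    print(f"lookups: {total}, flash reads: {c.flash_reads}, "
          f"hit rate: {c.hits / total:.1%}")
```

Each cache miss stands for one flash-to-DRAM transfer, the quantity the "LLM in a flash" cost model aims to minimize. The real system's routing statistics, cache policy, and read granularity may differ substantially from this toy.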

Key Points

1. Demonstrates running Qwen3.5-397B-A17B at 5.5+ tokens/sec on a 48GB MacBook Pro M3 Max
2. Uses MoE expert streaming and Apple's "LLM in a flash" cost model to minimize flash-to-DRAM transfers
3. Enables large MoE inference on consumer devices, requiring 5.5GB of resident memory and 2-bit expert quantization
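As an illustration of what "2-bit expert quantization" means, the sketch below uses one common style of scheme: per-group asymmetric min/max quantization to four levels, with four 2-bit codes packed into each byte. It is an assumption for illustration only; the demo's actual format is not specified here, and `quantize_2bit`, `dequantize_2bit`, and the group size of 32 are hypothetical.

```python
import numpy as np

GROUP_SIZE = 32  # hypothetical: weights that share one scale/offset pair


def quantize_2bit(w):
    """Per-group min/max quantization of a 1-D array to 2-bit codes."""
    assert w.ndim == 1 and w.size % GROUP_SIZE == 0
    groups = w.reshape(-1, GROUP_SIZE)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0            # 2 bits -> 4 levels (codes 0..3)
    scale[scale == 0] = 1.0            # flat group: any nonzero scale works
    codes = np.clip(np.round((groups - lo) / scale), 0, 3).astype(np.uint8)
    # Pack four 2-bit codes into each byte, lowest bits first.
    c = codes.reshape(-1, 4)
    packed = c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)
    return packed.astype(np.uint8), scale, lo


def dequantize_2bit(packed, scale, lo):
    """Unpack 2-bit codes and map them back to approximate weights."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    codes = codes.reshape(-1, GROUP_SIZE).astype(np.float32)
    return (codes * scale + lo).reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(1024).astype(np.float32)
    packed, scale, lo = quantize_2bit(w)
    w_hat = dequantize_2bit(packed, scale, lo)
    print("original bytes:", w.nbytes)          # 4096
    print("packed code bytes:", packed.nbytes)  # 256
    print("max abs error:", float(np.abs(w - w_hat).max()))
```

The per-group scale and offset add storage on top of the 2 bits per weight (stored here at full float32 precision), so practical formats often use larger groups or lower-precision metadata to keep that overhead small.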

Scoring Rationale

Practical, high-impact demonstration with reusable code, limited by single-source reporting and thin, unverified quality evaluations.

Sources

Public references used for this report.

1 source

01. simonwillison.net: Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally


More AI & Data Science News

DigitalOcean Adds NVIDIA RTX Ada GPU Droplets

PagerDuty Chair Highlights AI Agent Failure Risks

Researchers Demonstrate Chain-of-Thought Spoofing Against LLM Reasoners

Hugging Face And Cerebras Launch Open Speech-To-Speech AI Pipeline
