WMB-100K Introduces Enterprise Memory Benchmark For Situational Retrieval Accuracy
WMB-100K v2.1, published April 1, 2026, is an enterprise-scale situational memory benchmark. Its corpus holds 4.3 million tokens (2.3M document tokens and 105,591 conversation turns) and poses 2,708 situational questions, including 400 false-memory probes. It evaluates retrieval accuracy and false-positive defense under two judging modes, Quick (GPT-4o-mini alone) and Official (majority vote of GPT-4o-mini, Claude Haiku, and Gemini Flash), and applies latency penalties to mirror production constraints.
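The Official judging mode and latency penalty described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the function names, label strings, penalty rate, and latency budget are all assumptions for the sake of the example.

```python
from collections import Counter

def official_verdict(judge_labels):
    """Majority vote across semantic judges (e.g. GPT-4o-mini,
    Claude Haiku, Gemini Flash). Returns the winning label, or
    'no_majority' if no label exceeds half the votes."""
    label, votes = Counter(judge_labels).most_common(1)[0]
    return label if votes > len(judge_labels) // 2 else "no_majority"

def penalized_score(correct, latency_ms, budget_ms=2000, rate=0.0001):
    """Score 1.0 for a correct answer, 0.0 otherwise, then subtract
    a linear penalty for latency beyond the budget. Budget and rate
    are illustrative, not WMB-100K's published values."""
    base = 1.0 if correct else 0.0
    overage = max(0, latency_ms - budget_ms)
    return max(0.0, base - overage * rate)

# Two of three judges agree the retrieved memory is correct:
verdict = official_verdict(["correct", "correct", "incorrect"])
score = penalized_score(verdict == "correct", latency_ms=3000)
```

Here the answer is judged correct by majority vote, but 1,000 ms of latency beyond the budget shaves the score below 1.0, mirroring the production-oriented penalties the benchmark applies.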
Scoring Rationale
The release is timely and highly actionable: it standardizes situational memory evaluation with fixed semantic judges and production-oriented latency penalties. The score is moderated, however, because v2.1 is an incremental benchmark update and public results and leaderboard data are not yet available.
Sources
- GitHub: Irina1920/WMB-100K — "The first 100,000-turn benchmark for AI memory systems" (github.com)