
Engram raises $98 million to cut token costs

June 23, 2026 | By LDS Team

Relevance Score: 7.2

Engram's $98 million round is best read as a bet that AI memory is splitting into two competing architectures: retrieval (RAG, vector databases) versus trained memory, where an organization's documents are compressed into a model's weights offline and reused cheaply at inference instead of being re-fed on every query. For practitioners evaluating token-cost fixes, the key constraint is that the approach needs white-box access to model weights, co-founder Dan Biderman said on Sequoia Capital's Training Data podcast, so teams locked into closed frontier-model APIs need a direct vendor partnership, like Engram's arrangement with Microsoft, rather than a drop-in fix. Kleiner Perkins led the round, with General Catalyst, Sequoia Capital and Andrej Karpathy among the backers, valuing the eight-month-old, 13-person startup at a reported $600 million largely on the strength of its founders' published research. The claim of up to 100 times fewer tokens has no independent benchmark yet, and Engram is not the only well-funded entrant chasing the same problem.

The real signal

Three of the industry's most prominent venture firms backing a 13-person, eight-month-old startup at a reported $600 million valuation is less interesting than what they are betting on: that AI memory is splitting into two competing architectures. One is retrieval, where a system fetches relevant documents at query time and stuffs them into the context window, the basis of RAG and vector databases. The other, Engram's bet, is trained memory, where an organization's documents are compressed offline into the model's own weights, so it answers from what it has learned instead of re-reading source material on every call. For AI and ML teams evaluating token-cost fixes, this round is a signal that trained memory now has serious capital and research pedigree behind it, not just a product pitch.

What Engram actually does

Engram, founded in October 2025 by a team from Stanford's Hazy Research lab (CEO Dan Biderman, CTO Sabri Eyuboglu, plus co-founders Jessy Lin, Jack Morris, Scott Linderman and Chris Re), emerged from stealth on June 23 with $98 million in funding, according to the company's own announcement. Kleiner Perkins led the round, per its own investment post, with General Catalyst, Sequoia Capital, Amplify Partners and other investors participating, alongside angel investors including Andrej Karpathy. Kleiner Perkins partner Leigh Marie Braswell describes today's models as "brilliant strangers" that "reread the same documents and relearn the same context" on every query. Engram's fix, descended from Eyuboglu's Cartridges research and Lin's sparse memory fine-tuning work, trains a small, reusable memory object offline on a customer's documents instead of reloading a retrieval cache at inference time. The result, Engram says, is models that match or beat frontier-model quality while using as little as 1 to 10 percent of the tokens, which works out to between 10 and 100 times fewer. Early testing partners include Microsoft, which is trialing the models inside Microsoft 365 and has committed Azure GPU capacity to the work, plus Notion and legal AI startup Harvey.
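The link between the token fraction and the reduction factor is simple arithmetic, and a minimal Python sketch makes the claimed range explicit. Only the 1 and 10 percent fractions come from Engram's claim; the function name and the 50,000-token baseline are illustrative assumptions, not company figures.

```python
def reduction_factor(token_fraction: float) -> float:
    """Return how many times fewer tokens are used, given the fraction still used."""
    if not 0 < token_fraction <= 1:
        raise ValueError("token_fraction must be in (0, 1]")
    return 1.0 / token_fraction


# Hypothetical baseline: tokens a retrieval pipeline feeds the model per query.
# This number is an assumption for illustration only.
BASELINE_TOKENS = 50_000

for fraction in (0.10, 0.01):  # the 10 percent and 1 percent ends of the claim
    tokens = BASELINE_TOKENS * fraction
    print(
        f"{fraction:.0%} of tokens -> {tokens:,.0f} tokens per query, "
        f"about {reduction_factor(fraction):.0f}x fewer"
    )
```

The headline "up to 100 times" figure corresponds only to the 1 percent end of the range; at 10 percent the reduction is tenfold.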

The catch for practitioners

Baking knowledge into weights is not a drop-in swap for RAG. Biderman explained on Sequoia Capital's Training Data podcast that the approach requires white-box access to model weights, so it works most easily on open-weight models; applying it to closed frontier models means partnering directly with the model provider, which is effectively the arrangement Engram has with Microsoft. That is a materially different integration path from adding a vector database. Trained memory also carries a staleness problem retrieval does not have: when source documents change, the compressed memory has to be retrained, while a vector index just gets re-indexed. And the up-to-100x figure so far comes only from Engram; independent analysis site The Deep Feed points out that it has no third-party benchmark yet, so it is worth treating as a vendor claim pending outside verification rather than a settled number.
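One way to reason about the staleness trade-off is as a break-even calculation: trained memory saves tokens on every query but incurs a retraining cost each time the corpus changes. The Python sketch below is a back-of-the-envelope model under stated assumptions. Every name and number in it (the per-token price, the retraining cost, the tokens per query) is hypothetical and not drawn from Engram or any vendor.

```python
from dataclasses import dataclass


@dataclass
class MemoryCostModel:
    """Hypothetical cost model; all fields are illustrative assumptions."""

    price_per_token: float           # dollars per input token
    retrieval_tokens_per_query: int  # tokens a retrieval pipeline sends per query
    trained_token_fraction: float    # fraction of those tokens trained memory still uses
    retrain_cost: float              # dollars to retrain the memory after a corpus change

    def saving_per_query(self) -> float:
        """Dollars saved per query by using trained memory instead of retrieval."""
        saved_tokens = self.retrieval_tokens_per_query * (1 - self.trained_token_fraction)
        return saved_tokens * self.price_per_token

    def break_even_queries(self) -> float:
        """Queries needed between retrains for the savings to cover one retrain."""
        saving = self.saving_per_query()
        if saving <= 0:
            raise ValueError("trained memory saves nothing under these inputs")
        return self.retrain_cost / saving


# Assumed inputs, chosen only to make the arithmetic concrete.
model = MemoryCostModel(
    price_per_token=2e-6,
    retrieval_tokens_per_query=50_000,
    trained_token_fraction=0.05,
    retrain_cost=500.0,
)
print(f"Saving per query: ${model.saving_per_query():.3f}")
print(f"Break-even queries per retrain: {model.break_even_queries():,.0f}")
```

Under this simplified model, a corpus that changes before the break-even query count is reached would favor retrieval on cost alone. The model ignores answer quality, latency, and the cost of re-indexing, all of which would matter in a real evaluation.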

What to watch

The competitive field is moving fast. The Deep Feed reports that rival Supermemory open-sourced its own memory engine about a week before Engram's announcement, a reminder that memory as a concept is commoditizing even as Engram bets that its pipeline for turning a corpus into reliable memory, not the idea of memory itself, is the defensible part. Worth tracking over the next few quarters: whether Microsoft, Notion and Harvey convert their pilots into disclosed production savings, whether an independent party benchmarks the token-reduction claim, and whether frontier labs respond by building comparable memory features directly into their own model APIs.

Key Points

1. Engram raised $98 million (led by Kleiner Perkins, with General Catalyst, Sequoia and Andrej Karpathy) to build trained, not retrieved, AI memory.
2. Rising per-query token costs from context stuffing and RAG retrieval are pushing enterprises to fund architectures that bake knowledge into model weights.
3. Practitioners should note the approach needs white-box weight access and lacks independent benchmarks, so treat the 100x token-reduction claim as unverified.

Scoring Rationale

A $98 million Series A led by Kleiner Perkins, with General Catalyst, Sequoia Capital and Andrej Karpathy also investing, is now corroborated by primary posts from all three venture firms plus Engram's own announcement, resolving the single-source risk flagged in the prior score. The story is genuinely notable for AI/ML practitioners tracking inference-cost reduction: a concrete Azure GPU capacity commitment from Microsoft and named pilots with Notion and Harvey give it more substance than a typical early-stage pitch, and the technical lineage (Cartridges, sparse memory fine-tuning) is grounded in published research rather than marketing alone. It stays in the notable band rather than major because the core 100x token-reduction claim has no independent benchmark, customer deployments are still pilots rather than disclosed production results, and the memory-layer category is already contested (a rival open-sourced a competing engine about a week earlier).

Sources

Primary source and supporting public references used for this report.

Primary source: cnbc.com, "AI memory startup focused on cutting token costs raises $98 million"