DoorDash Builds Open Data Architecture for Agentic AI

SiliconANGLE reports that DoorDash has spent roughly 10 years building an open data architecture based on open storage, open compute, and compute-agnostic design to support real-time logistics and emerging machine-led workflows. Speaking at Snowflake Summit 2026, Jajoo, DoorDash's head of data engineering, data platform, and business intelligence, said "the machine user is outpacing the human user in consumption of analytics data." SiliconANGLE and Snowflake's Chris Child, vice president of product for data engineering, credited Apache Iceberg with reducing data movement, lowering latency, and cutting infrastructure cost across DoorDash's multi-platform estate. Separately, ByteByteGo reports DoorDash built an LLM testing system to evaluate chatbots after its customer-support assistant showed subtle hallucinations, noting the company handles hundreds of thousands of support contacts a day. Together, the two threads, an Iceberg-backed open data estate and purpose-built LLM evaluation, show how large logistics platforms combine data plumbing and testing tooling to scale agentic AI cost-effectively.
What happened
SiliconANGLE reports that DoorDash has spent about 10 years constructing an open data architecture founded on open storage, open compute, and compute-agnostic design to serve consumers, merchants, and delivery workers across real-time logistics. Speaking at Snowflake Summit 2026, Jajoo, head of data engineering, data platform, and business intelligence at DoorDash, said "the machine user is outpacing the human user in consumption of analytics data." SiliconANGLE and Chris Child, Snowflake's vice president of product for data engineering, described how DoorDash adopted Apache Iceberg across its estate to reduce data-movement costs and latency, noting the open approach is cheaper and faster because it removes repeated copy steps between systems.
Technical details
ByteByteGo reports that DoorDash built an LLM testing system after observing subtle hallucinations in its customer-support chatbot, with examples where the model misread order-status fields and recommended incorrect refund policies. ByteByteGo notes DoorDash handles hundreds of thousands of support contacts every day, which motivated investment in automated testing for model deployments.
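The reporting does not describe how DoorDash's testing system is built, so the following is only a minimal sketch of what scenario-driven testing for a support assistant could look like. Every name here (`Scenario`, `evaluate`, the stub assistants, the `status` and `refund_eligible` fields) is hypothetical and chosen for illustration. Each scenario pairs a ground-truth order record with phrases the reply must and must not contain, which is one simple way to catch a misread order status or an incorrect refund recommendation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """One test case: a ground-truth order plus expectations on the reply."""
    name: str
    order: dict               # hypothetical order record shown to the assistant
    question: str             # what the customer asks
    must_contain: tuple       # phrases a grounded reply has to include
    must_not_contain: tuple   # phrases that would signal a hallucination


def evaluate(assistant: Callable[[dict, str], str], scenarios) -> list:
    """Run every scenario and return (name, missing, forbidden) for failures."""
    failures = []
    for s in scenarios:
        reply = assistant(s.order, s.question).lower()
        missing = [p for p in s.must_contain if p.lower() not in reply]
        forbidden = [p for p in s.must_not_contain if p.lower() in reply]
        if missing or forbidden:
            failures.append((s.name, missing, forbidden))
    return failures


# Hypothetical scenarios mirroring the two failure types named in the article.
SCENARIOS = [
    Scenario(
        name="status_delivered",
        order={"status": "delivered", "refund_eligible": False},
        question="Where is my order?",
        must_contain=("delivered",),
        must_not_contain=("on its way",),
    ),
    Scenario(
        name="refund_not_eligible",
        order={"status": "delivered", "refund_eligible": False},
        question="Can I get a refund?",
        must_contain=("not eligible",),
        must_not_contain=("full refund",),
    ),
]


def grounded_assistant(order: dict, question: str) -> str:
    """Stand-in for a model that reads the order fields correctly."""
    if "refund" in question.lower():
        if order["refund_eligible"]:
            return "This order is eligible for a refund."
        return "This order is not eligible for a refund."
    return f"Your order is {order['status']}."


def buggy_assistant(order: dict, question: str) -> str:
    """Stand-in for a model that ignores the order record entirely."""
    return "Your order is on its way, and you can get a full refund."


if __name__ == "__main__":
    print(evaluate(grounded_assistant, SCENARIOS))
    print(evaluate(buggy_assistant, SCENARIOS))
```

In practice the stub functions would be replaced by a call to the deployed model, and plain substring checks would likely give way to richer graders; the point of the sketch is only that each scenario is tied to ground-truth order data, so a fluent but wrong answer still fails.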
Editorial analysis
Industry-pattern observation: at petabyte scale, organizations increasingly prefer open data formats and compute-agnostic stacks to avoid repeatedly copying data between systems. DoorDash's approach aligns with a broader enterprise move toward table formats like Apache Iceberg to reduce egress, lower latency, and let engineers focus on application logic rather than data plumbing.
Context and significance
For practitioners, the two stories together highlight two operational pillars for scaling agentic AI in production: an open, queryable data substrate that minimizes expensive hops and keeps fresh state available to models; and robust, scenario-driven testing plus cost telemetry for LLM behavior. The pillars address operational cost and model reliability in different parts of the stack.
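As a rough illustration of the cost-telemetry half of that second pillar, the sketch below computes per-request LLM spend from token counts and flags a candidate deployment whose average cost exceeds a baseline by more than a tolerance. Neither source describes DoorDash doing this; the function names, the per-1,000-token prices, and the 10% tolerance are all assumptions made for the example.

```python
from statistics import mean


def cost_usd(prompt_tokens: int, completion_tokens: int,
             price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one request, given hypothetical per-1,000-token prices."""
    return (prompt_tokens / 1000 * price_in_per_1k
            + completion_tokens / 1000 * price_out_per_1k)


def spend_regressed(baseline_costs, candidate_costs, tolerance: float = 0.10) -> bool:
    """True if the candidate's mean cost exceeds the baseline mean by more than tolerance."""
    return mean(candidate_costs) > mean(baseline_costs) * (1 + tolerance)


if __name__ == "__main__":
    # Illustrative numbers only; real prices and token counts would come from logs.
    baseline = [cost_usd(800, 200, 0.5, 1.5), cost_usd(900, 250, 0.5, 1.5)]
    candidate = [cost_usd(1600, 400, 0.5, 1.5), cost_usd(1500, 450, 0.5, 1.5)]
    print(spend_regressed(baseline, candidate))
```

A check like this could run alongside behavioral tests in a release gate, so that a prompt or model change that raises spend is caught in the same place as one that changes answers.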
What to watch
Track adoption of open table formats and compute-agnostic tooling across other logistics and marketplace platforms, and whether more teams pair data-platform investments with systematic LLM testing and spend monitoring to catch subtle hallucinations and cost regressions early.
Key Points
- Open table formats like Apache Iceberg cut costly data movement, lowering latency and infrastructure spend for petabyte-scale platforms such as DoorDash.
- High-volume LLM deployments need targeted testing; DoorDash built an evaluation system after subtle hallucinations appeared in its support assistant.
- Pairing an open data substrate with systematic LLM evaluation separates data plumbing from model-behavior monitoring, improving operational scalability for agentic AI.
Scoring Rationale
A notable enterprise infrastructure pattern: an open, Iceberg-backed data estate plus purpose-built LLM testing to scale agentic AI. It is relevant to practitioners managing petabyte-scale data and production LLMs, but it is an architecture and tooling story rather than a frontier-model or industry-changing release.
