Viquar Khan Proposes Real-Time RAG Architecture

Viquar Khan, a Senior Architect at AWS, diagnoses production failures in agentic systems as "context rot" caused by stale retrieval stores, not model regressions. He argues embeddings are materialized views over transactional data and that traditional batch pipelines create silent drift that breaks long-running agents. Khan demonstrates a pragmatic architecture using Spark and Iceberg to implement low-latency change data capture, transactional upserts, and controlled write amplification so embedding stores reflect live state. The result: retrieval-augmented generation (RAG) systems that maintain accuracy across long sessions and continuous agent loops by keeping vector indices consistent with source-of-truth updates.
What happened
Viquar Khan, Senior Architect and GenAI specialist at AWS, describes a production outage in which an agent made incorrect operational decisions because its retrieval layer had drifted from live transactional data. The failure mode, which he names "context rot," emerges when long-running agentic RAG loops rely on stale embeddings and vector stores. Khan cites empirical failure points, including performance collapse near the 32,000-token threshold and studies showing that accuracy drops when relevant facts sit in the middle of long contexts.
Technical details
Khan reframes embeddings as a materialized view of transactional data and shows why legacy batch ETL fails for high-frequency updates. He advocates using Spark with table formats like Iceberg to provide ACID semantics, incremental compaction, and efficient upserts that keep the embedding store current. Key implementation patterns he highlights include:
- streaming change data capture (CDC) into a transactional table to capture inserts, updates, and deletes
- incremental embedding recomputation for changed records and selective reindexing rather than full rebuilds
- write amplification controls and compaction strategies to avoid exploding storage and recompute costs
These elements let you maintain a near-real-time vector index that aligns with business state while bounding cost.
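The three patterns above can be illustrated with a small in-memory sketch. This is not Khan's Spark and Iceberg implementation. It is a plain-Python stand-in that assumes a change log of insert, update, and delete events ordered by a sequence number. The names `ChangeEvent`, `VectorIndex`, `coalesce`, and `toy_embed` are hypothetical, and `toy_embed` is a hash-based placeholder for a real embedding model.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One CDC record (hypothetical shape, not a real connector's schema)."""
    op: str                 # "insert", "update", or "delete"
    key: str                # primary key of the source row
    text: Optional[str]     # row content to embed; None for deletes
    seq: int                # position in the change log, increasing over time


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding: deterministic but NOT semantically meaningful."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:8]]


def coalesce(events: Iterable[ChangeEvent]) -> List[ChangeEvent]:
    """Keep only the latest event per key, limiting write amplification."""
    latest: Dict[str, ChangeEvent] = {}
    for ev in events:
        current = latest.get(ev.key)
        if current is None or ev.seq > current.seq:
            latest[ev.key] = ev
    return sorted(latest.values(), key=lambda e: e.seq)


@dataclass
class VectorIndex:
    """Maps key -> (content hash, vector); stands in for a real vector store."""
    rows: Dict[str, Tuple[str, List[float]]] = field(default_factory=dict)
    embed_calls: int = 0

    def apply(self, events: Iterable[ChangeEvent]) -> None:
        for ev in coalesce(events):
            if ev.op == "delete":
                # Deletes must propagate, or stale rows stay retrievable.
                self.rows.pop(ev.key, None)
                continue
            if ev.text is None:
                raise ValueError(f"{ev.op} event for {ev.key!r} has no text")
            content_hash = hashlib.sha256(ev.text.encode("utf-8")).hexdigest()
            existing = self.rows.get(ev.key)
            if existing is not None and existing[0] == content_hash:
                continue  # content unchanged, skip the recompute
            self.rows[ev.key] = (content_hash, toy_embed(ev.text))
            self.embed_calls += 1
```

In this sketch, a batch that inserts a row, updates it, and then inserts and deletes a second row triggers a single embedding call, because coalescing collapses each key to its final state before any recompute happens. A production pipeline would get the equivalent effect from transactional upserts on the table format rather than from a Python dictionary.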
Context and significance
This is an operationally focused intervention, not a new model or training technique. Its significance is practical: as agents move from stateless prompts to continuous execution, data systems become the primary failure surface. The article shifts responsibility from the LLM to the data engineering layer and makes clear that retrieval consistency, not bigger models, is the limiting factor for reliable agent behavior. Vendor vector DBs that lack transactional connectivity, or pipelines that recompute embeddings in large batches, are fragile for agentic workloads. Adopting transactional table formats and event-driven embedding refreshes bridges that gap.
What to watch
Look for tighter CDC-to-vector pipelines, native connectors between Iceberg-like formats and vector search tools, and vendor features that treat embeddings as first-class incremental materialized views. Teams building agentic systems should instrument retrieval freshness and budget for incremental recompute, compaction, and TTL policies to avoid silent degradation.
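One way to instrument retrieval freshness is to compare when each source row last changed with when its embedding was last computed. The sketch below is a minimal illustration under assumed inputs: two dictionaries of Unix timestamps keyed by row ID. The names `FreshnessReport` and `measure_freshness` are hypothetical and do not come from Khan's article or from any vector database API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FreshnessReport:
    stale_keys: List[str]      # embedded before the latest source change
    missing_keys: List[str]    # in the source but never embedded
    orphaned_keys: List[str]   # in the index but deleted from the source
    max_lag_seconds: float     # longest time any stale row has been out of date


def measure_freshness(
    source_updated_at: Dict[str, float],
    index_embedded_at: Dict[str, float],
    now: float,
) -> FreshnessReport:
    """Compare source and index timestamps (all values are Unix seconds)."""
    stale: List[str] = []
    missing: List[str] = []
    max_lag = 0.0
    for key, updated in source_updated_at.items():
        embedded = index_embedded_at.get(key)
        if embedded is None:
            missing.append(key)
        elif embedded < updated:
            stale.append(key)
            # Lag counts from the source change, not from the last embed.
            max_lag = max(max_lag, now - updated)
    orphaned = [k for k in index_embedded_at if k not in source_updated_at]
    return FreshnessReport(sorted(stale), sorted(missing), sorted(orphaned), max_lag)
```

A team could export `max_lag_seconds` and the three key counts as metrics and alert when the lag exceeds whatever staleness the agent's decisions can tolerate. Orphaned keys are worth tracking separately, since they indicate deletes that never reached the index.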
Scoring Rationale
The piece delivers a high-value operational pattern for production RAG and agentic systems, directly relevant to ML engineering. It is practical rather than research-first, so significance is notable but not paradigm-shifting. The story is older than three days, so its score is reduced for recency.


