How to Keep a RAG System's Knowledge Up to Date
- Arnab Mondal, 6 min read
Overview
Engineers love to answer “How do we keep a RAG system up‑to‑date?” with “we’ll re‑index every hour.” It sounds simple—until traffic spikes, a newsroom breaks a story, and your assistant keeps quoting yesterday’s facts. The real challenge isn’t the schedule; it’s the quiet tradeoff between freshness, cost, and correctness. The moment you stop boiling the ocean and start watching the ripples, the design gets a lot calmer. That’s where
TL;DR
Stream changes with change data capture (CDC) and upsert only the modified chunks into your vector DB; skip cron-based full reindexing so answers stay fresh within seconds while costs track actual change, not time.
Why Batch Re-indexing Fails
A batch window holds accuracy hostage. If your window is fifteen minutes, your answers can be up to fifteen minutes stale. During quiet hours, you may not notice. During breaking news, that gap becomes the entire story. Worse, re‑embedding the same unchanged documents over and over is a budget sink: your vector store absorbs pointless rewrites, your embedder runs hot, and your ops team chases invisible failures that batch jobs tend to hide. I’ve seen two designs for the same “news assistant”: the junior plan re‑embeds everything on a timer; the senior plan listens to row‑level changes and only touches what moved. One bill is predictable and small; the other is a bonfire.
In the junior approach, the room felt calm right up until it didn’t. A timer fired every fifteen minutes like clockwork, kicking off a re‑index job that marched through tens of thousands of articles. For fourteen minutes and fifty‑nine seconds, the assistant was living in the past. Users asked about a developing policy change; the model pulled in the previous draft of the article because the fresh one hadn’t made the next window yet. Meanwhile, the embedding service surged to 100% CPU, autoscaling kicked in, costs spiked, and the on‑call engineer watched a wall of green checkmarks masking a simple truth: the job “succeeded,” but the answers were wrong when it mattered.
The senior approach looked boring on paper and brilliant in practice. Instead of a drumbeat, there was a stream. A reporter updated a paragraph; the database emitted a change event with the article ID and the op type. A tiny worker picked up that event, fetched the latest text, split only the affected chunks, generated new embeddings, and upserted them by deterministic keys. The retriever saw the new vectors within seconds. There were no heroics, no big batches, no mystery failures hiding in a haystack. When the newsroom heated up, the stream just got a little busier, and then it cooled back down. The bill tracked reality: you paid for change, not for ritual.
That’s the heart of it. Timers create a false sense of control; streams create real control. With streams, correctness arrives when the data does. With timers, correctness arrives when the next cron line says it’s allowed to.
Stream Architecture
The shift is subtle: think in streams, not batches. Place a CDC stream—Debezium, native logical replication, or a managed connector—on your source of truth. Every insert, update, or delete becomes a tiny message. A small ingestion worker reads those events, loads the canonical text, and generates embeddings only for the changed chunks before upserting into the vector database. The result lands in seconds, not minutes, and you only pay for real change.
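One way to embed "only the changed chunks" is to store a content fingerprint next to each vector and compare on every event. The sketch below is a minimal illustration, not the article's implementation: the `Chunk` shape, `fingerprint`, `diffChunks`, and the idea of keeping a `storedHashes` map per document are all assumptions introduced here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: a chunk's position in the document plus its text.
type Chunk = { chunkId: number; text: string };

// SHA-256 of the chunk text, used as a cheap "did this change?" fingerprint.
function fingerprint(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Compare freshly produced chunks against the fingerprints saved at the
// previous upsert. Returns the chunks that need new embeddings and the
// chunk ids that no longer exist and should be deleted.
function diffChunks(
  fresh: Chunk[],
  storedHashes: Map<number, string>
): { toEmbed: Chunk[]; toDelete: number[] } {
  const toEmbed = fresh.filter(
    (c) => storedHashes.get(c.chunkId) !== fingerprint(c.text)
  );
  const freshIds = new Set(fresh.map((c) => c.chunkId));
  const toDelete = Array.from(storedHashes.keys()).filter(
    (id) => !freshIds.has(id)
  );
  return { toEmbed, toDelete };
}
```

With position-based chunking, an edit near the top of a document can shift every later chunk, so the savings depend on how stable the chunk boundaries are; splitting on headings or paragraphs tends to keep more fingerprints unchanged.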
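High-level flow
Source of truth → CDC stream → ingestion worker (fetch, chunk, embed) → vector database → retriever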
Minimal reference components
In practice, this architecture rests on three simple pieces. First, the CDC producer captures row‑level changes from the tables that matter: articles, docs, or whatever your domain calls them. Second, an ingestion worker, built to be small and idempotent, reads events, fetches the source record, chunks the content, and generates embeddings only for what actually changed before upserting or deleting in the vector store. Third, the retriever leans on metadata filters (tenant, language, recency) and hybrid search where available, so you return the right context without over‑fetching.
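To make the retriever's metadata filtering concrete, here is a small in-memory sketch. It is a stand-in only: a real vector database applies filters inside its index, and the `StoredChunk`, `Filter`, and `retrieve` names, as well as the `maxAgeMs` recency rule, are assumptions made for illustration.

```typescript
// Hypothetical record shape: a vector plus the metadata used for filtering.
type StoredChunk = {
  id: string;
  vector: number[];
  metadata: { tenant: string; language: string; updatedAt: number };
};

// Hypothetical filter: exact tenant and language match, plus a maximum age.
type Filter = { tenant: string; language: string; maxAgeMs: number };

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Filter by metadata first, then rank the survivors by similarity.
function retrieve(
  query: number[],
  chunks: StoredChunk[],
  filter: Filter,
  now: number,
  topK: number
): StoredChunk[] {
  return chunks
    .filter(
      (c) =>
        c.metadata.tenant === filter.tenant &&
        c.metadata.language === filter.language &&
        now - c.metadata.updatedAt <= filter.maxAgeMs
    )
    .map((c) => ({ chunk: c, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.chunk);
}
```

Filtering before ranking is what keeps the retriever from over-fetching: chunks from the wrong tenant or an outdated revision never compete for the top-K slots.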
There are a few operational niceties that keep this smooth. Carry the operation type on each event (create, update, delete) and include a monotonic timestamp or version so late arrivals don’t clobber newer writes. Upserts should target a stable composite key, often a document_id with a chunk_id, so a single edit doesn’t force you to rewrite the entire document’s vector footprint.
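Both niceties fit in a few lines. The sketch below assumes `ts_ms` is a good enough ordering signal and keeps the "last applied" state in memory; `VersionedEvent`, `chunkKey`, and `VersionGuard` are hypothetical names, and a production worker would persist this state and might prefer a log sequence number over a timestamp.

```typescript
// Hypothetical minimal event: just the fields the guard needs.
type VersionedEvent = {
  document_id: string;
  op: "c" | "u" | "d";
  ts_ms: number;
};

// Stable composite key: the same document and chunk always map to the same id,
// so an upsert overwrites the previous vector instead of adding a duplicate.
function chunkKey(documentId: string, chunkId: number): string {
  return `${documentId}::${chunkId}`;
}

// Remembers the newest ts_ms applied per document and rejects anything that
// is not strictly newer (late arrivals and exact redeliveries).
class VersionGuard {
  private latest = new Map<string, number>();

  shouldApply(event: VersionedEvent): boolean {
    const seen = this.latest.get(event.document_id);
    if (seen !== undefined && event.ts_ms <= seen) return false;
    this.latest.set(event.document_id, event.ts_ms);
    return true;
  }
}
```

A worker would call `shouldApply` before fetching and embedding, and skip the event when it returns `false`. Because the comparison is `<=`, two distinct changes stamped with the same millisecond would be collapsed into one, which is the main reason a stricter version source can be worth the effort.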
Event schema
ChangeEvent: CDC change envelope
Name | Type | Optional | Comment |
---|---|---|---|
id | string | No | - |
op | string | No | c, u, or d |
table | string | No | - |
document_id | string | No | - |
ts_ms | number | No | - |
before | JSON | No | - |
after | JSON | No | - |
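Events arrive as untyped JSON, so it can help to validate them against this schema before the worker acts on them. The following is a rough sketch under stated assumptions: `parseChangeEvent`, `isRecord`, and `isOp` are hypothetical helpers, `before` and `after` are typed as `unknown` because the table only says "JSON", and missing values are normalized to `null`.

```typescript
type ChangeEvent = {
  id: string;
  op: "c" | "u" | "d";
  table: string;
  document_id: string;
  ts_ms: number;
  before: unknown;
  after: unknown;
};

// True for plain JSON objects (not null, not arrays).
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when the value is one of the three operation codes in the schema.
function isOp(value: unknown): value is ChangeEvent["op"] {
  return value === "c" || value === "u" || value === "d";
}

// Parse a raw message and return a typed event, or null if it is malformed.
function parseChangeEvent(raw: string): ChangeEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (!isRecord(data)) return null;

  const { id, op, table, document_id, ts_ms } = data;
  if (typeof id !== "string") return null;
  if (typeof table !== "string") return null;
  if (typeof document_id !== "string") return null;
  if (typeof ts_ms !== "number" || !Number.isFinite(ts_ms)) return null;
  if (!isOp(op)) return null;

  return {
    id,
    op,
    table,
    document_id,
    ts_ms,
    before: data.before ?? null,
    after: data.after ?? null,
  };
}
```

Returning `null` instead of throwing lets the worker route bad messages to a dead-letter queue or a log without stalling the stream.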
Pseudocode: Ingestion Worker
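// db, vectorDb, embedder, and chunker are assumed to be provided by the host app.
type ChangeEvent = {
  id: string;
  op: 'c' | 'u' | 'd'; // create | update | delete
  table: string;
  document_id: string;
  ts_ms: number; // source timestamp in milliseconds
  before: unknown; // row state before the change
  after: unknown; // row state after the change
};

async function handle(event: ChangeEvent) {
  // Ignore tables we do not index.
  if (event.table !== 'articles') return;

  // A delete removes every vector that belongs to the document.
  if (event.op === 'd') {
    await vectorDb.deleteByDocument(event.document_id);
    return;
  }

  // Load the canonical record rather than trusting the event payload.
  const doc = await db.getArticle(event.document_id);
  if (!doc) return;

  // Note: this version re-chunks and re-embeds the whole document on each event.
  const chunks = chunker(doc.content, { maxTokens: 800, overlap: 100 });
  const vectors = await embedder.embed(chunks.map((c) => c.text));

  // Deterministic ids make the upsert idempotent: a replay overwrites, never duplicates.
  // If the document shrank, chunks with an index >= chunks.length are left behind
  // and need a separate cleanup step.
  await vectorDb.upsert(
    vectors.map((v, i) => ({
      id: `${event.document_id}::${i}`,
      document_id: event.document_id,
      text: chunks[i].text,
      metadata: {
        updatedAt: doc.updatedAt,
        source: 'articles',
        url: doc.url,
      },
      vector: v,
    }))
  );
}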
Cost & Takeaways
If the business cannot tolerate a stale fact for even a minute, a batch window won’t save you. Streams will. And if cost matters—and it always does—stop paying to re‑describe the same text. Pay only for change. CDC gives you that lever: a system that reacts in seconds, stays accurate, and costs what it should.
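The "pay only for change" argument is easy to check with back-of-the-envelope arithmetic. The helper below is an illustrative sketch: `Workload` and `embeddingsPerDay` are made-up names, and the numbers in the example are hypothetical inputs chosen only to make the calculation concrete, not measurements.

```typescript
// Hypothetical workload description.
type Workload = {
  totalChunks: number; // chunks in the whole corpus
  batchIntervalMinutes: number; // how often the full re-index runs
  changedChunksPerDay: number; // chunks actually touched by edits per day
};

// Embedding calls per day: full re-index on a timer vs. change-driven upserts.
function embeddingsPerDay(w: Workload): { batch: number; stream: number } {
  const runsPerDay = (24 * 60) / w.batchIntervalMinutes;
  return {
    batch: runsPerDay * w.totalChunks,
    stream: w.changedChunksPerDay,
  };
}

// Hypothetical inputs: 50,000 chunks, a 15-minute window, 2,000 changed chunks a day.
const example = embeddingsPerDay({
  totalChunks: 50_000,
  batchIntervalMinutes: 15,
  changedChunksPerDay: 2_000,
});
// 96 runs per day * 50,000 chunks = 4,800,000 batch embeddings vs. 2,000 streamed.
console.log(example);
```

The batch figure scales with corpus size and schedule, while the stream figure scales only with how much content actually changes, which is the lever the section describes.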
Conclusion
The right question isn’t “how often should we sync?” It’s “what’s the cost of a single stale fact?” CDC lets you answer with a system that reacts in seconds, stays accurate, and costs far less than batch re‑indexing.