The cost of embeddings
An embedding turns text or an image into a vector that captures its meaning. Computing one requires a forward pass through a model, which is expensive, and that cost is wasted whenever the same content is embedded again.
Caching embeddings
An embedding cache stores the vector for each piece of content, keyed by the content itself or by a hash of it. As long as the embedding model is unchanged and deterministic, the same document or query always embeds to the same vector, so a cache hit skips the model entirely. This matters most in retrieval systems that embed large, unchanging corpora.
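A minimal sketch of such a cache follows. It is illustrative only: `EmbeddingCache`, `toy_embed`, and the model name `"toy-v1"` are hypothetical names, and `toy_embed` is a letter-count stand-in for a real model. The key includes the model name on the assumption that vectors from different models should never be mixed.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """In-memory cache keyed by a hash of (model name, content)."""

    def __init__(self, embed_fn: Callable[[str], Vector], model_name: str) -> None:
        self._embed_fn = embed_fn
        self._model_name = model_name
        self._store: Dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        # Hash the model name together with the text so that switching
        # models cannot return a stale vector for the same content.
        payload = f"{self._model_name}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, text: str) -> Vector:
        key = self._key(text)
        if key in self._store:
            self.hits += 1           # cache hit: the model is skipped
            return self._store[key]
        self.misses += 1             # cache miss: pay for one model pass
        vector = self._embed_fn(text)
        self._store[key] = vector
        return vector


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a real model: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


cache = EmbeddingCache(toy_embed, model_name="toy-v1")
first = cache.get("hello world")    # miss: calls toy_embed
second = cache.get("hello world")   # hit: served from the cache
```

A production cache would usually add persistence and an eviction policy; both are left out here to keep the idea visible.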
The vector store
Searching embeddings by similarity is a separate problem. A vector store indexes many vectors so that it can quickly find the nearest neighbors of a query vector. Brute-force comparison against every stored vector scales linearly with the size of the corpus, so stores use approximate nearest neighbor (ANN) indexes that trade a small amount of accuracy for a large gain in speed.
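To make the cost concrete, here is a sketch of the exact brute-force search that an approximate index is designed to avoid. It is not an ANN index itself. The function names are hypothetical, and cosine similarity is an assumed choice of metric; a real store might use dot product or Euclidean distance instead.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and keep the k best.

    Each query touches all stored vectors, which is the linear scan
    that approximate indexes avoid.
    """
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = brute_force_top_k([1.0, 0.1], stored, k=2)
```

An approximate index returns most of the same neighbors while examining only a fraction of the stored vectors, which is where the accuracy-for-speed trade comes from.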
How they work together
- Documents are embedded once and the vectors are stored.
- A query is embedded, or its vector is read from the cache if the query has been seen before.
- The vector store returns the closest documents by similarity.
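The three steps above can be sketched end to end as follows. This is a toy under stated assumptions: `TinyRetriever` and `toy_embed` are hypothetical names, the "model" is a letter-count stand-in, the cache is keyed by the raw query string for brevity, and the store does an exact linear scan where a real vector store would use an approximate index.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a real model: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyRetriever:
    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self._query_cache: Dict[str, Vector] = {}
        self._docs: List[str] = []
        self._vectors: List[Vector] = []
        self.model_calls = 0  # counts how often the "model" actually runs

    def _embed(self, text: str) -> Vector:
        self.model_calls += 1
        return self._embed_fn(text)

    def add_documents(self, docs: Sequence[str]) -> None:
        # Step 1: embed each document once and store its vector.
        for doc in docs:
            self._docs.append(doc)
            self._vectors.append(self._embed(doc))

    def search(self, query: str, k: int = 1) -> List[Tuple[str, float]]:
        # Step 2: embed the query, reusing the cached vector if seen before.
        if query not in self._query_cache:
            self._query_cache[query] = self._embed(query)
        q = self._query_cache[query]
        # Step 3: return the closest documents by similarity.
        scored = [(doc, cosine(q, v)) for doc, v in zip(self._docs, self._vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


retriever = TinyRetriever(toy_embed)
retriever.add_documents(["cats purr", "dogs bark", "zzz"])
best = retriever.search("cat", k=1)
again = retriever.search("cat", k=1)  # the query vector comes from the cache
```

After both searches, `model_calls` is 4: three document embeddings plus one query embedding, because the repeated query never reaches the model.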
Key idea
Embedding caches avoid recomputing vectors for repeated content, while vector stores index vectors for fast approximate similarity search. Together they make retrieval over large corpora both cheap and fast.