Machine Learning

Embedding Caches And Vector Stores

Save computed embeddings and search them by similarity fast.

The cost of embeddings

An embedding turns text or an image into a vector that captures its meaning. Computing one requires a forward pass through a model, so embedding the same content repeatedly pays that cost again every time.

Caching embeddings

An embedding cache stores the vector for each piece of content, keyed by the content itself (often by a hash of it). Because the same document or query embedded with the same model always produces the same vector, a cache hit skips the model pass entirely. This matters most in retrieval systems that embed large corpora that rarely change.
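A minimal sketch of such a cache, assuming a plain in-memory dict as the backing store. `EmbeddingCache` and `toy_embed` are hypothetical names, and `toy_embed` only stands in for a real model pass. The key hashes the model name together with the text, so a vector produced by a different model is never served by mistake.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Memoizes an embedding function, keyed by a hash of (model name, content)."""

    def __init__(self, embed_fn: Callable[[str], Vector], model_name: str) -> None:
        self.embed_fn = embed_fn
        self.model_name = model_name
        self.store: Dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        # Hash model name and text together so switching models never reuses old vectors.
        raw = f"{self.model_name}\x00{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text: str) -> Vector:
        key = self._key(text)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        vec = self.embed_fn(text)  # the expensive model pass runs only on a miss
        self.store[key] = vec
        return vec


def toy_embed(text: str) -> Vector:
    # Hypothetical stand-in for a real model: counts of the letters a..e.
    return [float(text.lower().count(c)) for c in "abcde"]


cache = EmbeddingCache(toy_embed, model_name="toy-v1")
first = cache.get("a cached document")   # miss: runs toy_embed
second = cache.get("a cached document")  # hit: returned from the dict
print(cache.hits, cache.misses)  # 1 1
```

In a real system the dict would typically be replaced by a persistent key-value store so the cache survives restarts; the keying logic stays the same.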

The vector store

Searching embeddings by similarity is a separate problem. A vector store indexes many vectors so it can quickly find the nearest neighbors of a query vector. Brute-force search compares the query against every stored vector, so its cost grows linearly with the size of the corpus. Vector stores therefore use approximate nearest neighbor (ANN) indexes, which trade a small loss in accuracy for a large gain in speed.
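To make the trade-off concrete, here is a sketch comparing exact brute-force search with one simple ANN technique, random-hyperplane locality-sensitive hashing (LSH) for cosine similarity. Production vector stores often use other index families (for example graph-based or clustering-based indexes); LSH is chosen here only because it fits in a few lines. `LSHIndex`, `brute_force_search`, and the parameters `n_planes` and `seed` are illustrative assumptions, not any particular library's API.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(vectors: Dict[str, Vector], query: Vector, k: int) -> List[Tuple[str, float]]:
    # Exact: scores every stored vector, so cost grows linearly with the corpus.
    scored = [(doc_id, cosine(v, query)) for doc_id, v in vectors.items()]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]


class LSHIndex:
    """Approximate index: random-hyperplane LSH for cosine similarity."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        self.vectors: Dict[str, Vector] = {}

    def _signature(self, v: Vector) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in self.planes)

    def add(self, doc_id: str, v: Vector) -> None:
        self.vectors[doc_id] = v
        self.buckets[self._signature(v)].append(doc_id)

    def search(self, query: Vector, k: int) -> List[Tuple[str, float]]:
        # Score only candidates sharing the query's bucket; neighbors hashed
        # into other buckets are missed, which is the accuracy trade-off.
        candidates = self.buckets.get(self._signature(query), [])
        scored = [(doc_id, cosine(self.vectors[doc_id], query)) for doc_id in candidates]
        scored.sort(key=lambda p: p[1], reverse=True)
        return scored[:k]


# Usage: 1000 random 16-dimensional vectors, queried with one of them.
rng = random.Random(42)
dim = 16
index = LSHIndex(dim)
data: Dict[str, Vector] = {}
for i in range(1000):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    data[f"doc{i}"] = v
    index.add(f"doc{i}", v)

query = data["doc7"]
print(brute_force_search(data, query, k=1)[0][0])  # doc7, after scoring all 1000
print(index.search(query, k=1)[0][0])              # doc7, after scoring one bucket
```

The index scores only the vectors in the query's bucket, so it does far less work than the brute-force loop, but a true neighbor that lands in a different bucket is simply missed. Using more hyperplanes makes buckets smaller and faster to scan, at the cost of more such misses.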

How they work together

  • Documents are embedded once and the vectors are stored.
  • A query is embedded, or served from the cache if it has been seen before.
  • The vector store returns the closest documents by similarity.
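The three steps above can be sketched end to end. This is a toy under stated assumptions: `embed` is a hypothetical word-count embedding wrapped in Python's `functools.lru_cache`, which plays the role of the embedding cache, and the hypothetical `TinyVectorStore` uses exact cosine scoring for brevity where a real store would use an ANN index.

```python
import math
from functools import lru_cache
from typing import List, Tuple

Vec = Tuple[float, ...]
VOCAB = ["cache", "vector", "search", "model", "index"]
MODEL_CALLS = 0  # counts how often the (toy) model body actually runs


@lru_cache(maxsize=10_000)
def embed(text: str) -> Vec:
    # Hypothetical toy model: counts of a few vocabulary words.
    # lru_cache memoizes by argument, so repeated text skips this body.
    global MODEL_CALLS
    MODEL_CALLS += 1
    words = text.lower().split()
    return tuple(float(words.count(w)) for w in VOCAB)


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    def __init__(self) -> None:
        self.items: List[Tuple[str, Vec]] = []

    def add(self, doc_id: str, vec: Vec) -> None:
        self.items.append((doc_id, vec))

    def query(self, vec: Vec, k: int) -> List[Tuple[str, float]]:
        # Exact scan for brevity; an ANN index would replace this loop.
        scored = [(doc_id, cosine(v, vec)) for doc_id, v in self.items]
        scored.sort(key=lambda p: p[1], reverse=True)
        return scored[:k]


docs = {
    "d1": "cache the vector",
    "d2": "search the index",
    "d3": "model model model",
}
store = TinyVectorStore()
for doc_id, text in docs.items():  # 1. embed each document once and store it
    store.add(doc_id, embed(text))

for _ in range(2):  # 2. the same query twice: the second embed is a cache hit
    top = store.query(embed("vector cache"), k=1)  # 3. closest document

print(top[0][0], MODEL_CALLS)  # d1 4
```

The model body runs four times (three documents plus the first query); the repeated query is answered from the cache, which `embed.cache_info()` reports as a hit.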

Key idea

Embedding caches avoid recomputing vectors for repeated content, while vector stores index vectors for fast approximate similarity search. Together they make retrieval over large corpora both cheap and fast.

Check yourself

1. What does an embedding cache let you avoid?

2. Why do vector stores use approximate nearest neighbor indexes?