# Cache Layers
OmniCache-AI ships five purpose-built cache layers. Each layer targets a distinct stage of the AI pipeline, uses serialization optimized for its data type, and exposes a consistent `get` / `set` / `get_or_*` interface backed by a shared `CacheManager`.
## At a Glance
| Layer | Class | What it caches | Key composition | Serialization |
|---|---|---|---|---|
| Response | `ResponseCache` | LLM completions (any object) | model ID + messages hash + params hash | pickle |
| Embedding | `EmbeddingCache` | Dense float32 vectors (`np.ndarray`) | text + model ID | `tobytes` / `frombuffer` |
| Retrieval | `RetrievalCache` | Document lists from retrievers | query + retriever ID + top_k | pickle |
| Context | `ContextCache` | Conversation message history | session ID + turn index | pickle |
| Semantic | `SemanticCache` | Any value, matched by meaning | exact key or cosine similarity | pickle |
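The `tobytes` / `frombuffer` entry in the table can be illustrated with a short sketch. The helper names below are illustrative, not the library's API; only the round-trip technique itself comes from the table.

```python
import numpy as np

# Sketch of the tobytes/frombuffer round trip used for embeddings:
# vectors are stored as raw float32 bytes and restored with a known
# dtype, avoiding pickle overhead for large dense arrays.
def serialize_vector(vec: np.ndarray) -> bytes:
    return vec.astype(np.float32).tobytes()

def deserialize_vector(raw: bytes) -> np.ndarray:
    return np.frombuffer(raw, dtype=np.float32)
```

Because the dtype is fixed at float32, only the raw buffer needs to be stored; the array shape is implied by the byte length.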
## Pipeline Diagram
The layers form a cascading pipeline. A query enters at the top and falls through each layer until a cache hit is found or the live service is called.
Each layer is independent. Use only the layers relevant to your workload. A simple chatbot may need only ResponseCache, while a full RAG agent benefits from all five.
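The cascading fall-through described above can be sketched as a simple loop. Plain dicts stand in for the real layers here, and `cascade_lookup` is an illustrative helper, not part of the library:

```python
# Minimal sketch of the cascade: try each cache layer in order,
# fall through to the live service only when every layer misses.
def cascade_lookup(query, layers, live_call):
    for name, layer in layers:
        hit = layer.get(query)
        if hit is not None:
            return hit, name          # first hit wins
    return live_call(query), "live"   # all layers missed
```

A chatbot would configure `layers` with only a response cache; a RAG agent would list all five in pipeline order.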
## Shared Design Principles
All cache layers follow a common contract:
- Constructor takes a `CacheManager` (except `SemanticCache`, which wires its own backends directly).
- `get(...)` returns `None` on miss -- it never raises on a cache miss.
- `set(...)` accepts an optional `ttl` -- when omitted, the `TTLPolicy` on the manager decides.
- `get_or_*` convenience methods combine lookup and computation in a single call, ensuring the result is stored before it is returned.
- Keys are built via `CacheKeyBuilder` -- deterministic, namespaced, and hash-truncated for readability.
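The `get_or_*` contract above can be sketched in a few lines. A plain dict stands in for a manager-backed layer, and `get_or_compute` is an illustrative name (the concrete `get_or_*` method names vary per layer):

```python
# Sketch of the get_or_* contract: look up, compute on miss,
# store the result before returning it. TTL handling is omitted.
def get_or_compute(store, key, compute):
    value = store.get(key)   # returns None on miss, never raises
    if value is None:
        value = compute()
        store[key] = value   # stored before it is returned
    return value
```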
## How Layers Relate to Backends
```
+------------------+        +------------------+
|   Cache Layer    | -----> |   CacheManager   |
| (ResponseCache,  |        |                  |
|  EmbeddingCache, |        |  backend         | ---> InMemory / Disk / Redis
|  etc.)           |        |  vector_backend  | ---> FAISS / Chroma
+------------------+        |  key_builder     |
                            |  ttl_policy      |
                            |  invalidation    |
                            +------------------+
```
Layers never talk to a backend directly. They delegate to `CacheManager`, which resolves TTL, builds keys, and routes to the appropriate storage backend.
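The key-building step can be sketched as below. The exact key format is an assumption; only the documented properties (deterministic, namespaced, hash-truncated) come from this page:

```python
import hashlib

# Sketch of deterministic, namespaced, hash-truncated key building.
# The "namespace:layer:hash" layout is illustrative, not the real format.
def build_key(namespace: str, layer: str, *parts: str, digest_len: int = 12) -> str:
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"{namespace}:{layer}:{digest[:digest_len]}"
```

Truncating the digest keeps keys readable in backend tooling (e.g. `redis-cli KEYS`) while remaining collision-resistant enough for cache use.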
`SemanticCache` accepts an `exact_backend` and a `vector_backend` directly, bypassing `CacheManager`. This gives it full control over the two-tier lookup flow.
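The two-tier lookup can be sketched as follows. The function name, the brute-force scan, and the threshold default are all illustrative; the real class delegates the nearest-neighbour search to a vector backend such as FAISS or Chroma:

```python
import numpy as np

# Sketch of SemanticCache's two-tier flow: an exact-key check first,
# then a cosine-similarity scan over stored (embedding, value) pairs.
def semantic_get(key, embedding, exact, vectors, threshold=0.9):
    if key in exact:                      # tier 1: exact match
        return exact[key]
    best, best_sim = None, threshold      # tier 2: semantic match
    for vec, value in vectors:
        sim = float(np.dot(embedding, vec) /
                    (np.linalg.norm(embedding) * np.linalg.norm(vec)))
        if sim >= best_sim:
            best, best_sim = value, sim
    return best                           # None if nothing clears threshold
```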
## Quick Setup
```python
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
from omnicache_ai.layers.response_cache import ResponseCache
from omnicache_ai.layers.embedding_cache import EmbeddingCache
from omnicache_ai.layers.retrieval_cache import RetrievalCache
from omnicache_ai.layers.context_cache import ContextCache

manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

response_cache = ResponseCache(manager)
embedding_cache = EmbeddingCache(manager, dim=1536)
retrieval_cache = RetrievalCache(manager)
context_cache = ContextCache(manager)
```
For `SemanticCache`, see the dedicated SemanticCache page.
## Next Steps
- `ResponseCache` -- cache LLM completions
- `EmbeddingCache` -- cache dense vectors
- `RetrievalCache` -- cache retriever results
- `ContextCache` -- cache conversation history
- `SemanticCache` -- meaning-aware caching (the core differentiator)