# Cache Layers
OmniCache-AI ships five purpose-built cache layers. Each layer targets a distinct stage of the AI pipeline, uses serialization optimized for its data type, and exposes a consistent `get` / `set` / `get_or_*` interface backed by a shared `CacheManager`.
## At a Glance
| Layer | Class | What it caches | Key composition | Serialization |
|---|---|---|---|---|
| Response | `ResponseCache` | LLM completions (any object) | model ID + messages hash + params hash | pickle |
| Embedding | `EmbeddingCache` | Dense float32 vectors (`np.ndarray`) | text + model ID | `tobytes` / `frombuffer` |
| Retrieval | `RetrievalCache` | Document lists from retrievers | query + retriever ID + top_k | pickle |
| Context | `ContextCache` | Conversation message history | session ID + turn index | pickle |
| Semantic | `SemanticCache` | Any value, matched by meaning | exact key or cosine similarity | pickle |
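The `tobytes` / `frombuffer` entry in the table can be illustrated with a short sketch. The helper names below are illustrative, not the library's API; only the round-trip technique itself comes from the table.

```python
import numpy as np

# Sketch of the tobytes/frombuffer round trip used for embeddings:
# vectors are stored as raw float32 bytes and restored with a known
# dtype, avoiding pickle overhead for large dense arrays.
def serialize_vector(vec: np.ndarray) -> bytes:
    return vec.astype(np.float32).tobytes()

def deserialize_vector(raw: bytes) -> np.ndarray:
    return np.frombuffer(raw, dtype=np.float32)
```

Because the dtype is fixed at float32, only the raw buffer needs to be stored; the array shape is implied by the byte length.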
## Pipeline Diagram
The layers form a cascading pipeline. A query enters at the top and falls through each layer until a cache hit is found or the live service is called.
Each layer is independent. Use only the layers relevant to your workload. A simple chatbot may need only ResponseCache, while a full RAG agent benefits from all five.
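The cascading fall-through described above can be sketched as a simple loop. Plain dicts stand in for the real layers here, and `cascade_lookup` is an illustrative helper, not part of the library:

```python
# Minimal sketch of the cascade: try each cache layer in order,
# fall through to the live service only when every layer misses.
def cascade_lookup(query, layers, live_call):
    for name, layer in layers:
        hit = layer.get(query)
        if hit is not None:
            return hit, name          # first hit wins
    return live_call(query), "live"   # all layers missed
```

A chatbot would configure `layers` with only a response cache; a RAG agent would list all five in pipeline order.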
## Shared Design Principles
All cache layers follow a common contract:
- Constructor takes a `CacheManager` (except `SemanticCache`, which wires its own backends directly).
- `get(...)` returns `None` on miss -- it never raises on a cache miss.
- `set(...)` accepts an optional `ttl` -- when omitted, the `TTLPolicy` on the manager decides.
- `get_or_*` convenience methods combine lookup and computation in a single call, ensuring the result is stored before it is returned.
- Keys are built via `CacheKeyBuilder` -- deterministic, namespaced, and hash-truncated for readability.
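The `get_or_*` contract above can be sketched in a few lines. A plain dict stands in for a manager-backed layer, and `get_or_compute` is an illustrative name (the concrete `get_or_*` method names vary per layer):

```python
# Sketch of the get_or_* contract: look up, compute on miss,
# store the result before returning it. TTL handling is omitted.
def get_or_compute(store, key, compute):
    value = store.get(key)   # returns None on miss, never raises
    if value is None:
        value = compute()
        store[key] = value   # stored before it is returned
    return value
```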
## How Layers Relate to Backends
```
+------------------+        +------------------+
|   Cache Layer    | -----> |   CacheManager   |
| (ResponseCache,  |        |                  |
|  EmbeddingCache, |        |  backend         | ---> InMemory / Disk / Redis
|  etc.)           |        |  vector_backend  | ---> FAISS / Chroma
+------------------+        |  key_builder     |
                            |  ttl_policy      |
                            |  invalidation    |
                            +------------------+
```
Layers never talk to a backend directly. They delegate to `CacheManager`, which resolves TTL, builds keys, and routes to the appropriate storage backend.
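The key-building step can be sketched as below. The exact key format is an assumption; only the documented properties (deterministic, namespaced, hash-truncated) come from this page:

```python
import hashlib

# Sketch of deterministic, namespaced, hash-truncated key building.
# The "namespace:layer:hash" layout is illustrative, not the real format.
def build_key(namespace: str, layer: str, *parts: str, digest_len: int = 12) -> str:
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"{namespace}:{layer}:{digest[:digest_len]}"
```

Truncating the digest keeps keys readable in backend tooling (e.g. `redis-cli KEYS`) while remaining collision-resistant enough for cache use.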
`SemanticCache` accepts an `exact_backend` and a `vector_backend` directly, bypassing `CacheManager`. This gives it full control over the two-tier lookup flow.
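The two-tier lookup can be sketched as follows. The function name, the brute-force scan, and the threshold default are all illustrative; the real class delegates the nearest-neighbour search to a vector backend such as FAISS or Chroma:

```python
import numpy as np

# Sketch of SemanticCache's two-tier flow: an exact-key check first,
# then a cosine-similarity scan over stored (embedding, value) pairs.
def semantic_get(key, embedding, exact, vectors, threshold=0.9):
    if key in exact:                      # tier 1: exact match
        return exact[key]
    best, best_sim = None, threshold      # tier 2: semantic match
    for vec, value in vectors:
        sim = float(np.dot(embedding, vec) /
                    (np.linalg.norm(embedding) * np.linalg.norm(vec)))
        if sim >= best_sim:
            best, best_sim = value, sim
    return best                           # None if nothing clears threshold
```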
## Quick Setup
```python
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
from omnicache_ai.layers.response_cache import ResponseCache
from omnicache_ai.layers.embedding_cache import EmbeddingCache
from omnicache_ai.layers.retrieval_cache import RetrievalCache
from omnicache_ai.layers.context_cache import ContextCache

manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

response_cache = ResponseCache(manager)
embedding_cache = EmbeddingCache(manager, dim=1536)
retrieval_cache = RetrievalCache(manager)
context_cache = ContextCache(manager)
```
For `SemanticCache`, see the dedicated SemanticCache page.
## Next Steps
- `ResponseCache` -- cache LLM completions
- `EmbeddingCache` -- cache dense vectors
- `RetrievalCache` -- cache retriever results
- `ContextCache` -- cache conversation history
- `SemanticCache` -- meaning-aware caching (the core differentiator)