CacheKeyBuilder

The CacheKeyBuilder generates deterministic, namespaced cache keys by serializing content into a canonical form and hashing the result. Identical inputs always produce the same key, regardless of dictionary key order.


Overview

Cache key collisions can silently corrupt data; overly verbose keys waste memory. CacheKeyBuilder solves both problems with a structured key schema:

{namespace}:{type_prefix}:{hash[:16]}

For example: omnicache:resp:a3f9b2c1d4e5f678

The builder:

  1. Maps the cache_type to a short prefix (e.g., "response" becomes "resp").
  2. Serializes the content and any extra discriminators into canonical JSON (sorted keys, ASCII-safe).
  3. Hashes the JSON with SHA-256 (default) or MD5 and truncates to 16 hex characters.
  4. Prepends the namespace and prefix.

This produces compact keys that resist accidental collisions (the 16 hex characters retain 64 bits of the digest) and remain readable enough for debugging.
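
The four steps above can be approximated with the standard library alone. This is a minimal sketch, not the library's actual implementation: the function name sketch_build, the TYPE_PREFIXES table, and in particular the {"content", "extra"} envelope used to combine the two inputs are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Optional

# Hypothetical prefix table mirroring the documented mappings.
TYPE_PREFIXES = {
    "embedding": "embed",
    "retrieval": "retrieval",
    "context": "ctx",
    "response": "resp",
}


def sketch_build(
    cache_type: str,
    content: Any,
    extra: Optional[dict] = None,
    namespace: str = "omnicache",
    algo: str = "sha256",
) -> str:
    """Illustrative re-implementation of the four documented steps."""
    # Step 1: map the cache type to its prefix; unknown types pass through.
    prefix = TYPE_PREFIXES.get(cache_type, cache_type)

    # Step 2: canonical JSON. The content/extra envelope is an assumption.
    payload = json.dumps(
        {"content": content, "extra": extra or {}},
        sort_keys=True,
        ensure_ascii=True,
        default=str,
    )

    # Step 3: hash and keep the first 16 hex characters (64 bits).
    digest = hashlib.new(algo, payload.encode("ascii")).hexdigest()[:16]

    # Step 4: assemble namespace, prefix, and digest.
    return f"{namespace}:{prefix}:{digest}"
```

Because the real serialization layout may differ, keys from this sketch should not be expected to match keys produced by CacheKeyBuilder itself.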


Usage

Basic Key Generation

from omnicache_ai.core.key_builder import CacheKeyBuilder

kb = CacheKeyBuilder(namespace="myapp")

# Build a response cache key
key = kb.build("response", "What is the capital of France?")
print(key)  # e.g. myapp:resp:7c2a1f3b9e4d6a80 (hash shown is illustrative)

# Build an embedding cache key
key = kb.build("embedding", "Hello world")
print(key)  # e.g. myapp:embed:1a2b3c4d5e6f7890 (hash shown is illustrative)

With Extra Discriminators

Use the extra parameter to differentiate keys that share the same content but differ in context (e.g., different models or versions).

key_gpt4 = kb.build("response", "Explain gravity", extra={"model": "gpt-4"})
key_claude = kb.build("response", "Explain gravity", extra={"model": "claude-3"})

# These produce different keys because the extra dict differs
assert key_gpt4 != key_claude

With Complex Content

The builder handles any JSON-serializable content: strings, lists, dicts, numbers, and nested structures.

# Dict content (key order does not matter -- canonical JSON sorts keys)
key1 = kb.build("response", {"role": "user", "content": "Hi"})
key2 = kb.build("response", {"content": "Hi", "role": "user"})
assert key1 == key2 # Identical keys regardless of dict order

# List of messages
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is 2+2?"},
]
key = kb.build("response", messages, extra={"model": "gpt-4"})
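
One consequence of sorted-key JSON is worth spelling out: key order inside a dict is normalized, but the order of items in a list is not. The sketch below illustrates this with the standard library only; canonical_digest is a hypothetical stand-in for the builder's hashing step, not a function from the library.

```python
import hashlib
import json


def canonical_digest(value) -> str:
    """Hash the sorted-key JSON form of a value (stand-in for the builder's hashing step)."""
    text = json.dumps(value, sort_keys=True, ensure_ascii=True)
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:16]


system = {"role": "system", "content": "You are helpful."}
user = {"role": "user", "content": "What is 2+2?"}

# Reordering keys inside a message does not change the digest.
same_message = {"content": "What is 2+2?", "role": "user"}
dict_order_irrelevant = canonical_digest([system, user]) == canonical_digest([system, same_message])

# Reordering the messages themselves does change it.
list_order_matters = canonical_digest([system, user]) != canonical_digest([user, system])

print(dict_order_irrelevant, list_order_matters)  # True True
```

If two message lists should share a cache entry regardless of order, sort them before building the key.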

Custom Namespace and Algorithm

# Use MD5 for faster hashing (when collision resistance is less critical)
kb_fast = CacheKeyBuilder(namespace="dev", algo="md5")
key = kb_fast.build("context", {"session_id": "abc123"})
print(key)  # e.g. dev:ctx:9f8e7d6c5b4a3210 (hash shown is illustrative)

Choosing a hash algorithm

Use sha256 (the default) for production workloads where collision resistance matters. Use md5 in development or benchmarking scenarios where speed is prioritized over security.
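
Since both algorithms are truncated to the same 16 hex characters, the chance of an accidental collision depends on those 64 bits rather than on the full digest length. The sketch below shows the digest lengths and a birthday-bound estimate. It assumes the truncated digest behaves like a uniformly random 64-bit value and says nothing about deliberate attacks; accidental_collision_probability is a hypothetical helper.

```python
import hashlib
import math

payload = b'{"session_id": "abc123"}'

# Full digests differ in length, but both are cut to the same 16 hex characters.
sha_full = hashlib.new("sha256", payload).hexdigest()
md5_full = hashlib.new("md5", payload).hexdigest()
print(len(sha_full), len(md5_full))            # 64 32
print(len(sha_full[:16]), len(md5_full[:16]))  # 16 16


def accidental_collision_probability(num_keys: int, bits: int = 64) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_keys * (num_keys - 1) / 2 ** (bits + 1)
    return -math.expm1(exponent)


# With one million keys in a namespace, the estimate stays below 1e-7.
print(accidental_collision_probability(1_000_000) < 1e-7)  # True
```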


Type Prefixes

The builder maps standard cache types to short prefixes. Custom types are used as-is.

| Cache Type | Prefix | Typical Use |
|---|---|---|
| "embedding" | embed | Embedding vector caches |
| "retrieval" | retrieval | RAG retrieval result caches |
| "context" | ctx | Agent context / session caches |
| "response" | resp | LLM response caches |
| (custom) | (as-is) | Any user-defined cache type |

# Standard types use short prefixes
kb.build("embedding", "text") # omnicache:embed:...
kb.build("retrieval", "query") # omnicache:retrieval:...
kb.build("context", "session") # omnicache:ctx:...
kb.build("response", "prompt") # omnicache:resp:...

# Custom types pass through unchanged
kb.build("tool_call", "data") # omnicache:tool_call:...

Key Determinism

The builder guarantees deterministic key generation through canonical JSON serialization:

  • Dictionary keys are sorted alphabetically.
  • Output is ASCII-safe (ensure_ascii=True).
  • Non-serializable objects fall back to str() via default=str.

# Both calls produce the same key:
kb.build("response", {"b": 2, "a": 1})
kb.build("response", {"a": 1, "b": 2})
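
The serialization options listed above can be explored directly with json.dumps. This sketch assumes the builder passes exactly these three options; the canonical function and the Session class are hypothetical names used only for the demonstration.

```python
import json


def canonical(value) -> str:
    # The options named in the list: sorted keys, ASCII-only output, str() fallback.
    return json.dumps(value, sort_keys=True, ensure_ascii=True, default=str)


# Key order is normalized, including in nested dicts.
print(canonical({"b": {"y": 2, "x": 1}, "a": 1}))  # {"a": 1, "b": {"x": 1, "y": 2}}

# Non-ASCII characters are emitted as \uXXXX escapes.
print(canonical("café"))  # "caf\u00e9"


class Session:
    """Hypothetical object with no custom __str__ or __repr__."""


# The str() fallback embeds the object's memory address, so two live
# instances serialize differently even though they look alike.
a, b = Session(), Session()
print(canonical(a) == canonical(b))  # False
```

The last case suggests a caveat: objects that rely on the str() fallback only yield stable keys if their string form is itself stable, so it is safer to convert them to plain dicts or strings first.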
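
Warning

Floating-point values can break key determinism. Results that are mathematically equal but differ in their last bits (for example, 0.1 + 0.2 and 0.3) serialize to different JSON and therefore hash to different keys. If your content includes computed floats, round them to a fixed precision before passing them to build() to avoid subtle mismatches across platforms and code paths.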
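
A minimal sketch of that rounding step, using only the standard library. The round_floats helper and its default of six decimal places are assumptions for illustration, not part of the library.

```python
import json


def round_floats(value, ndigits: int = 6):
    """Recursively round floats in JSON-like data (hypothetical helper)."""
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, dict):
        return {k: round_floats(v, ndigits) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [round_floats(v, ndigits) for v in value]
    return value


a = {"temperature": 0.1 + 0.2}
b = {"temperature": 0.3}

print(json.dumps(a) == json.dumps(b))  # False: 0.30000000000000004 vs 0.3
print(json.dumps(round_floats(a)) == json.dumps(round_floats(b)))  # True
```

Note that rounding does not unify ints and floats: 1 and 1.0 still serialize as 1 and 1.0 respectively, so pick one numeric type per field.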


API Reference

Constructor

CacheKeyBuilder(namespace: str = "omnicache", algo: str = "sha256")

| Parameter | Type | Default | Description |
|---|---|---|---|
| namespace | str | "omnicache" | Global prefix applied to every key |
| algo | str | "sha256" | Hash algorithm: "sha256" (secure) or "md5" (faster) |

Methods

| Method | Signature | Returns | Description |
|---|---|---|---|
| build | build(cache_type, content, extra=None) | str | Build a deterministic cache key |

Method Details

build(cache_type, content, extra=None)

Build a cache key for the given cache type and content.

| Parameter | Type | Default | Description |
|---|---|---|---|
| cache_type | str | required | One of "embedding", "retrieval", "context", "response", or a custom string |
| content | Any | required | Primary cache input (text, list, dict, etc.) |
| extra | dict[str, Any] \| None | None | Additional discriminators (e.g., model_id, index_version) |

Returns: A string key like "omnicache:embed:a3f9b2c1d4e5f678".
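
When debugging, it can help to split a returned key back into its parts. The helper below is a hypothetical sketch, not part of the library. It splits from the right, so it tolerates colons in the namespace but assumes the type prefix contains none.

```python
from typing import NamedTuple


class KeyParts(NamedTuple):
    namespace: str
    prefix: str
    digest: str


def split_key(key: str) -> KeyParts:
    """Split '{namespace}:{prefix}:{digest}' from the right (hypothetical helper)."""
    namespace, prefix, digest = key.rsplit(":", 2)
    return KeyParts(namespace, prefix, digest)


parts = split_key("omnicache:embed:a3f9b2c1d4e5f678")
print(parts.prefix, len(parts.digest))  # embed 16
```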


Integration with CacheManager

The CacheKeyBuilder is typically accessed through CacheManager.key_builder:

from omnicache_ai import CacheManager, OmnicacheSettings

manager = CacheManager.from_settings(OmnicacheSettings(namespace="prod"))

# Use the manager's key builder
key = manager.key_builder.build("response", {"prompt": "Hello"}, extra={"model": "gpt-4"})
manager.set(key, "Hi there!", cache_type="response")

Next Steps

  • CacheManager -- Uses the key builder for all cache operations
  • Policies -- TTL resolution by cache type
  • Settings -- Configure namespace and hash algorithm