OmniCache-AI is a framework-agnostic caching library that eliminates redundant AI operations. Cache embeddings, retrieval results, context, LLM responses, and semantic-similarity lookups — cut latency and cost by up to 90%.
Every feature designed to eliminate wasted AI compute and cost.
Response, Embedding, Retrieval, Context, and Semantic layers — each optimized for its data type and serialization format.
In-Memory LRU, Disk, Redis, FAISS, and ChromaDB. Pick the backend that matches your scale and persistence needs.
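The in-memory LRU backend's eviction policy can be sketched in a few lines of plain Python (an illustrative sketch, not the library's actual implementation): an `OrderedDict` tracks access order and drops the least-recently-used entry once capacity is reached.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: evicts the least-recently-used entry at capacity."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def set(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # drop the oldest entry

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch "a" so "b" becomes the eviction candidate
cache.set("c", 3)      # evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```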
LangChain, LangGraph, AutoGen, CrewAI, Agno, and A2A. Drop-in integration with zero code changes.
Returns cached answers for semantically similar queries using cosine similarity — not just exact matches.
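The semantic-lookup idea can be sketched in plain Python (illustrative only — the `SemanticCache` class, the stand-in embeddings, and the 0.9 threshold here are assumptions, not the library's API; in practice the embeddings come from your embedding model and the threshold is configurable):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Return a cached answer when a query embedding is close enough."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self._entries = []  # list of (embedding, answer)

    def set(self, query_embedding, answer):
        self._entries.append((query_embedding, answer))

    def get(self, query_embedding):
        best_score, best_answer = 0.0, None
        for emb, answer in self._entries:
            score = cosine(query_embedding, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

cache = SemanticCache(threshold=0.9)
cache.set([1.0, 0.0], "answer A")
print(cache.get([0.98, 0.05]))  # near-duplicate query → "answer A"
print(cache.get([0.0, 1.0]))    # orthogonal query → None (cache miss)
```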
Tag entries by model, session, or deployment. Invalidate thousands of related keys with a single call.
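Tag-based bulk invalidation typically works by keeping a reverse index from tags to keys; here is a minimal sketch of that pattern (the `TaggedCache` class and its method names are illustrative, not the library's API):

```python
from collections import defaultdict

class TaggedCache:
    """Cache with a tag index so related keys can be invalidated together."""

    def __init__(self):
        self._store = {}
        self._tag_index = defaultdict(set)  # tag -> set of keys

    def set(self, key, value, tags=()):
        self._store[key] = value
        for tag in tags:
            self._tag_index[tag].add(key)

    def get(self, key):
        return self._store.get(key)

    def invalidate_tag(self, tag):
        """Drop every key carrying the given tag; returns count removed."""
        keys = self._tag_index.pop(tag, set())
        for key in keys:
            self._store.pop(key, None)
        return len(keys)

cache = TaggedCache()
cache.set("resp:1", "...", tags=["model:gpt-4", "session:abc"])
cache.set("resp:2", "...", tags=["model:gpt-4"])
cache.set("resp:3", "...", tags=["model:claude"])
print(cache.invalidate_tag("model:gpt-4"))  # 2 — both gpt-4 entries removed
print(cache.get("resp:3"))                  # "..." — other models untouched
```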
Configure time-to-live per cache type — for example, embeddings cached for 24 hours, responses for 10 minutes. Fully configurable via environment variables.
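Per-type TTLs with env-var overrides can be resolved with a small helper like the one below (a sketch: the `OMNICACHE_TTL_*` variable names and the 24h/10min defaults are assumptions for illustration — check the library docs for the actual configuration keys):

```python
import os

# Hypothetical defaults mirroring the examples above (seconds).
DEFAULT_TTLS = {"embedding": 24 * 3600, "response": 600}

def ttl_for(cache_type: str) -> int:
    """Resolve a per-type TTL, letting an env var override the default."""
    env_key = f"OMNICACHE_TTL_{cache_type.upper()}"  # assumed naming scheme
    raw = os.environ.get(env_key)
    return int(raw) if raw is not None else DEFAULT_TTLS[cache_type]

os.environ["OMNICACHE_TTL_RESPONSE"] = "120"
print(ttl_for("embedding"))  # 86400 — 24h default
print(ttl_for("response"))   # 120 — overridden via env var
```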
Works with every major AI framework
```python
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder

# Wire up in 3 lines
manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

# Cache any value with optional TTL
manager.set("my_key", {"result": "data"}, ttl=60)
value = manager.get("my_key")  # {"result": "data"}
```
Install via pip, uv, or from GitHub
Your first cache in 30 seconds
40+ runnable recipes for every framework
Complete class and method documentation