Open Source — MIT Licensed  ·  v0.1.0

Cache Every Layer of Your
AI Agent Pipeline

OmniCache-AI is a framework-agnostic caching library that eliminates redundant AI operations. Cache embeddings, retrieval, context, LLM responses, and semantic similarity — cut latency and cost by up to 90%.

$ pip install git+https://github.com/ashishpatel26/omnicache-ai.git
5 Cache Layers  ·  5 Backends  ·  6 Adapters  ·  40+ Recipes  ·  3 Middleware

Why OmniCache-AI?

Every feature is designed to eliminate redundant AI compute and cut cost.

🗂️
5 Cache Layers

Response, Embedding, Retrieval, Context, and Semantic layers — each optimized for its data type and serialization format.

Explore layers
🗄️
5 Storage Backends

In-Memory LRU, Disk, Redis, FAISS, and ChromaDB. Pick the backend that matches your scale and persistence needs.

See backends
🧩
6 Framework Adapters

LangChain, LangGraph, AutoGen, CrewAI, Agno, and A2A. Drop-in integration with zero code changes.

View adapters
🧠
Semantic Cache

Returns cached answers for semantically similar queries using cosine similarity — not just exact matches.

Learn more
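The idea behind a semantic cache can be shown in a few lines. This is a minimal, self-contained sketch of the concept (plain Python, not OmniCache-AI's actual API): store each query's embedding next to its answer, and on lookup return the stored answer whose embedding has the highest cosine similarity, provided it clears a threshold.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: serve a stored answer when a new query's
    embedding is close enough to a previously cached one."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def set(self, embedding, answer):
        self.entries.append((embedding, answer))

    def get(self, embedding):
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

cache = SemanticCache(threshold=0.9)
cache.set([1.0, 0.0, 0.0], "Paris is the capital of France.")
print(cache.get([0.98, 0.05, 0.0]))  # near-duplicate query: cache hit
print(cache.get([0.0, 1.0, 0.0]))    # unrelated query: None
```

In a real deployment the embeddings would come from an embedding model and the linear scan would be replaced by a vector index (e.g. FAISS or ChromaDB, both listed as backends above).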
🏷️
Tag-Based Invalidation

Tag entries by model, session, or deployment. Invalidate thousands of related keys with a single call.

See invalidation
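Tag-based invalidation boils down to a reverse index from tag to keys. The sketch below is illustrative only (not OmniCache-AI's actual API): each entry may carry tags, and invalidating a tag drops every entry that carries it in one call.

```python
class TaggedCache:
    """Toy cache with tag-based invalidation via a tag -> keys index."""
    def __init__(self):
        self.store = {}      # key -> value
        self.tag_index = {}  # tag -> set of keys carrying that tag

    def set(self, key, value, tags=()):
        self.store[key] = value
        for tag in tags:
            self.tag_index.setdefault(tag, set()).add(key)

    def get(self, key):
        return self.store.get(key)

    def invalidate_tag(self, tag):
        # Remove every entry that carries this tag, then forget the tag.
        for key in self.tag_index.pop(tag, set()):
            self.store.pop(key, None)

cache = TaggedCache()
cache.set("resp:1", "a", tags=["model:gpt-4", "session:42"])
cache.set("resp:2", "b", tags=["model:gpt-4"])
cache.set("resp:3", "c", tags=["session:42"])
cache.invalidate_tag("model:gpt-4")  # drops resp:1 and resp:2 at once
print(cache.get("resp:2"))  # None
print(cache.get("resp:3"))  # "c" — untouched, different tag
```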
⏱️
Smart TTL Policies

Configure time-to-live per cache type. Embeddings last 24h, responses 10min. Fully env-var configurable.

Configure TTL
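Per-type TTLs work by stamping each entry with an expiry time and evicting lazily on read. A minimal sketch of the mechanism (again illustrative, not the library's internals):

```python
import time

class TTLCache:
    """Toy per-entry TTL: a value expires ttl seconds after it is set."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self.store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazy expiry on read
            return None
        return value

cache = TTLCache()
cache.set("embedding:doc1", [0.1, 0.2], ttl=24 * 3600)  # long-lived vectors
cache.set("response:abc", "hello", ttl=0.05)            # short-lived response
time.sleep(0.1)
print(cache.get("response:abc"))    # None — expired
print(cache.get("embedding:doc1"))  # still fresh
```

The same pattern extends to reading the per-type TTL values from environment variables instead of hard-coding them.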

Works with every major AI framework

LangChain  ·  LangGraph  ·  AutoGen  ·  CrewAI  ·  Agno  ·  A2A

Quick Example

example.py
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder

# Wire up in 3 lines
manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

# Cache any value with optional TTL
manager.set("my_key", {"result": "data"}, ttl=60)
value = manager.get("my_key")  # {"result": "data"}

Before vs. After

✗ Without Caching → ✓ With OmniCache-AI

Every LLM call billed at full token cost → Identical prompts returned instantly, zero tokens
Embeddings re-computed on every request → Vectors stored and reused across sessions
Vector search re-run for same queries → Retrieval results cached by query + top_k
Agent state lost between runs → Session context persisted across turns
Similar questions treated as unique → Cosine similarity returns a cached answer

Get Started

📥
Installation

Install via pip, uv, or from GitHub

🚀
Quick Start

Your first cache in 30 seconds

📖
Cookbook

40+ runnable recipes for every framework

API Reference

Complete class and method documentation