Open Source — MIT Licensed  ·  v0.1.0

Cache Every Layer of Your
AI Agent Pipeline

OmniCache-AI is a framework-agnostic caching library that eliminates redundant AI operations. Cache embeddings, retrieval, context, LLM responses, and semantic similarity — cut latency and cost by up to 90%.

$ pip install git+https://github.com/ashishpatel26/omnicache-ai.git
5 Cache Layers  ·  5 Backends  ·  6 Adapters  ·  40+ Recipes  ·  3 Middleware

Why OmniCache-AI?

Every feature is designed to eliminate redundant AI compute and cut cost.

🗂️
5 Cache Layers

Response, Embedding, Retrieval, Context, and Semantic layers — each optimized for its data type and serialization format.

Explore layers
🗄️
5 Storage Backends

In-Memory LRU, Disk, Redis, FAISS, and ChromaDB. Pick the backend that matches your scale and persistence needs.

See backends
🧩
6 Framework Adapters

LangChain, LangGraph, AutoGen, CrewAI, Agno, and A2A. Drop-in integration with zero code changes.

View adapters
🧠
Semantic Cache

Returns cached answers for semantically similar queries using cosine similarity — not just exact matches.

Learn more
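The idea behind a semantic cache can be shown in a few lines. This is a minimal, self-contained sketch of the concept (plain Python, not OmniCache-AI's actual API): store each query's embedding next to its answer, and on lookup return the stored answer whose embedding has the highest cosine similarity, provided it clears a threshold.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: serve a stored answer when a new query's
    embedding is close enough to a previously cached one."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def set(self, embedding, answer):
        self.entries.append((embedding, answer))

    def get(self, embedding):
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

cache = SemanticCache(threshold=0.9)
cache.set([1.0, 0.0, 0.0], "Paris is the capital of France.")
print(cache.get([0.98, 0.05, 0.0]))  # near-duplicate query: cache hit
print(cache.get([0.0, 1.0, 0.0]))    # unrelated query: None
```

In a real deployment the embeddings would come from an embedding model and the linear scan would be replaced by a vector index (e.g. FAISS or ChromaDB, both listed as backends above).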
🏷️
Tag-Based Invalidation

Tag entries by model, session, or deployment. Invalidate thousands of related keys with a single call.

See invalidation
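Tag-based invalidation boils down to a reverse index from tag to keys. The sketch below is illustrative only (not OmniCache-AI's actual API): each entry may carry tags, and invalidating a tag drops every entry that carries it in one call.

```python
class TaggedCache:
    """Toy cache with tag-based invalidation via a tag -> keys index."""
    def __init__(self):
        self.store = {}      # key -> value
        self.tag_index = {}  # tag -> set of keys carrying that tag

    def set(self, key, value, tags=()):
        self.store[key] = value
        for tag in tags:
            self.tag_index.setdefault(tag, set()).add(key)

    def get(self, key):
        return self.store.get(key)

    def invalidate_tag(self, tag):
        # Remove every entry that carries this tag, then forget the tag.
        for key in self.tag_index.pop(tag, set()):
            self.store.pop(key, None)

cache = TaggedCache()
cache.set("resp:1", "a", tags=["model:gpt-4", "session:42"])
cache.set("resp:2", "b", tags=["model:gpt-4"])
cache.set("resp:3", "c", tags=["session:42"])
cache.invalidate_tag("model:gpt-4")  # drops resp:1 and resp:2 at once
print(cache.get("resp:2"))  # None
print(cache.get("resp:3"))  # "c" — untouched, different tag
```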
⏱️
Smart TTL Policies

Configure time-to-live per cache type. Embeddings last 24h, responses 10min. Fully env-var configurable.

Configure TTL
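Per-type TTLs work by stamping each entry with an expiry time and evicting lazily on read. A minimal sketch of the mechanism (again illustrative, not the library's internals):

```python
import time

class TTLCache:
    """Toy per-entry TTL: a value expires ttl seconds after it is set."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self.store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazy expiry on read
            return None
        return value

cache = TTLCache()
cache.set("embedding:doc1", [0.1, 0.2], ttl=24 * 3600)  # long-lived vectors
cache.set("response:abc", "hello", ttl=0.05)            # short-lived response
time.sleep(0.1)
print(cache.get("response:abc"))    # None — expired
print(cache.get("embedding:doc1"))  # still fresh
```

The same pattern extends to reading the per-type TTL values from environment variables instead of hard-coding them.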

Works with every major AI framework

LangChain  ·  LangGraph  ·  AutoGen  ·  CrewAI  ·  Agno  ·  A2A

Quick Example

example.py
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder

# Wire up in 3 lines
manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

# Cache any value with optional TTL
manager.set("my_key", {"result": "data"}, ttl=60)
value = manager.get("my_key")  # {"result": "data"}

Before vs. After

✗ Without Caching → ✓ With OmniCache-AI

Every LLM call billed at full token cost → Identical prompts returned instantly, zero tokens
Embeddings re-computed on every request → Vectors stored and reused across sessions
Vector search re-run for same queries → Retrieval results cached by query + top_k
Agent state lost between runs → Session context persisted across turns
Similar questions treated as unique → Cosine similarity returns a cached answer

Get Started

📥
Installation

Install via pip, uv, or from GitHub

🚀
Quick Start

Your first cache in 30 seconds

📖
Cookbook

40+ runnable recipes for every framework

API Reference

Complete class and method documentation