AutoGenCacheAdapter

Cached agent replies for AutoGen. AutoGenCacheAdapter wraps any AutoGen agent with response caching, supporting both the legacy pyautogen 0.2.x API (ConversableAgent.generate_reply) and the new autogen-agentchat 0.4+ API (AssistantAgent.run / arun).

Overview

AutoGen agents generate replies by calling LLMs, which incurs API costs and latency on every invocation. AutoGenCacheAdapter wraps an agent instance and intercepts reply generation to check the cache first. If an identical request has been seen before, the cached reply is returned instantly.

The adapter auto-detects which API generation the wrapped agent uses and exposes the appropriate cached methods. All non-overridden attributes are proxied through to the original agent via __getattr__, so the adapter is a transparent drop-in replacement.

When to use:

You are using AutoGen agents and want to avoid redundant LLM calls for repeated inputs.
You are running agent conversations in a loop or testing pipeline and want deterministic, fast responses for known inputs.
You want a unified caching solution that works across AutoGen versions.

Installation

pyautogen 0.2.x (legacy)

pip install 'omnicache-ai[autogen]'

This installs pyautogen >= 0.2.

autogen-agentchat 0.4+ (new API)

pip install omnicache-ai 'autogen-agentchat>=0.4'

The autogen-agentchat package is not yet included in the [autogen] extra; install it directly.

Usage

pyautogen 0.2.x

from autogen import ConversableAgent
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
from omnicache_ai.adapters.autogen_adapter import AutoGenCacheAdapter

manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

# Create a standard AutoGen agent
agent = ConversableAgent(
    name="assistant",
    llm_config={"model": "gpt-4o", "api_key": "..."},
)

# Wrap it with caching
cached_agent = AutoGenCacheAdapter(agent, manager)

# generate_reply is cached
messages = [{"role": "user", "content": "What is 2+2?"}]
reply = cached_agent.generate_reply(messages=messages)

# Second call returns from cache (no LLM API call)
reply = cached_agent.generate_reply(messages=messages)

autogen-agentchat 0.4+

from autogen_agentchat.agents import AssistantAgent
from omnicache_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
from omnicache_ai.adapters.autogen_adapter import AutoGenCacheAdapter

manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)

agent = AssistantAgent("assistant", model_client=model_client)
cached_agent = AutoGenCacheAdapter(agent, manager)

# Async cached run
result = await cached_agent.arun("What is machine learning?")

# Second call returns from cache
result = await cached_agent.arun("What is machine learning?")

Sync Run (0.4+ Agents)

For 0.4+ agents that support sync execution:

result = cached_agent.run("Explain caching strategies")

note

The run() method falls back to generate_reply() for 0.2.x agents that do not have a run() method.

Transparent Proxy

The adapter forwards any attribute not explicitly overridden to the wrapped agent:

cached_agent = AutoGenCacheAdapter(agent, manager)

# These all proxy through to the original agent
print(cached_agent.name)           # "assistant"
print(cached_agent.llm_config)     # {"model": "gpt-4o", ...}

Multi-Agent Conversations

Wrap each agent individually to cache their replies independently:

assistant = ConversableAgent(name="assistant", llm_config={...})
reviewer = ConversableAgent(name="reviewer", llm_config={...})

cached_assistant = AutoGenCacheAdapter(assistant, manager)
cached_reviewer = AutoGenCacheAdapter(reviewer, manager)

With Redis Backend

from omnicache_ai.backends.redis_backend import RedisBackend

manager = CacheManager(
    backend=RedisBackend(url="redis://localhost:6379/0"),
    key_builder=CacheKeyBuilder(namespace="autogen"),
)
cached_agent = AutoGenCacheAdapter(agent, manager)

API Reference