Platform Scope Document

Tapas works with any AI platform

Tapas is a universal energy-efficiency middleware layer. It sits between your users and any AI engine, serving cached answers when available and forwarding to the AI only when necessary.

How Tapas fits into your stack

User Query → Tapas Layer → (cache hit) answer served from cache
                         → (cache miss) Your AI Platform

Cache hits return in <50ms at 0.001 Wh. Cache misses are forwarded to your AI with zero added latency.
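The hit/miss flow above can be sketched in a few lines. This is an illustrative sketch, not the Tapas implementation: the function names are hypothetical, and exact-match lookup stands in for Tapas's semantic cache matching.

```python
# Sketch of the Tapas request path. On a cache hit the AI backend is never
# called; on a miss the query is forwarded and the answer is written back.

def tapas_answer(query, cache, forward_to_ai):
    """Return a cached answer if present, else forward to the AI and cache."""
    key = query.strip().lower()      # stand-in for semantic matching
    if key in cache:                 # cache hit: <50ms, no GPU spin-up
        return cache[key], "hit"
    answer = forward_to_ai(query)    # cache miss: normal AI latency
    cache[key] = answer              # write back for future reuse
    return answer, "miss"
```

Because the write-back happens on every miss, repeated or near-duplicate queries are served from the cache from the second occurrence onward.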

ChatGPT (OpenAI): Easy integration

Wrap Tapas around the OpenAI API as a middleware layer

Energy savings: Up to 99% per cached query

Tapas intercepts every request to the OpenAI Chat Completions API. Before forwarding to GPT-4/GPT-3.5, it checks the semantic cache. On a hit, it returns a bullet-point answer in <50ms — no OpenAI token cost, no GPU spin-up.

Integration method: API Middleware / Proxy
Drop-in proxy — zero code changes
Works with GPT-3.5, GPT-4, GPT-4o
Streaming responses supported
Token cost reduction on cache hits
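Since Tapas is a drop-in proxy, integration amounts to pointing your client at the Tapas address instead of api.openai.com. A minimal stdlib sketch, assuming a hypothetical local proxy URL (with the official openai SDK, the equivalent would be overriding the client's base URL):

```python
import json
import urllib.request

# Hypothetical Tapas proxy address; it forwards cache misses to api.openai.com.
TAPAS_PROXY = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Standard OpenAI Chat Completions payload; Tapas leaves it unchanged."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, api_key: str) -> str:
    """Send an OpenAI-format chat request through the Tapas proxy."""
    req = urllib.request.Request(
        TAPAS_PROXY,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # used only on cache misses
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the payload is unmodified OpenAI format, no application code changes beyond the endpoint URL are needed.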
Claude (Anthropic): Easy integration

Tapas is built on Claude — native Anthropic integration

Energy savings: Up to 99% on cache hits; 0% overhead on misses

Tapas uses Anthropic Claude as its primary inference engine for cache misses. For Claude-native apps, Tapas acts as a semantic pre-filter: it checks the cache before invoking the Messages API, and stores the response for future reuse. This is the tightest integration available.

Integration method: Native / System Prompt Injection
Native Claude integration — Tapas runs on Claude
System prompt injection — no SDK changes
Automatic bullet conversion in LEM
Full Messages API compatibility
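The pre-filter pattern described above can be sketched with injectable callables. The function names and cache interface here are illustrative, and the model name is only an example; the payload shape follows the Anthropic Messages API:

```python
# Hedged sketch of the Claude pre-filter: check the semantic cache before
# invoking the Messages API, and store the response for future reuse.

def claude_with_prefilter(prompt, cache_lookup, cache_store, call_messages_api):
    """Return a cached reply if present, else call the Messages API and cache."""
    cached = cache_lookup(prompt)
    if cached is not None:                 # hit: no Anthropic tokens consumed
        return cached
    reply = call_messages_api({
        "model": "claude-3-5-sonnet-latest",   # illustrative model name
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    })
    cache_store(prompt, reply)             # write back for future reuse
    return reply
```

In a real deployment the two cache callables would hit the Tapas store and `call_messages_api` would wrap the Anthropic SDK; the control flow stays the same.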
Gemini (Google): Moderate integration

Deploy Tapas on Google Cloud Run alongside Vertex AI

Energy savings: Up to 99% per cached query, plus reduced Vertex AI costs

For Google Cloud deployments, Tapas runs as a Cloud Run service in the same VPC as your Vertex AI endpoints. It intercepts Gemini API calls via a sidecar proxy, checks the semantic cache (stored in Cloud Spanner or Firestore), and only forwards to Gemini on cache misses.

Integration method: Google Cloud Run Sidecar / Vertex AI Proxy
Cloud Run sidecar deployment
Vertex AI + Gemini Developer API support
Cloud Spanner cache for multi-region
NVIDIA NIM integration (US regions)
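From the application's side, the sidecar deployment means Gemini calls go to a local proxy address instead of the Gemini endpoint. A stdlib sketch under stated assumptions: the sidecar port is hypothetical, while the generateContent payload and API-key header follow the standard Gemini REST layout:

```python
import json
import urllib.request

# Hypothetical sidecar address; in the Cloud Run deployment above, this proxy
# runs alongside your service and forwards cache misses to Gemini.
SIDECAR_URL = ("http://localhost:9000/v1beta/models/"
               "gemini-1.5-flash:generateContent")

def build_gemini_body(prompt: str) -> dict:
    """generateContent payload; the sidecar forwards it unchanged on a miss."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def ask_gemini(prompt: str, api_key: str) -> dict:
    """Send a Gemini request via the Tapas sidecar instead of the API directly."""
    req = urllib.request.Request(
        SIDECAR_URL,
        data=json.dumps(build_gemini_body(prompt)).encode(),
        headers={"Content-Type": "application/json",
                 "x-goog-api-key": api_key},   # forwarded only on misses
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```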
Copilot (Microsoft): Moderate integration

Integrate Tapas with Azure OpenAI Service and Copilot Studio

Energy savings: Up to 99% per cached query, plus reduced AOAI token spend

Microsoft Copilot and Azure OpenAI Service use the same OpenAI-compatible API. Tapas deploys as an Azure Container App or API Management policy, sitting between your Copilot Studio bot and the Azure OpenAI endpoint. Cache hits bypass the AOAI deployment entirely.

Integration method: Azure API Management Policy / Container App
Azure API Management policy injection
Copilot Studio connector support
Azure Container Apps deployment
GPT-4o, o1, and legacy model support
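In the Container App variant, the only client-side change is substituting the Tapas URL for the Azure OpenAI endpoint. A small sketch: the Tapas hostname is an assumption, while the deployment path and api-version query parameter follow the standard Azure OpenAI REST layout:

```python
# Build the Chat Completions URL with the Tapas Container App standing in
# for the AOAI endpoint; cache hits never reach the AOAI deployment.

def azure_chat_url(tapas_endpoint: str, deployment: str,
                   api_version: str = "2024-02-01") -> str:
    """Azure-OpenAI-style chat URL routed through Tapas."""
    return (f"{tapas_endpoint}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

# e.g. a Copilot Studio bot (or any AOAI client) would call
# azure_chat_url("https://tapas.example.azurecontainerapps.io", "gpt-4o")
# and authenticate with the usual "api-key" header as before.
```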
Llama (Meta / Self-hosted): Advanced integration

Add Tapas as a caching layer in front of your Ollama or vLLM server

Energy savings: Direct GPU savings; cache hits draw ~0 W of GPU power

For self-hosted Llama deployments (via Ollama, vLLM, LM Studio, or NVIDIA NIM), Tapas runs as a local reverse proxy on the same machine or in the same Docker network. It intercepts requests to port 11434 (Ollama) or your vLLM endpoint and serves cache hits without touching the GPU.

Integration method: Local Reverse Proxy / Docker Sidecar
Works with Ollama, vLLM, LM Studio, NVIDIA NIM
GPU usage reduction on cache hits
Docker Compose sidecar pattern
Llama 3.1, 3.2, 3.3 and future models
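With the reverse proxy in place, clients simply target the Tapas port instead of Ollama's 11434. A stdlib sketch, assuming a hypothetical Tapas port; the /api/generate payload is standard Ollama:

```python
import json
import urllib.request

# Assumption: Tapas listens on 11435 and forwards cache misses to Ollama
# on the usual 11434 in the same Docker network.
TAPAS_URL = "http://localhost:11435/api/generate"

def build_generate_body(prompt: str, model: str = "llama3.1") -> dict:
    """Ollama /api/generate payload; Tapas forwards it unchanged on a miss."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.1") -> str:
    """Route a generation request through the local Tapas proxy."""
    req = urllib.request.Request(
        TAPAS_URL,
        data=json.dumps(build_generate_body(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # hits return with the GPU idle
        return json.load(resp)["response"]
```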
Any AI (Custom / Generic): Easy integration

One REST endpoint. Works with any LLM or AI service.

Energy savings: Proportional to cache hit rate; every cache hit saves 99% energy

Tapas exposes a single REST endpoint that accepts a query string and returns a structured response. Any AI platform that accepts HTTP calls can route queries through Tapas first. If the cache returns a hit, the AI platform never needs to be called. This works with custom models, fine-tuned LLMs, RAG pipelines, or any proprietary AI service.

Integration method: Universal REST API
Single REST endpoint — no SDK required
Works with any HTTP-capable AI platform
Language-agnostic (Python, JS, Go, Ruby, etc.)
Cache write-back API for custom models
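The generic routing pattern, including the write-back API, can be sketched as follows. Everything here is illustrative: the /query and /cache paths and the response fields ("hit", "answer") are assumptions about the Tapas REST API, not its documented shape:

```python
# Ask Tapas first; fall back to your own AI service on a miss, then write
# the fresh answer back so future duplicates are served from cache.

def route_query(query, tapas_post, call_your_ai):
    """tapas_post(path, body) -> dict; call_your_ai(query) -> answer string."""
    result = tapas_post("/query", {"q": query})
    if result.get("hit"):
        return result["answer"]        # served from cache, AI never called
    answer = call_your_ai(query)       # any HTTP-capable AI service
    tapas_post("/cache", {"q": query, "answer": answer})   # write-back API
    return answer
```

Because `call_your_ai` is just a callable, the same wrapper works for custom models, fine-tuned LLMs, RAG pipelines, or any proprietary service.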

Ready to integrate?

Start with the Tapas chat interface to see LEM in action, then explore the full REST API and SDK documentation.