Tapas — Spanish for “covers” & small bites. Here, bite-size flavors of AI energy and data, served fast.
Tapas reduces AI energy consumption by up to 99% per query through intelligent question categorization and semantic caching. Use it standalone or plug it into any existing AI platform.
Three layers of intelligence that eliminate unnecessary AI compute
Every query is embedded and matched against 160+ knowledge categories using cosine similarity. No full LLM call needed.
If a semantically similar question exists in the cache, the pre-validated answer is returned instantly — bypassing inference entirely.
Users opt into LEM to receive concise bullet-point answers from cache. Each session shows real-time energy savings in watt-hours.
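The category matching in the first layer can be sketched in a few lines. This is an illustrative toy, not Tapas internals: tiny 3-dimensional vectors stand in for real embedding vectors, and the category names and function names are hypothetical.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical category "embeddings" (real ones would be high-dimensional
// and cover all 160+ categories).
const categories = {
  'Science/Physics':   [0.9, 0.1, 0.0],
  'Health/Nutrition':  [0.1, 0.9, 0.2],
  'Finance/Investing': [0.0, 0.2, 0.9],
};

// Pick the category whose vector is most similar to the query embedding.
function bestCategory(queryEmbedding) {
  let best = null, bestScore = -Infinity;
  for (const [name, vec] of Object.entries(categories)) {
    const score = cosine(queryEmbedding, vec);
    if (score > bestScore) { best = name; bestScore = score; }
  }
  return { category: best, score: bestScore };
}
```

The principle is the same at scale: embed the query once, then a handful of dot products decides where it belongs, with no full LLM call.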
A complete energy-efficiency layer for AI — standalone or embedded
One toggle. Instant bullet-point answers from cache. Up to 99% less energy per query.
Cosine similarity matching across 160+ categories. No keyword hacks — real semantic understanding.
Live charts showing Wh saved, cache hit rates, CO₂ equivalent, and 30-day trends.
160+ pre-built categories across Science, Health, Finance, History, Programming, Environment, and General Knowledge.
REST API + JavaScript/Python SDKs. Plug Tapas into ChatGPT, Claude, Gemini, or any custom AI stack.
Role-based admin panel to validate, edit, and manage cached responses. Trigger cache warming on demand.
Pre-populate cache with answers to trending queries during off-peak hours — zero user-facing latency impact.
Works with Google Cloud, NVIDIA, AWS, Azure, or on-premise. No vendor lock-in.
Every session shows real carbon savings. At 85% cache hit rate, Tapas saves ~360 tonnes CO₂/year per 1M daily queries.
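The headline figure holds up to back-of-envelope arithmetic. The constants below are assumptions for the sketch (roughly 3 Wh saved per cache hit, a grid-average carbon intensity of about 0.4 kg CO₂/kWh), not published Tapas numbers:

```javascript
// Rough sanity check of the ~360 t CO2/year claim.
const dailyQueries = 1_000_000;
const hitRate = 0.85;        // stated cache hit rate
const whSavedPerHit = 3;     // assumption: ~3 Wh saved per cache hit
const kgCo2PerKwh = 0.4;     // assumption: grid-average carbon intensity

const kwhPerYear = dailyQueries * hitRate * whSavedPerHit * 365 / 1000;
const tonnesCo2PerYear = kwhPerYear * kgCo2PerKwh / 1000;
// ≈ 372 tonnes/year, consistent with the ~360 t figure above
```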
Organized across 7 domains. Every question finds its home.
Click a domain to explore its questions in the chat · Hover for sample questions
Tapas is not a replacement for your AI — it is an energy-efficiency layer that sits in front of it. One endpoint. Zero refactoring. Immediate savings.
Drop-in proxy for the OpenAI Chat Completions API. Replace your base URL — all existing code works unchanged.
Native integration — Tapas is built on Anthropic Claude. Inject the Tapas system prompt prefix to enable semantic caching.
Deploy as a Google Cloud Run sidecar alongside Vertex AI. NVIDIA NIM on GCP is US-only.
Azure API Management policy injection. Works with Copilot Studio connectors and Azure OpenAI Service.
Docker sidecar in front of Ollama or vLLM. Cache hits skip the GPU entirely — direct energy savings.
One REST endpoint. Works with any HTTP-capable AI platform — custom models, RAG pipelines, fine-tuned LLMs, or proprietary services.
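For the OpenAI-compatible proxy pattern, integration can be as small as a base-URL swap. A sketch under stated assumptions: the proxy address below is a placeholder (the real URL depends on your deployment), and the model name is illustrative.

```javascript
// Before: requests went straight to the OpenAI API base URL.
// After: point the same client at the Tapas proxy instead.
// 'tapas.example' is a placeholder, not a documented endpoint.
const BASE_URL = 'https://tapas.example/v1';

// The request body keeps the OpenAI Chat Completions shape,
// so existing client code does not change.
const request = {
  model: 'gpt-4o-mini', // illustrative model name, passed through unchanged
  messages: [
    { role: 'user', content: 'How does quantum computing work?' },
  ],
};
```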
One endpoint, zero infrastructure, immediate energy savings.
Any language · Any platform
Call the Tapas API from any language or platform. Returns cached or AI-generated answers with full energy metrics in every response.
const res = await fetch(
  'https://tapas.one/api/trpc/query.ask',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      json: {
        query: 'How does quantum computing work?',
        lemMode: true
      }
    })
  }
);
const { result } = await res.json();
console.log(result.data.answer);

No infra to manage
No vector database to manage. No embedding model to deploy. Tapas handles everything server-side — you just send queries.
Calculate your impact
First cache hit saves ~3 Wh. Drag the slider to see your organisation's projected annual energy savings.
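The slider's math reduces to a one-liner. A sketch, assuming the ~3 Wh figure applies to every cache hit and an 85% hit rate; both are assumptions, and the real calculator may weight them differently:

```javascript
// Projected annual energy savings for an organisation.
// cacheHitRate and whPerHit are assumed defaults, not Tapas constants.
function annualSavingsWh(queriesPerDay, cacheHitRate = 0.85, whPerHit = 3) {
  return queriesPerDay * cacheHitRate * whPerHit * 365;
}

const wh = annualSavingsWh(10_000); // 10k queries/day
const kwh = wh / 1000;              // ≈ 9,308 kWh per year
```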
An interactive overview of Tapas's capabilities, architecture, API surface, and roadmap. Click any section to expand.
Tapas is an AI energy-efficiency middleware layer that sits between your application and any LLM backend. It reduces energy consumption by up to 99.97% per query through four core components:
Semantic Cache (LEM)
Stores and retrieves answers for semantically similar questions using cosine similarity. Cache hits return in ~40 ms at 0.001 Wh.
Smart Router
Classifies each query by domain and confidence. Routes to cache if similarity ≥ 0.72, falls through to LLM inference otherwise.
Platform Middleware
Drop-in layer for ChatGPT, Claude, Gemini, Copilot, Llama, and any OpenAI-compatible API. No code changes required.
Energy Analytics
Real-time dashboard tracking Wh saved, CO₂ avoided, cache hit rate, and per-query energy cost across all integrations.
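The Smart Router's decision rule is simple enough to sketch. Everything here is illustrative: `similarity` and the cache shape are stand-ins, and only the 0.72 threshold and the ~0.001 Wh cache-hit cost come from the overview above.

```javascript
const SIMILARITY_THRESHOLD = 0.72; // from the overview above

// cache: [{ question, answer }]; similarity: (a, b) -> number in [0, 1]
function route(query, cache, similarity) {
  let best = null;
  let bestScore = -Infinity;
  for (const entry of cache) {
    const s = similarity(query, entry.question);
    if (s > bestScore) { best = entry; bestScore = s; }
  }
  if (best && bestScore >= SIMILARITY_THRESHOLD) {
    // Cache hit: skip inference entirely (~40 ms, ~0.001 Wh).
    return { source: 'cache', answer: best.answer, energyWh: 0.001 };
  }
  // Miss: fall through to full LLM inference.
  return { source: 'llm', answer: null, energyWh: null };
}
```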