AI Energy Efficiency Layer — Works with any AI platform

Tapas — Spanish for small, shared bites. Here: bite-size flavors of AI energy and data, served fast.

AI that thinks before it computes

Tapas reduces AI energy consumption by up to 99% per query through intelligent question categorization and semantic caching. Use it standalone or plug it into any existing AI platform.

16
Queries Served
43.8%
Cache Hit Rate
20.99
Wh Saved
8.1
CO₂ Saved (g)

Works as an add-on for any AI platform


How Tapas works

Three layers of intelligence that eliminate unnecessary AI compute

01

Semantic Categorization

Every query is embedded and matched against 160+ knowledge categories using cosine similarity. No full LLM call needed.

See categories
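In code, this categorization step is essentially a nearest-centroid lookup. A minimal sketch, with toy two-dimensional vectors standing in for real embeddings and category names chosen purely for illustration:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Match a query embedding against category embeddings; return the
// highest-similarity category and its score.
function categorize(queryVec, categories) {
  let best = { name: null, score: -1 };
  for (const [name, vec] of Object.entries(categories)) {
    const score = cosine(queryVec, vec);
    if (score > best.score) best = { name, score };
  }
  return best;
}
```

In production this runs against 160+ category embeddings, but the comparison itself stays this cheap: no LLM forward pass, just dot products.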
02

Intelligent Cache Lookup

If a semantically similar question exists in the cache, the pre-validated answer is returned instantly — bypassing inference entirely.

View cache dashboard
03

Low Energy Mode

Users opt into LEM to receive concise bullet-point answers from cache. Each session shows real-time energy savings in watt-hours.

Try LEM now

160+ Knowledge Categories

Organized across 7 domains. Every question finds its home.


The energy math is undeniable

Energy per full LLM query
~3.0 Wh
10× a Google search
Energy per Tapas cache hit
~0.001 Wh
99.97% less
At 85% cache hit rate
85% reduction
vs. full inference baseline
CO₂ saved (1M queries/day)
~360 tonnes/year
≈ 78 cars off the road
99%
energy reduction per cached query
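These percentages follow directly from the two per-query figures above:

```javascript
// Reproduce the headline percentages from the per-query energy figures.
const FULL_QUERY_WH = 3.0;    // full LLM inference
const CACHE_HIT_WH = 0.001;   // Tapas cache hit

// Savings for a single cached query: (3.0 - 0.001) / 3.0
const perHitReduction = (FULL_QUERY_WH - CACHE_HIT_WH) / FULL_QUERY_WH;
console.log((perHitReduction * 100).toFixed(2) + '%'); // "99.97%"

// Blended savings at an 85% cache hit rate
const hitRate = 0.85;
const blendedReduction = hitRate * perHitReduction;
console.log(Math.round(blendedReduction * 100) + '%'); // "85%"
```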
How Tapas plugs in

Built for developers, designed for everyone

One endpoint, zero infrastructure, immediate energy savings.

REST API

Any language · Any platform

Call the Tapas API from any language or platform. Returns cached or AI-generated answers with full energy metrics in every response.

const res = await fetch(
  'https://tapas.one/api/trpc/query.ask',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      json: {
        query: 'How does quantum computing work?',
        lemMode: true
      }
    })
  }
);
const { result } = await res.json();
console.log(result.data.answer);

Zero Config

No infra to manage

No vector database to manage. No embedding model to deploy. Tapas handles everything server-side — you just send queries.

01
Your app sends a query
POST /api/trpc/query.ask
02
Tapas classifies & checks cache
Cosine similarity · 160+ categories
03
Cache hit → instant bullet answer
0.001 Wh · <50ms · no GPU
04
Cache miss → Claude inference
3.0 Wh · answer stored for next time
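From the client's side, the whole flow above collapses into one call. A small wrapper with error handling, building on the fetch example earlier; note that beyond `answer`, the exact names of the energy-metric fields in the response are not shown here, so treat the returned object's shape as something to confirm against the API reference:

```javascript
// Minimal client wrapper for the Tapas query endpoint. Whether the
// answer came from cache or fresh inference is decided server-side;
// the caller just sends the query.
async function askTapas(query, lemMode = true) {
  const res = await fetch('https://tapas.one/api/trpc/query.ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ json: { query, lemMode } }),
  });
  if (!res.ok) throw new Error(`Tapas request failed: ${res.status}`);
  const { result } = await res.json();
  return result.data; // answer plus per-query energy metrics
}
```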

Instant Savings

Calculate your impact

Every cache hit saves ~3 Wh. Drag the slider to see your organisation's projected annual energy savings.

Queries per day: 1,000,000 (slider range: 1K – 10M)
MWh saved / year: 930.4 MWh
CO₂ avoided / year: ~360 tonnes
Cars off the road: ≈ 78
At 1M queries/day with 85% hit rate: 930.4 MWh/year saved ≈ 360 tonnes CO₂ · 78 cars off the road

Start saving energy today

Use Tapas as your AI assistant or integrate it into your existing platform in minutes.

Product Scope

What Tapas does — and doesn't do

An interactive overview of Tapas's capabilities, architecture, API surface, and roadmap.

Tapas is an AI energy efficiency middleware layer that sits between your application and any LLM backend. It reduces energy consumption by up to 99.97% per query through two core mechanisms:

Semantic Cache (powers LEM)

Stores and retrieves answers for semantically similar questions using cosine similarity. Cache hits return in ~40 ms at 0.001 Wh.

Smart Router

Classifies each query by domain and confidence. Routes to cache if similarity ≥ 0.72, falls through to LLM inference otherwise.

Platform Middleware

Drop-in layer for ChatGPT, Claude, Gemini, Copilot, Llama, and any OpenAI-compatible API. No code changes required.

Energy Analytics

Real-time dashboard tracking Wh saved, CO₂ avoided, cache hit rate, and per-query energy cost across all integrations.
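The Smart Router's decision rule can be sketched in a few lines. The helpers here (`embed`, `cacheLookup`, `llmInfer`, `cache.store`) are hypothetical stand-ins for Tapas internals; only the 0.72 threshold and the per-path energy costs come from the figures above:

```javascript
// Route a query: serve from cache when a stored answer is similar
// enough, otherwise fall through to full LLM inference and cache
// the fresh answer for next time.
const SIMILARITY_THRESHOLD = 0.72;

async function route(query, { embed, cacheLookup, llmInfer, cache }) {
  const vec = await embed(query);      // classify the query
  const hit = await cacheLookup(vec);  // nearest cached Q&A, if any
  if (hit && hit.similarity >= SIMILARITY_THRESHOLD) {
    return { answer: hit.answer, energyWh: 0.001 };  // cache hit
  }
  const answer = await llmInfer(query);              // full inference
  await cache.store(vec, answer);                    // store for next time
  return { answer, energyWh: 3.0 };
}
```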

Ready to integrate?

Follow the step-by-step guide and make your first energy-efficient query in under 5 minutes.