Tapas — Spanish for “covers” & small bites. Here, bite-size flavors of AI energy and data, served fast.
Tapas reduces AI energy consumption by up to 99% per query through intelligent question categorization and semantic caching. Use it standalone or plug it into any existing AI platform.
Three layers of intelligence that eliminate unnecessary AI compute
Every query is embedded and matched against 160+ knowledge categories using cosine similarity. No full LLM call needed.
If a semantically similar question exists in the cache, the pre-validated answer is returned instantly — bypassing inference entirely.
Users opt into LEM to receive concise bullet-point answers from cache. Each session shows real-time energy savings in watt-hours.
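The category matching in the first layer can be sketched in a few lines. This is an illustrative toy, not Tapas internals: tiny 3-dimensional vectors stand in for real embedding vectors, and the category names and function names are hypothetical.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical category "embeddings" (real ones would be high-dimensional
// and cover all 160+ categories).
const categories = {
  'Science/Physics':   [0.9, 0.1, 0.0],
  'Health/Nutrition':  [0.1, 0.9, 0.2],
  'Finance/Investing': [0.0, 0.2, 0.9],
};

// Pick the category whose vector is most similar to the query embedding.
function bestCategory(queryEmbedding) {
  let best = null, bestScore = -Infinity;
  for (const [name, vec] of Object.entries(categories)) {
    const score = cosine(queryEmbedding, vec);
    if (score > bestScore) { best = name; bestScore = score; }
  }
  return { category: best, score: bestScore };
}
```

The principle is the same at scale: embed the query once, then a handful of dot products decides where it belongs, with no full LLM call.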
A complete energy-efficiency layer for AI — standalone or embedded
One toggle. Instant bullet-point answers from cache. Up to 99% less energy per query.
Cosine similarity matching across 160+ categories. No keyword hacks — real semantic understanding.
Live charts showing Wh saved, cache hit rates, CO₂ equivalent, and 30-day trends.
160+ pre-built categories across Science, Health, Finance, History, Programming, Environment, and General Knowledge.
REST API + JavaScript/Python SDKs. Plug Tapas into ChatGPT, Claude, Gemini, or any custom AI stack.
Role-based admin panel to validate, edit, and manage cached responses. Trigger cache warming on demand.
Pre-populate cache with answers to trending queries during off-peak hours — zero user-facing latency impact.
Works with Google Cloud, NVIDIA, AWS, Azure, or on-premise. No vendor lock-in.
Every session shows real carbon savings. At 85% cache hit rate, Tapas saves ~360 tonnes CO₂/year per 1M daily queries.
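The headline figure holds up to back-of-envelope arithmetic. The constants below are assumptions for the sketch (roughly 3 Wh saved per cache hit, a grid-average carbon intensity of about 0.4 kg CO₂/kWh), not published Tapas numbers:

```javascript
// Rough sanity check of the ~360 t CO2/year claim.
const dailyQueries = 1_000_000;
const hitRate = 0.85;        // stated cache hit rate
const whSavedPerHit = 3;     // assumption: ~3 Wh saved per cache hit
const kgCo2PerKwh = 0.4;     // assumption: grid-average carbon intensity

const kwhPerYear = dailyQueries * hitRate * whSavedPerHit * 365 / 1000;
const tonnesCo2PerYear = kwhPerYear * kgCo2PerKwh / 1000;
// ≈ 372 tonnes/year, consistent with the ~360 t figure above
```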
Organized across 7 domains. Every question finds its home.
Click a domain to explore its questions in the chat · Hover for sample questions
Tapas is not a replacement for your AI — it is an energy-efficiency layer that sits in front of it. One endpoint. Zero refactoring. Immediate savings.
Drop-in proxy for the OpenAI Chat Completions API. Replace your base URL — all existing code works unchanged.
Native integration — Tapas is built on Anthropic Claude. Inject the Tapas system prompt prefix to enable semantic caching.
Deploy as a Google Cloud Run sidecar alongside Vertex AI. NVIDIA NIM on GCP is US-only.
Azure API Management policy injection. Works with Copilot Studio connectors and Azure OpenAI Service.
Docker sidecar in front of Ollama or vLLM. Cache hits skip the GPU entirely — direct energy savings.
One REST endpoint. Works with any HTTP-capable AI platform — custom models, RAG pipelines, fine-tuned LLMs, or proprietary services.
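For the OpenAI-compatible proxy pattern, integration can be as small as a base-URL swap. A sketch under stated assumptions: the proxy address below is a placeholder (the real URL depends on your deployment), and the model name is illustrative.

```javascript
// Before: requests went straight to the OpenAI API base URL.
// After: point the same client at the Tapas proxy instead.
// 'tapas.example' is a placeholder, not a documented endpoint.
const BASE_URL = 'https://tapas.example/v1';

// The request body keeps the OpenAI Chat Completions shape,
// so existing client code does not change.
const request = {
  model: 'gpt-4o-mini', // illustrative model name, passed through unchanged
  messages: [
    { role: 'user', content: 'How does quantum computing work?' },
  ],
};
```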
One endpoint, zero infrastructure, immediate energy savings.
Any language · Any platform
Call the Tapas API from any language or platform. Returns cached or AI-generated answers with full energy metrics in every response.
const res = await fetch(
  'https://tapas.one/api/trpc/query.ask',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      json: {
        query: 'How does quantum computing work?',
        lemMode: true
      }
    })
  }
);
const { result } = await res.json();
console.log(result.data.answer);

No infra to manage
No vector database to manage. No embedding model to deploy. Tapas handles everything server-side — you just send queries.
Calculate your impact
First cache hit saves ~3 Wh. Drag the slider to see your organisation's projected annual energy savings.
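The slider's math reduces to a one-liner. A sketch, assuming the ~3 Wh figure applies to every cache hit and an 85% hit rate; both are assumptions, and the real calculator may weight them differently:

```javascript
// Projected annual energy savings for an organisation.
// cacheHitRate and whPerHit are assumed defaults, not Tapas constants.
function annualSavingsWh(queriesPerDay, cacheHitRate = 0.85, whPerHit = 3) {
  return queriesPerDay * cacheHitRate * whPerHit * 365;
}

const wh = annualSavingsWh(10_000); // 10k queries/day
const kwh = wh / 1000;              // ≈ 9,308 kWh per year
```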
An interactive overview of Tapas's capabilities, architecture, API surface, and roadmap. Click any section to expand.
Tapas is an AI energy-efficiency middleware layer that sits between your application and any LLM backend. It reduces energy consumption by up to 99.97% per query through four core components:
Semantic Cache (LEM)
Stores and retrieves answers for semantically similar questions using cosine similarity. Cache hits return in ~40 ms at 0.001 Wh.
Smart Router
Classifies each query by domain and confidence. Routes to cache if similarity ≥ 0.72, falls through to LLM inference otherwise.
Platform Middleware
Drop-in layer for ChatGPT, Claude, Gemini, Copilot, Llama, and any OpenAI-compatible API. No code changes required.
Energy Analytics
Real-time dashboard tracking Wh saved, CO₂ avoided, cache hit rate, and per-query energy cost across all integrations.
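The Smart Router's decision rule is simple enough to sketch. Everything here is illustrative: `similarity` and the cache shape are stand-ins, and only the 0.72 threshold and the ~0.001 Wh cache-hit cost come from the overview above.

```javascript
const SIMILARITY_THRESHOLD = 0.72; // from the overview above

// cache: [{ question, answer }]; similarity: (a, b) -> number in [0, 1]
function route(query, cache, similarity) {
  let best = null;
  let bestScore = -Infinity;
  for (const entry of cache) {
    const s = similarity(query, entry.question);
    if (s > bestScore) { best = entry; bestScore = s; }
  }
  if (best && bestScore >= SIMILARITY_THRESHOLD) {
    // Cache hit: skip inference entirely (~40 ms, ~0.001 Wh).
    return { source: 'cache', answer: best.answer, energyWh: 0.001 };
  }
  // Miss: fall through to full LLM inference.
  return { source: 'llm', answer: null, energyWh: null };
}
```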