Our Story

Why we built Tapas

Tapas — from the Spanish verb tapar, “to cover”

In Spanish culture, tapas are small shared dishes served at a bar — quick, varied, and satisfying. Legend says they were originally placed as lids over drinks to keep flies out. Over time they became a culinary tradition: bite-size flavors that give you exactly what you need, without the full meal.

Tapas AI works the same way. Instead of spinning up a full AI inference for every question, we serve bite-size answers drawn from a shared semantic cache — covering your query instantly, before a single watt-hour of GPU energy is spent on redundant computation.

“Pequeñas porciones de inteligencia artificial, servidas al instante.” — Small portions of artificial intelligence, served instantly.

We started from a simple observation: the vast majority of AI queries are near-duplicates of questions that have already been answered. Every time a model re-computes the same answer, it wastes energy. We built Tapas to stop that waste.

Reduce AI's carbon footprint

Every cache hit saves ~3 Wh of electricity and ~0.7 g of CO₂. At scale, that's tonnes of carbon avoided per day.

Make AI faster for everyone

Cached answers return in under 50ms — 10–100× faster than a live inference call to any major model provider.

Work with any AI platform

Drop Tapas in front of ChatGPT, Claude, Gemini, Llama, or any custom model. No lock-in, no code changes.
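Because the endpoint speaks the standard OpenAI chat-completions format, "no code changes" in practice means repointing a base URL. The sketch below shows the request shape an OpenAI-compatible proxy accepts; the `tapas.example` URL is a hypothetical placeholder, not a real endpoint.

```python
import json

# Hypothetical placeholder URL for an OpenAI-compatible caching proxy.
TAPAS_ENDPOINT = "https://tapas.example/v1/chat/completions"

# Standard OpenAI-style chat-completions payload. Because a drop-in
# proxy accepts the same shape, existing client code only needs its
# base URL changed — the request body stays identical.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What causes tides?"}],
}

body = json.dumps(payload)
```

Any HTTP client or official OpenAI SDK that lets you override the base URL can then POST this body unchanged.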

The Energy Math

Why caching changes everything

A single GPT-4-class inference consumes roughly 3 Wh of electricity. A Tapas cache lookup uses about 0.001 Wh — three thousand times less. At 1 million queries per day with an 85% cache hit rate, that is over 2.5 MWh saved daily.
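As a sanity check, the arithmetic above can be reproduced directly. All figures are the estimates quoted in this page (3 Wh per inference, 0.001 Wh per lookup, 85% hit rate, 0.233 kg CO₂ per kWh), not measurements:

```python
# Back-of-the-envelope check of the savings claim, using the
# approximate figures quoted in the text.
WH_PER_INFERENCE = 3.0      # GPT-4-class inference, approximate
WH_PER_CACHE_HIT = 0.001    # Tapas cache lookup, approximate
QUERIES_PER_DAY = 1_000_000
HIT_RATE = 0.85
CO2_G_PER_KWH = 233         # global grid average (~0.233 kg/kWh)

hits = QUERIES_PER_DAY * HIT_RATE
wh_saved = hits * (WH_PER_INFERENCE - WH_PER_CACHE_HIT)
mwh_saved = wh_saved / 1e6
co2_kg = (wh_saved / 1000) * CO2_G_PER_KWH / 1000

print(f"{mwh_saved:.2f} MWh saved/day")    # ≈ 2.55 MWh
print(f"{co2_kg:.0f} kg CO2 avoided/day")  # ≈ 594 kg
```

The 850,000 daily cache hits each avoid nearly the full 3 Wh, which is where the "over 2.5 MWh daily" figure comes from.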

~3 Wh per query: full AI inference (GPT-4 class)
~0.001 Wh per query: Tapas cache hit (3,000× less energy)
99%: energy reduction on cache-hit queries
0.233 kg CO₂ per kWh: global grid average

Live platform stats

Queries served: 16
Cache hit rate: 43.8%
Energy saved: 20.99 Wh
CO₂ saved: 8.1 g

Architecture

How Tapas works under the hood

1. Query arrives. A user or application sends a natural-language query to the Tapas API endpoint.

2. Semantic embedding. The query is converted to a high-dimensional vector using a lightweight embedding model (no GPU required).

3. Cosine similarity search. The Smart Router compares the query vector against all cached response vectors; a similarity of 0.72 or above counts as a cache hit.

4. Cache hit, instant answer. The cached answer is returned in under 50 ms, consuming ~0.001 Wh. Energy saved.

5. Cache miss, LLM inference. If no match is found, the query is forwarded to the configured LLM, and the answer is stored back into the cache for future hits.

6. Energy metrics recorded. Every query logs energy used, energy saved, cache hit/miss, category, and routing confidence for the Energy Dashboard.
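The routing loop at the heart of steps 2–5 can be sketched in a few lines. This is illustrative only: the toy vectors stand in for a real embedding model's output, a production router would use an approximate nearest-neighbor index rather than a linear scan, and only the 0.72 threshold comes from the text above.

```python
import math

SIMILARITY_THRESHOLD = 0.72  # cache-hit cutoff stated above

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, cache):
    """Return (answer, hit) for the best cached match above threshold.

    `cache` maps cached answers to their stored query vectors. A real
    system would use an ANN index instead of scanning every entry.
    """
    best_answer, best_score = None, -1.0
    for answer, vec in cache.items():
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_answer, best_score = answer, score
    if best_score >= SIMILARITY_THRESHOLD:
        return best_answer, True   # cache hit: serve instantly
    return None, False             # cache miss: forward to the LLM

# Toy example with hypothetical 3-dimensional "embeddings":
cache = {"Tides are caused by the Moon's gravity.": [0.9, 0.1, 0.2]}
answer, hit = route([0.88, 0.12, 0.21], cache)  # near-duplicate query
```

On a miss, the router would call the configured LLM and insert the new (vector, answer) pair into `cache`, which is how the hit rate climbs over time.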

Timeline

How we got here

2023: Concept

Identified that 60–80% of AI queries are semantically near-duplicate — a massive opportunity for caching.

2024 Q1: Prototype

First semantic cache with cosine similarity routing. Cache hit rate reached 40% on a test corpus of 10k queries.

2024 Q3: Platform

Launched the full platform: 160 categories, Smart Router, Energy Dashboard, and community Q&A pipeline.

2025: SDK Layer

Released TypeScript, Python, Go, and Rust SDKs and an OpenAI-compatible drop-in endpoint. Added API key authentication and webhooks.

Now: Scale

Serving queries across science, health, finance, programming, history, environment, and general knowledge domains.

Team

Built by people who care about energy

Tapas is a cross-functional team united by a single belief: AI should be powerful and sustainable.

Energy Research

Semantic Cache Architecture

Designed the cosine-similarity routing engine that matches queries to cached answers with sub-millisecond latency.

Platform Engineering

SDK & Integration Layer

Built the TypeScript, Python, Go, and Rust SDKs and the OpenAI-compatible drop-in endpoint.

Data Science

Category Taxonomy & Embeddings

Curated 160+ knowledge categories across 7 domains and trained the embedding model for semantic matching.

Community

Q&A Contributions & Review

Manages the community submission pipeline and quality-review workflow, and grows the shared knowledge base.

Values

What we believe in

Transparency

Every energy metric is calculated from first principles and shown in real time. No greenwashing, no estimates hidden behind dashboards.

Community-first

The knowledge base grows through community contributions. Every Q&A submission is reviewed and credited.

Open integration

Tapas works with any AI platform. No vendor lock-in. Drop it in front of any model in minutes.

Ready to serve your first tapa?

Start with the chat interface, plug Tapas into your existing AI stack, or browse the 160+ knowledge categories.