Tapas — from the Spanish verb tapar, “to cover”
In Spanish culture, tapas are small shared dishes served at a bar — quick, varied, and satisfying. Legend says they were originally placed as lids over drinks to keep flies out. Over time they became a culinary tradition: bite-size flavors that give you exactly what you need, without the full meal.
Tapas AI works the same way. Instead of spinning up a full AI inference for every question, we serve bite-size answers drawn from a shared semantic cache — covering your query instantly, before a single watt-hour of GPU energy is spent on redundant computation.
“Pequeñas porciones de inteligencia artificial, servidas al instante.” — Small portions of artificial intelligence, served instantly.
We started from a simple observation: the vast majority of AI queries are near-duplicates of questions that have already been answered. Every time a model re-computes the same answer, it wastes energy. We built Tapas to stop that waste.
Every cache hit saves ~3 Wh of electricity and ~0.7 g of CO₂. Across billions of cached queries, that adds up to tonnes of carbon avoided per day.
Cached answers return in under 50ms — 10–100× faster than a live inference call to any major model provider.
Drop Tapas in front of ChatGPT, Claude, Gemini, Llama, or any custom model. No lock-in, no code changes.
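Because the endpoint is OpenAI-compatible, "no code changes" means the request body keeps the standard chat-completions shape and only the URL you send it to changes. A minimal sketch (the Tapas URL below is a hypothetical placeholder, not a documented endpoint):

```python
import json

# Hypothetical Tapas deployment URL -- substitute your own.
# An OpenAI-compatible endpoint accepts the same request body
# your existing integration already produces; only the URL changes.
TAPAS_URL = "https://tapas.example/v1/chat/completions"

payload = {
    "model": "gpt-4",  # forwarded to the configured provider on a cache miss
    "messages": [
        {"role": "user", "content": "What is the boiling point of water?"}
    ],
}
body = json.dumps(payload)  # POST this to TAPAS_URL instead of the provider's URL
```

The same swap works from any of the SDK languages, since the wire format is unchanged.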
A single GPT-4-class inference consumes roughly 3 Wh of electricity. A Tapas cache lookup uses about 0.001 Wh — three thousand times less. At 1 million queries per day with an 85% cache hit rate, that is over 2.5 MWh saved daily.
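The savings claim is easy to verify from the figures stated above:

```python
# Figures as stated in the text.
WH_PER_INFERENCE = 3.0        # one GPT-4-class inference
WH_PER_CACHE_LOOKUP = 0.001   # one Tapas cache lookup (~3,000x less)
QUERIES_PER_DAY = 1_000_000
HIT_RATE = 0.85

# Each cache hit replaces a full inference with a lookup.
wh_saved = QUERIES_PER_DAY * HIT_RATE * (WH_PER_INFERENCE - WH_PER_CACHE_LOOKUP)
mwh_saved = wh_saved / 1_000_000

print(f"{mwh_saved:.2f} MWh saved per day")  # ≈ 2.55 MWh, i.e. "over 2.5 MWh"
```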
Live platform stats
Query arrives
A user or application sends a natural-language query to the Tapas API endpoint.
Semantic embedding
The query is converted to a high-dimensional vector using a lightweight embedding model (no GPU required).
Cosine similarity search
The Smart Router compares the query vector against all cached response vectors. If similarity is 0.72 or above, it is a cache hit.
Cache hit — instant answer
The cached answer is returned in under 50ms, consuming ~0.001 Wh. Energy saved.
Cache miss — LLM inference
If no match is found, the query is forwarded to the configured LLM. The answer is stored back into the cache for future hits.
Energy metrics recorded
Every query logs energy used, energy saved, cache hit/miss, category, and routing confidence for the Energy Dashboard.
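The steps above can be sketched as a minimal router. The 0.72 threshold comes from the routing rule described here; `embed` and `call_llm` are stand-ins for the lightweight embedding model and the configured LLM provider, not actual Tapas APIs:

```python
import math

SIMILARITY_THRESHOLD = 0.72  # cache-hit cutoff from the routing rule above

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, embed, call_llm):
        self.embed = embed        # step 2: query -> vector (no GPU required)
        self.call_llm = call_llm  # step 5: fallback inference on a miss
        self.entries = []         # cached (vector, answer) pairs

    def query(self, text):
        vec = self.embed(text)
        # Step 3: compare against all cached response vectors.
        best_score, best_answer = max(
            ((cosine_similarity(vec, v), ans) for v, ans in self.entries),
            default=(0.0, None),
        )
        if best_score >= SIMILARITY_THRESHOLD:
            return best_answer                # step 4: cache hit, instant answer
        answer = self.call_llm(text)          # step 5: cache miss -> LLM
        self.entries.append((vec, answer))    # store for future hits
        return answer
```

A production router would also log the per-query energy metrics from step 6 and use an approximate nearest-neighbour index rather than a linear scan, but the hit/miss decision is exactly this comparison.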
Identified that 60–80% of AI queries are semantic near-duplicates of earlier queries — a massive opportunity for caching.
First semantic cache with cosine similarity routing. Cache hit rate reached 40% on a test corpus of 10k queries.
Launched the full platform: 160 categories, Smart Router, Energy Dashboard, and community Q&A pipeline.
Released TypeScript, Python, Go, Rust SDKs and OpenAI-compatible drop-in endpoint. API key auth & webhooks added.
Serving queries across science, health, finance, programming, history, environment, and general knowledge domains.
Tapas is a cross-functional team united by a single belief: AI should be powerful and sustainable.
Energy Research
Semantic Cache Architecture
Designed the cosine-similarity routing engine that matches queries to cached answers with sub-millisecond latency.
Platform Engineering
SDK & Integration Layer
Built the TypeScript, Python, Go, and Rust SDKs and the OpenAI-compatible drop-in endpoint.
Data Science
Category Taxonomy & Embeddings
Curated 160+ knowledge categories across 7 domains and trained the embedding model for semantic matching.
Community
Q&A Contributions & Review
Manages the community submission pipeline, quality review workflow, and growing the shared knowledge base.
Every energy metric is calculated from first principles and shown in real time. No greenwashing, no estimates hidden behind dashboards.
The knowledge base grows through community contributions. Every Q&A submission is reviewed and credited.
Tapas works with any AI platform. No vendor lock-in. Drop it in front of any model in minutes.