Tapas is a universal energy-efficiency layer. Use it standalone or as a drop-in add-on for ChatGPT, Claude, Gemini, Llama, or any custom AI stack. One API call is all it takes to start saving energy.
Call the Tapas API from any language or platform. It returns cached or AI-generated answers along with energy metrics.
No vector database to manage. No embedding model to deploy. Tapas handles everything server-side.
Each cache hit saves ~3 Wh. At 1M queries/day with an 85% hit rate, that's about 2.55 MWh/day, or roughly 930 MWh/year saved.
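The arithmetic above can be sketched as a quick estimator. This is illustrative only, not an SDK utility; the ~3 Wh-per-hit figure is the one quoted in the text.

```typescript
// Back-of-envelope savings estimator (illustrative, not part of the Tapas SDK).
const WH_SAVED_PER_HIT = 3; // ~3 Wh saved per cache hit vs. full inference

function annualMwhSaved(queriesPerDay: number, hitRate: number): number {
  const whPerDay = queriesPerDay * hitRate * WH_SAVED_PER_HIT;
  return (whPerDay * 365) / 1_000_000; // Wh/day -> MWh/year
}

console.log(annualMwhSaved(1_000_000, 0.85).toFixed(0)); // prints "931"
```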
// JavaScript / TypeScript SDK
import { TapasClient } from '@tapas-ai/sdk';

const tapas = new TapasClient({
  apiKey: 'your-api-key',
  baseUrl: 'https://tapas.one',
});

// Ask a question with Low Energy Mode
const response = await tapas.ask({
  query: 'How does quantum computing work?',
  lemMode: true, // Enable Low Energy Mode
});

if (response.mode === 'cache') {
  // Served from cache — ~0.001 Wh used
  console.log('Bullets:', response.bullets);
  console.log('Energy saved:', response.energyWhSaved, 'Wh');
} else {
  // Full inference — ~3.0 Wh used
  console.log('Full answer:', response.answer);
}

Response fields:
- mode ("cache" | "inference"): whether the answer came from cache or full LLM inference
- lemMode (boolean): whether Low Energy Mode was active for this query
- bullets (string[]): array of 5-7 concise bullet points (always populated)
- answer (string | null): full prose answer (null in LEM cache-hit mode)
- category (string): matched knowledge category name
- categoryDomain (string): top-level domain (e.g. science_technology)
- energyWhUsed (number): estimated watt-hours consumed by this query
- energyWhSaved (number): estimated watt-hours saved vs. the full inference baseline
- similarityScore (number | null): cosine similarity score of the cache match (0–1)
- responseTimeMs (number): total server-side processing time in milliseconds

Wrap any OpenAI API call with Tapas to add semantic caching. Compatible with GPT-4, GPT-3.5, and custom fine-tuned models.
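A minimal sketch of that wrapping pattern: check the Tapas cache first, fall back to a full OpenAI call on a miss. Here `tapasAsk` and `callOpenAI` are placeholders for your own client code, not SDK functions, and the response shape is trimmed to the fields the sketch uses.

```typescript
// Trimmed-down response shape, matching the fields documented above.
interface TapasResponse {
  mode: 'cache' | 'inference';
  bullets: string[];
  answer: string | null;
  energyWhSaved: number;
}

// Sketch: serve from cache when Tapas has a hit, otherwise run full inference.
// `tapasAsk` and `callOpenAI` are injected placeholders for your own clients.
async function askWithCache(
  query: string,
  tapasAsk: (q: string) => Promise<TapasResponse>,
  callOpenAI: (q: string) => Promise<string>,
): Promise<string> {
  const cached = await tapasAsk(query);
  if (cached.mode === 'cache') {
    // Cache hit: ~0.001 Wh instead of ~3 Wh of GPU inference
    return cached.answer ?? cached.bullets.join('\n');
  }
  return callOpenAI(query); // Cache miss: full inference as before
}
```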
Tapas uses Claude as its primary inference backbone. Integrate Tapas as a pre-processing layer before your Claude API calls.
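One way to sketch that pre-processing decision, using the `mode` and `similarityScore` fields from the response: forward to Claude only when there is no cache hit or the match is too weak. The 0.92 threshold is an illustrative choice, not an SDK default.

```typescript
// Sketch: decide whether a query still needs a Claude inference call.
// Threshold is illustrative; tune it against your own quality requirements.
function shouldForwardToClaude(
  mode: 'cache' | 'inference',
  similarityScore: number | null,
  threshold = 0.92,
): boolean {
  if (mode !== 'cache') return true;         // No cached answer: run inference
  if (similarityScore === null) return true; // No score reported: play it safe
  return similarityScore < threshold;        // Weak match: prefer a fresh answer
}
```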
Add Tapas as a middleware layer in your Gemini-powered application. Reduce API costs by up to 85% on repeated queries.
Integrate Tapas into your Copilot Studio workflows to cache common enterprise queries and reduce compute overhead.
Self-hosted Llama deployments benefit most from Tapas caching — eliminate GPU inference for repeated queries entirely.
Tapas exposes a simple REST API. Any AI platform that accepts HTTP requests can use Tapas as an energy-efficient caching layer.
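For platforms without an SDK, the raw HTTP call might look like the following. The `/ask` path and JSON payload shape are assumptions inferred from the SDK example above, so check the API reference for the exact contract.

```typescript
// Sketch of a raw REST call to Tapas. The endpoint path and payload shape
// are assumptions inferred from the SDK example, not a documented contract.
async function askTapas(apiKey: string, query: string): Promise<unknown> {
  const res = await fetch('https://tapas.one/ask', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query, lemMode: true }),
  });
  if (!res.ok) throw new Error(`Tapas API error: ${res.status}`);
  return res.json();
}
```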
A single ChatGPT query consumes ~3 Wh — 10× more than a Google search. With 1 billion AI queries served daily across all platforms, that's 3,000 MWh per day. Tapas at 85% cache hit rate reduces that to 450 MWh/day — saving enough electricity to power 112,000 homes every single day.