Integration Guide

Add Tapas to any AI platform

Tapas is a universal energy-efficiency layer. Use it standalone or as a drop-in add-on for ChatGPT, Claude, Gemini, Llama, or any custom AI stack. One API call is all it takes to start saving energy.

REST API

Call the Tapas API from any language or platform. Returns cached or AI-generated answers with energy metrics.
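If you'd rather skip the SDK, a raw HTTP call can be sketched as below. The endpoint path (`/api/ask`) and Bearer-token auth header are illustrative assumptions, not confirmed API details; check the official API reference for the real values.

```javascript
// Sketch of a raw REST call to Tapas.
// ASSUMPTIONS: the /api/ask path and Authorization header shape are
// hypothetical placeholders, not documented endpoints.
function buildAskRequest(apiKey, query, lemMode = true) {
  return {
    url: 'https://tapas.one/api/ask',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ query, lemMode }),
    },
  };
}

// Usage in any runtime with fetch (e.g. Node 18+):
// const { url, options } = buildAskRequest('your-api-key', 'What is a qubit?');
// const data = await fetch(url, options).then((r) => r.json());
```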

Zero Config

No vector database to manage. No embedding model to deploy. Tapas handles everything server-side.

Instant Savings

Each cache hit saves ~3 Wh. At 1M queries/day with an 85% hit rate, that's roughly 930 MWh/year saved.
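The arithmetic behind those figures can be checked directly, assuming the ~3 Wh per full inference and ~0.001 Wh per cache hit stated in this guide:

```javascript
// Sanity-check the savings arithmetic at 1M queries/day, 85% hit rate.
const queriesPerDay = 1_000_000;
const hitRate = 0.85;
const whPerInference = 3;      // ~3 Wh per full LLM inference
const whPerCacheHit = 0.001;   // ~0.001 Wh per cache hit

// Energy avoided by the 85% of queries served from cache:
const savedWhPerDay = queriesPerDay * hitRate * (whPerInference - whPerCacheHit);
const savedMWhPerYear = (savedWhPerDay * 365) / 1e6;

// Energy still spent on the 15% that miss the cache:
const residualMWhPerYear =
  (queriesPerDay * (1 - hitRate) * whPerInference * 365) / 1e6;

console.log(savedMWhPerYear.toFixed(0));    // ≈ 930 MWh/year saved
console.log(residualMWhPerYear.toFixed(0)); // ≈ 164 MWh/year still spent on inference
```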

Quick Start

```javascript
// JavaScript / TypeScript SDK
import { TapasClient } from '@tapas-ai/sdk';

const tapas = new TapasClient({
  apiKey: 'your-api-key',
  baseUrl: 'https://tapas.one',
});

// Ask a question with Low Energy Mode
const response = await tapas.ask({
  query: 'How does quantum computing work?',
  lemMode: true, // Enable Low Energy Mode
});

if (response.mode === 'cache') {
  // Served from cache — ~0.001 Wh used
  console.log('Bullets:', response.bullets);
  console.log('Energy saved:', response.energyWhSaved, 'Wh');
} else {
  // Full inference — ~3.0 Wh used
  console.log('Full answer:', response.answer);
}
```

API Response Reference

mode ("cache" | "inference"): whether the answer came from cache or full LLM inference
lemMode (boolean): whether Low Energy Mode was active for this query
bullets (string[]): array of 5–7 concise bullet points (always populated)
answer (string | null): full prose answer (null in LEM cache-hit mode)
category (string): matched knowledge category name
categoryDomain (string): top-level domain (e.g. science_technology)
energyWhUsed (number): estimated watt-hours consumed by this query
energyWhSaved (number): estimated watt-hours saved vs. the full-inference baseline
similarityScore (number | null): cosine similarity score of the cache match (0–1)
responseTimeMs (number): total server-side processing time in milliseconds
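The field names above come straight from the reference; a small helper that branches on mode, mirroring the quick-start example, might look like this (the helper itself is illustrative, not part of the SDK):

```javascript
// Illustrative consumer of the documented response shape.
// On a cache hit, answer is null and bullets are always populated,
// so we render the bullets; otherwise we return the full prose answer.
function renderAnswer(response) {
  if (response.mode === 'cache') {
    return response.bullets.map((b) => `• ${b}`).join('\n');
  }
  return response.answer;
}
```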

Platform Compatibility

ChatGPT / OpenAI

Drop-in compatible

Wrap any OpenAI API call with Tapas to add semantic caching. Compatible with GPT-4, GPT-3.5, and custom fine-tuned models.
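One way to do that wrapping is a check-cache-first helper like the sketch below. Here `askTapas` and `callLLM` are placeholders for your real Tapas and OpenAI client calls, not documented APIs:

```javascript
// Illustrative wrapper: try Tapas first, fall back to the wrapped LLM call.
// `askTapas` and `callLLM` are stand-ins for your real client functions.
async function withTapasCache(askTapas, callLLM, query) {
  const cached = await askTapas({ query, lemMode: true });
  if (cached.mode === 'cache') {
    // Cache hit: no GPU inference needed on the underlying platform.
    return { text: cached.bullets.join(' '), source: 'tapas-cache' };
  }
  // Cache miss: pay for full inference as before.
  const text = await callLLM(query);
  return { text, source: 'llm' };
}
```

The same pattern applies to Claude, Gemini, Copilot, or a self-hosted Llama: Tapas sits in front, and the wrapped call only runs on a miss.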

Anthropic Claude

Native integration

Tapas uses Claude as its primary inference backbone. Integrate Tapas as a pre-processing layer before your Claude API calls.

Google Gemini

Middleware layer

Add Tapas as a middleware layer in your Gemini-powered application. Reduce API costs by up to 85% on repeated queries.

Microsoft Copilot

Enterprise ready

Integrate Tapas into your Copilot Studio workflows to cache common enterprise queries and reduce compute overhead.

Meta Llama

Self-hosted

Self-hosted Llama deployments benefit most from Tapas caching — eliminate GPU inference for repeated queries entirely.

Custom AI Stack

Universal REST API

Tapas exposes a simple REST API. Any AI platform that accepts HTTP requests can use Tapas as an energy-efficient caching layer.

Why this matters at scale

A single ChatGPT query consumes ~3 Wh — 10× more than a Google search. With 1 billion AI queries served daily across all platforms, that's 3,000 MWh per day. Tapas at 85% cache hit rate reduces that to 450 MWh/day — saving enough electricity to power 112,000 homes every single day.
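Those headline numbers can be reproduced from the stated inputs. Note the homes figure implies roughly 22.8 kWh per home per day, which is an assumption baked into this sketch rather than a sourced constant:

```javascript
// Reproduce the scale arithmetic from the paragraph above.
const dailyQueries = 1e9;    // 1 billion AI queries/day
const whPerQuery = 3;        // ~3 Wh per full inference
const hitRate = 0.85;
const whPerCacheHit = 0.001;

const baselineMWhPerDay = (dailyQueries * whPerQuery) / 1e6;          // 3,000 MWh
const withTapasMWhPerDay =
  (dailyQueries * (hitRate * whPerCacheHit + (1 - hitRate) * whPerQuery)) / 1e6; // ≈ 451 MWh

const savedMWhPerDay = baselineMWhPerDay - withTapasMWhPerDay;        // ≈ 2,549 MWh

// ASSUMPTION: ~22.8 kWh per home per day, implied by the 112,000-homes claim.
const kWhPerHomePerDay = 22.8;
const homesPowered = Math.round((savedMWhPerDay * 1000) / kWhPerHomePerDay);
```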