Integration Guide
A complete, step-by-step walkthrough for integrating Tapas into any application. By the end you will have a working integration that routes queries through the semantic cache, handles errors gracefully, and streams responses to your UI — saving up to 99.97% of LLM energy per query.
Overview
Tapas sits between your application and any LLM backend. When a query arrives, Tapas checks its semantic cache first. If a similar question has been answered before (cosine similarity ≥ 0.72), it returns the cached answer in ~40 ms using ~0.001 Wh. Only on a cache miss does it fall through to full LLM inference (~2 s, ~3.0 Wh).
Cache hit latency: ~40 ms
Cache hit energy: 0.001 Wh
Energy savings: up to 99.97%
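The routing decision described above can be sketched in a few lines. The 0.72 threshold and the per-path energy figures come from this overview; the `route` function and its field names are illustrative, not part of the SDK:

```typescript
// Illustrative sketch of the cache-vs-LLM routing decision (not SDK code).
// Threshold and energy figures are taken from the overview above.
const SIMILARITY_THRESHOLD = 0.72;
const CACHE_HIT_WH = 0.001; // semantic cache lookup
const FULL_LLM_WH = 3.0;    // full LLM inference baseline

interface RouteDecision {
  mode: "cache" | "llm";
  energyWhUsed: number;
  energyWhSaved: number; // vs the 3.0 Wh baseline
}

function route(similarityScore: number): RouteDecision {
  const hit = similarityScore >= SIMILARITY_THRESHOLD;
  const energyWhUsed = hit ? CACHE_HIT_WH : FULL_LLM_WH;
  return {
    mode: hit ? "cache" : "llm",
    energyWhUsed,
    energyWhSaved: FULL_LLM_WH - energyWhUsed,
  };
}

console.log(route(0.91)); // cache path: uses 0.001 Wh, saves 2.999 Wh
console.log(route(0.40)); // below threshold: falls through to full inference
```

A real deployment tunes the threshold against answer quality; lowering it raises the hit rate but risks returning answers to merely similar, not equivalent, questions.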
Install the SDK
1 minute
Choose the SDK that matches your language. All SDKs share the same API surface and response shape.
TypeScript / Node.js
npm install @tapas-ai/sdk   # or pnpm add @tapas-ai/sdk
Python
pip install tapas-ai
OpenAI SDK (drop-in)
npm install openai   # or: pip install openai
Then point the client's base_url to https://tapas.one/v1 — see Step 5.
Get your API Key
2 minutes
Navigate to Settings → API Keys and click Generate new key. Copy the key immediately — it is shown only once. Store it as an environment variable; never commit it to source control.
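A fail-fast check before constructing the client turns a missing variable into an immediate, descriptive error instead of a 401 at request time. This is a small local sketch using only Node's `process.env`; `requireEnv` is our helper name, not part of the Tapas SDK:

```typescript
// Fail fast if the key is missing, instead of surfacing a 401 later.
// requireEnv is a local helper, not part of the Tapas SDK.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value || value.trim() === "") {
    throw new Error(`${name} is not set — add it to your environment or CI/CD secrets`);
  }
  return value;
}

// Usage (sketch):
// const tapas = new TapasClient({ apiKey: requireEnv("TAPAS_API_KEY") });
```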
// TypeScript — environment variable (recommended)
import { TapasClient } from "@tapas-ai/sdk";

const tapas = new TapasClient({
  apiKey: process.env.TAPAS_API_KEY, // never hardcode
  baseUrl: "https://tapas.one/v1",   // optional — default
});

# Python — environment variable (recommended)
import os
from tapas_ai import TapasClient

tapas = TapasClient(
    api_key=os.environ["TAPAS_API_KEY"],  # never hardcode
    base_url="https://tapas.one/v1",      # optional — default
)

Read the key from process.env.TAPAS_API_KEY (Node.js) or os.environ["TAPAS_API_KEY"] (Python) and add the key to your CI/CD secrets.
Make your first query
2 minutes
Call tapas.query() with any natural-language question. The response includes the answer plus energy and latency metadata.
// TypeScript
const result = await tapas.query("What is compound interest?");

console.log(result.answer);
// → "• Compound interest is interest calculated on both principal
//      and accumulated interest from previous periods.
//    • Formula: A = P(1 + r/n)^(nt)
//    • Example: $1,000 at 5% annually → $1,628.89 after 10 years"

console.log(result.cacheHit);       // true — served from cache
console.log(result.energyWhUsed);   // 0.001 — vs 3.0 Wh for full LLM
console.log(result.energyWhSaved);  // 2.999 — saved this query
console.log(result.responseTimeMs); // 42 — milliseconds

# Python
result = tapas.query("What is compound interest?")

print(result.answer)
# → "• Compound interest is interest calculated on both principal..."

print(f"Cache hit: {result.cache_hit}")              # True
print(f"Energy used: {result.energy_wh_used} Wh")    # 0.001
print(f"Energy saved: {result.energy_wh_saved} Wh")  # 2.999
print(f"Latency: {result.response_time_ms} ms")      # 42

Understand the cache response
3 minutes
Every TapasResponse includes cache metadata you can use to build energy dashboards, feedback loops, and adaptive UIs.
// TapasResponse shape
interface TapasResponse {
  answer: string;           // bullet-point answer (LEM mode) or prose (LLM mode)
  cacheHit: boolean;        // true = served from semantic cache
  similarityScore: number;  // 0.0–1.0 cosine similarity to cached question
  mode: "lem" | "llm" | "hybrid";
  energyWhUsed: number;     // actual Wh consumed
  energyWhSaved: number;    // vs full LLM baseline (3.0 Wh)
  responseTimeMs: number;   // end-to-end latency
  categoryId?: number;      // matched knowledge category, if any
  queryId: string;          // unique ID for feedback / logging
}
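As a sketch of the dashboard use case, this metadata can be aggregated client-side. `summarize` is a local helper and the sample values are invented; the field names mirror a subset of the response shape shown in this step:

```typescript
// Aggregate per-query cache metadata into simple dashboard stats.
// summarize is a local helper; the sample responses below are invented values.
interface CacheMeta {
  cacheHit: boolean;
  energyWhSaved: number;
  responseTimeMs: number;
}

function summarize(responses: CacheMeta[]) {
  const hits = responses.filter(r => r.cacheHit).length;
  const totalWhSaved = responses.reduce((sum, r) => sum + r.energyWhSaved, 0);
  const avgLatencyMs =
    responses.reduce((sum, r) => sum + r.responseTimeMs, 0) / responses.length;
  return { hitRate: hits / responses.length, totalWhSaved, avgLatencyMs };
}

// Invented sample values for illustration:
const stats = summarize([
  { cacheHit: true,  energyWhSaved: 2.999, responseTimeMs: 42 },
  { cacheHit: true,  energyWhSaved: 2.999, responseTimeMs: 38 },
  { cacheHit: false, energyWhSaved: 0,     responseTimeMs: 2100 },
]);
console.log(stats); // hitRate ≈ 0.667, totalWhSaved ≈ 5.998 Wh
```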
// Checking cache behavior
if (result.cacheHit) {
  console.log(`Cache hit! Similarity: ${(result.similarityScore * 100).toFixed(1)}%`);
  console.log(`Saved ${result.energyWhSaved.toFixed(3)} Wh this query`);
} else {
  console.log("Cache miss — full LLM inference used");
}

Add as platform middleware
3 minutes
Tapas works as a drop-in middleware layer for any AI platform — Express, FastAPI, LangChain, or any service using the OpenAI SDK. Two integration patterns are shown below.
// Express middleware — intercept any LLM call and route through Tapas
import express from "express";
import { TapasClient } from "@tapas-ai/sdk";

const tapas = new TapasClient({ apiKey: process.env.TAPAS_API_KEY });
const app = express();
app.use(express.json());

// Drop-in middleware for any /chat endpoint
app.post("/chat", async (req, res) => {
  const { message } = req.body;
  // Route through Tapas first — cache hit returns in ~40 ms
  const result = await tapas.query(message);
  res.json({
    reply: result.answer,
    source: result.cacheHit ? "tapas-cache" : "tapas-llm",
    energySaved: result.energyWhSaved,
    latencyMs: result.responseTimeMs,
  });
});
});

from openai import OpenAI

# ── Swap two lines — all existing code works unchanged ────────────
client = OpenAI(
    api_key="your-tapas-api-key",
    base_url="https://tapas.one/v1",  # ← only change needed
)

response = client.chat.completions.create(
    model="tapas-hybrid",  # auto: cache if confident, LLM otherwise
    messages=[{"role": "user", "content": "What is the Higgs boson?"}],
)
print(response.choices[0].message.content)

For LangChain, configure ChatOpenAI with openai_api_base="https://tapas.one/v1". See the OpenAI Compatible docs for full examples.
Handle errors gracefully
2 minutes
Tapas throws typed error classes so you can handle auth failures, rate limits, and network errors with specific recovery logic.
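Rate-limit (429) and unavailable (503) errors are transient, so a generic retry wrapper can sit in front of any query call. The sketch below is not SDK API: `withRetry`, the pluggable `shouldRetry` predicate, and the backoff delays are all ours to name and tune:

```typescript
// Generic retry-with-exponential-backoff wrapper for transient errors.
// Not part of the Tapas SDK — shouldRetry and the delays are local choices.
async function withRetry<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

// Usage (sketch): retry only on rate-limit / unavailable errors
// const result = await withRetry(
//   () => tapas.query("What is quantum entanglement?"),
//   err => err instanceof TapasRateLimitError || err instanceof TapasUnavailableError,
// );
```

When the server supplies `retryAfterSeconds` (as `TapasRateLimitError` does below), prefer that value over the computed backoff.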
import { TapasError, TapasRateLimitError, TapasAuthError } from "@tapas-ai/sdk";

try {
  const result = await tapas.query("What is quantum entanglement?");
  console.log(result.answer);
} catch (err) {
  if (err instanceof TapasAuthError) {
    // 401 — invalid or revoked API key
    console.error("Invalid API key — check TAPAS_API_KEY env var");
  } else if (err instanceof TapasRateLimitError) {
    // 429 — rate limit exceeded
    const retryAfter = err.retryAfterSeconds;
    console.warn(`Rate limited — retry after ${retryAfter}s`);
    await new Promise(r => setTimeout(r, retryAfter * 1000));
  } else if (err instanceof TapasError) {
    // Other Tapas API errors
    console.error(`Tapas error ${err.statusCode}: ${err.message}`);
  } else {
    // Network or unexpected error
    throw err;
  }
}

from tapas_ai import TapasClient, TapasError, TapasRateLimitError, TapasAuthError
import os
import time

tapas = TapasClient(api_key=os.environ["TAPAS_API_KEY"])

try:
    result = tapas.query("What is quantum entanglement?")
    print(result.answer)
except TapasAuthError:
    print("Invalid API key — check TAPAS_API_KEY env var")
except TapasRateLimitError as e:
    print(f"Rate limited — retrying after {e.retry_after_seconds}s")
    time.sleep(e.retry_after_seconds)
except TapasError as e:
    print(f"Tapas error {e.status_code}: {e.message}")

Status  Error class            Meaning
401     TapasAuthError         Invalid or revoked API key
429     TapasRateLimitError    Rate limit exceeded — check err.retryAfterSeconds
400     TapasValidationError   Invalid request — empty query or bad parameters
503     TapasUnavailableError  Service temporarily unavailable — retry with backoff
5xx     TapasError             Generic server error — check err.statusCode

Stream responses to your UI
2 minutes
Use tapas.stream() for prose responses that benefit from progressive rendering. Streaming uses the tapas-llm model; cache hits return the full answer in a single chunk.
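Accumulating chunks into UI state looks the same regardless of transport. The sketch below runs against a mock async generator so it needs no API key; `mockStream` and `collect` are local stand-ins, and the chunk shape (`delta`/`done`) mirrors the fields described in this step:

```typescript
// Consume a chunk stream and accumulate text for a UI.
// mockStream is a local stand-in for tapas.stream(); the chunk shape
// (delta/done) mirrors the fields documented in this step.
interface StreamChunk {
  delta: string;
  done: boolean;
}

async function* mockStream(): AsyncGenerator<StreamChunk> {
  const parts = ["mRNA vaccines ", "deliver instructions ", "to cells."];
  for (let i = 0; i < parts.length; i++) {
    yield { delta: parts[i], done: i === parts.length - 1 };
  }
}

async function collect(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.delta; // append each increment to UI state
  }
  return text;
}

const full = await collect(mockStream());
console.log(full); // → "mRNA vaccines deliver instructions to cells."
```

In a real UI you would render `text` after each chunk rather than only at the end; the accumulation logic is unchanged.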
// Streaming — use tapas-llm model for prose responses
const stream = await tapas.stream("Explain how mRNA vaccines work.");

for await (const chunk of stream) {
  process.stdout.write(chunk.delta); // stream text to UI
}
// chunk.delta        — incremental text
// chunk.done         — true on final chunk
// chunk.energyWhUsed — available on final chunk

# Async streaming with httpx
import asyncio
import os
from tapas_ai import AsyncTapasClient

async def main():
    tapas = AsyncTapasClient(api_key=os.environ["TAPAS_API_KEY"])
    async for chunk in tapas.stream("Explain how mRNA vaccines work."):
        print(chunk.delta, end="", flush=True)
    print()

asyncio.run(main())

SDK Comparison
Next Steps
Try the live demo
Ask a question and see cache vs LLM in real time
Energy Dashboard
Monitor your energy savings and cache hit rate
TypeScript SDK docs
Full API reference, hooks, and advanced patterns
OpenAI Compatible API
Drop-in swap for existing OpenAI-based apps
Platform Integrations
ChatGPT, Claude, Gemini, and custom AI stacks
Generate API Key
Create a key and start integrating today