Integration Guide

~15 min read

A complete, step-by-step walkthrough for integrating Tapas into any application. By the end you will have a working integration that routes queries through the semantic cache, handles errors gracefully, and streams responses to your UI — saving up to 99.97% of LLM energy on a cache hit.


Overview

Tapas sits between your application and any LLM backend. When a query arrives, Tapas checks its semantic cache first. If a similar question has been answered before (cosine similarity ≥ 0.72), it returns the cached answer in ~40 ms using ~0.001 Wh. Only on a cache miss does it fall through to full LLM inference (~2 s, ~3.0 Wh).

Request flow

Your App → Tapas SDK → Semantic Cache
  hit  → cached response (~40 ms)
  miss → LLM Backend → response (~2 s)
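The routing decision can be sketched in a few lines. This is an illustrative model, not the real implementation: the actual cache runs server-side, and `route`, `cosine`, and the in-memory `cache` list are hypothetical names standing in for it. It assumes query embeddings have already been produced by some embedding step.

```python
# Illustrative sketch of Tapas's cache-or-LLM routing (not the real implementation).
import math

SIMILARITY_THRESHOLD = 0.72  # cosine-similarity cutoff from the docs

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def route(query_embedding, cache):
    """cache is a list of (embedding, answer) pairs.
    Returns (answer, cache_hit)."""
    best_score, best_answer = 0.0, None
    for emb, answer in cache:
        score = cosine(query_embedding, emb)
        if score > best_score:
            best_score, best_answer = score, answer
    if best_score >= SIMILARITY_THRESHOLD:
        return best_answer, True   # ~40 ms, ~0.001 Wh path
    return None, False             # fall through to the LLM (~2 s, ~3.0 Wh)
```

Only a miss pays the full inference cost; everything at or above the 0.72 threshold is answered from the cache.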

Cache hit latency: ~40 ms · Cache hit energy: 0.001 Wh · Energy savings: up to 99.97%

Step 1: Install the SDK (1 minute)

Choose the SDK that matches your language. All SDKs share the same API surface and response shape.

TypeScript / Node.js: npm install @tapas-ai/sdk
Python: pip install tapas-ai
OpenAI SDK (drop-in): npm install openai or pip install openai
Install TypeScript SDK
bash
npm install @tapas-ai/sdk
# or
pnpm add @tapas-ai/sdk
Install Python SDK
bash
pip install tapas-ai
Tip: If you already use the OpenAI SDK, you can skip installing a new package. Just change base_url to https://tapas.one/v1 — see Step 5.
Step 2: Get your API key (2 minutes)

Navigate to Settings → API Keys and click Generate new key. Copy the key immediately — it is shown only once. Store it as an environment variable; never commit it to source control.

TypeScript — initialise client
typescript
// TypeScript — environment variable (recommended)
import { TapasClient } from "@tapas-ai/sdk";

const tapas = new TapasClient({
  apiKey: process.env.TAPAS_API_KEY,  // never hardcode
  baseUrl: "https://tapas.one/v1",    // optional — default
});
Python — initialise client
python
# Python — environment variable (recommended)
import os
from tapas_ai import TapasClient

tapas = TapasClient(
    api_key=os.environ["TAPAS_API_KEY"],  # never hardcode
    base_url="https://tapas.one/v1",      # optional — default
)
Warning: Never hardcode API keys in source files. Use process.env.TAPAS_API_KEY (Node.js) or os.environ["TAPAS_API_KEY"] (Python) and add the key to your CI/CD secrets.
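Locally, one way to provide the variable is to export it in your shell before running your app (the value below is a placeholder; in CI/CD, use your platform's secret store instead):

```shell
# Set the key for the current shell session (placeholder value)
export TAPAS_API_KEY="your-tapas-api-key"
# Verify it is visible to child processes
echo "key is ${#TAPAS_API_KEY} characters long"
```

For persistence across sessions, add the export to your shell profile or use a tool such as direnv, keeping the file itself out of version control.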
Step 3: Make your first query (2 minutes)

Call tapas.query() with any natural-language question. The response includes the answer plus energy and latency metadata.

TypeScript — first query
typescript
const result = await tapas.query("What is compound interest?");

console.log(result.answer);
// → "• Compound interest is interest calculated on both principal
//    and accumulated interest from previous periods.
//    • Formula: A = P(1 + r/n)^(nt)
//    • Example: $1,000 at 5% annually → $1,628.89 after 10 years"

console.log(result.cacheHit);      // true  — served from cache
console.log(result.energyWhUsed);  // 0.001 — vs 3.0 Wh for full LLM
console.log(result.energyWhSaved); // 2.999 — saved this query
console.log(result.responseTimeMs);// 42    — milliseconds
Python — first query
python
result = tapas.query("What is compound interest?")

print(result.answer)
# → "• Compound interest is interest calculated on both principal..."

print(f"Cache hit:    {result.cache_hit}")       # True
print(f"Energy used:  {result.energy_wh_used} Wh") # 0.001
print(f"Energy saved: {result.energy_wh_saved} Wh") # 2.999
print(f"Latency:      {result.response_time_ms} ms") # 42
Note: The first call to a new question will be a cache miss (~2 s). Subsequent calls to the same or semantically similar question will hit the cache (~40 ms). Tapas automatically warms the cache as queries accumulate.
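The miss-then-hit behavior can be simulated without the SDK. The toy cache below matches on the exact query string, whereas the real cache matches semantically, but it shows the same warming pattern: the first call pays full cost, repeats are cheap.

```python
# Toy simulation of cache warming (exact-match only; no network, not the real SDK).
class ToyCache:
    def __init__(self):
        self.store = {}

    def query(self, q):
        """Return (answer, cache_hit, approx_latency_ms)."""
        if q in self.store:
            return self.store[q], True, 40     # warmed: cheap and fast
        answer = f"LLM answer to: {q}"         # pretend full inference
        self.store[q] = answer                 # warm the cache for next time
        return answer, False, 2000

cache = ToyCache()
_, hit1, ms1 = cache.query("What is compound interest?")
_, hit2, ms2 = cache.query("What is compound interest?")
print(hit1, ms1)  # → False 2000
print(hit2, ms2)  # → True 40
```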
Step 4: Understand the cache response (3 minutes)

Every TapasResponse includes cache metadata you can use to build energy dashboards, feedback loops, and adaptive UIs.

TapasResponse — full shape
typescript
// TapasResponse shape
interface TapasResponse {
  answer:          string;   // bullet-point answer (LEM mode) or prose (LLM mode)
  cacheHit:        boolean;  // true = served from semantic cache
  similarityScore: number;   // 0.0–1.0 cosine similarity to cached question
  mode:            "lem" | "llm" | "hybrid";
  energyWhUsed:    number;   // actual Wh consumed
  energyWhSaved:   number;   // vs full LLM baseline (3.0 Wh)
  responseTimeMs:  number;   // end-to-end latency
  categoryId?:     number;   // matched knowledge category, if any
  queryId:         string;   // unique ID for feedback / logging
}

// Checking cache behavior
if (result.cacheHit) {
  console.log(`Cache hit! Similarity: ${(result.similarityScore * 100).toFixed(1)}%`);
  console.log(`Saved ${result.energyWhSaved.toFixed(3)} Wh this query`);
} else {
  console.log("Cache miss — full LLM inference used");
}
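Because every response carries energy metadata, dashboard totals reduce to a simple fold over logged responses. This sketch uses hypothetical logged records rather than live SDK calls:

```python
# Aggregating per-query metadata into dashboard totals (records are illustrative).
responses = [
    {"cacheHit": True,  "energyWhUsed": 0.001, "energyWhSaved": 2.999},
    {"cacheHit": False, "energyWhUsed": 3.0,   "energyWhSaved": 0.0},
    {"cacheHit": True,  "energyWhUsed": 0.001, "energyWhSaved": 2.999},
]

hit_rate = sum(r["cacheHit"] for r in responses) / len(responses)
total_saved = sum(r["energyWhSaved"] for r in responses)
print(f"hit rate {hit_rate:.0%}, saved {total_saved:.3f} Wh")
# → hit rate 67%, saved 5.998 Wh
```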
Mode comparison
Mode             | Energy       | Latency     | Similarity | Output
LEM (cache hit)  | 0.001 Wh     | ~40 ms      | ≥ 0.72     | Bullet-point answer from semantic cache
LLM (cache miss) | ~3.0 Wh      | ~2,000 ms   | < 0.72     | Full GPU inference, prose answer
Hybrid (auto)    | 0.001–3.0 Wh | 40–2,000 ms | Auto       | Smart Router decides based on confidence
Step 5: Add as platform middleware (3 minutes)

Tapas works as a drop-in middleware layer for any AI platform — Express, FastAPI, LangChain, or any service using the OpenAI SDK. Two integration patterns are shown below.

Express middleware — TypeScript
typescript
// Express middleware — intercept any LLM call and route through Tapas
import express from "express";
import { TapasClient } from "@tapas-ai/sdk";

const tapas = new TapasClient({ apiKey: process.env.TAPAS_API_KEY });
const app = express();
app.use(express.json());

// Drop-in middleware for any /chat endpoint
app.post("/chat", async (req, res) => {
  const { message } = req.body;

  // Route through Tapas first — cache hit returns in ~40ms
  const result = await tapas.query(message);

  res.json({
    reply:       result.answer,
    source:      result.cacheHit ? "tapas-cache" : "tapas-llm",
    energySaved: result.energyWhSaved,
    latencyMs:   result.responseTimeMs,
  });
});
OpenAI SDK drop-in — Python
python
from openai import OpenAI

# ── Swap two lines — all existing code works unchanged ────────────
client = OpenAI(
    api_key  = "your-tapas-api-key",
    base_url = "https://tapas.one/v1",   # ← only change needed
)

response = client.chat.completions.create(
    model    = "tapas-hybrid",  # auto: cache if confident, LLM otherwise
    messages = [{"role": "user", "content": "What is the Higgs boson?"}],
)
print(response.choices[0].message.content)
Tip: For LangChain users, configure ChatOpenAI with openai_api_base="https://tapas.one/v1". See the OpenAI Compatible docs for full examples.
Step 6: Handle errors gracefully (2 minutes)

Tapas throws typed error classes so you can handle auth failures, rate limits, and network errors with specific recovery logic.

Error handling — TypeScript
typescript
import { TapasError, TapasRateLimitError, TapasAuthError } from "@tapas-ai/sdk";

try {
  const result = await tapas.query("What is quantum entanglement?");
  console.log(result.answer);
} catch (err) {
  if (err instanceof TapasAuthError) {
    // 401 — invalid or revoked API key
    console.error("Invalid API key — check TAPAS_API_KEY env var");
  } else if (err instanceof TapasRateLimitError) {
    // 429 — rate limit exceeded
    const retryAfter = err.retryAfterSeconds;
    console.warn(`Rate limited — retry after ${retryAfter}s`);
    await new Promise(r => setTimeout(r, retryAfter * 1000));
  } else if (err instanceof TapasError) {
    // Other Tapas API errors
    console.error(`Tapas error ${err.statusCode}: ${err.message}`);
  } else {
    // Network or unexpected error
    throw err;
  }
}
Error handling — Python
python
from tapas_ai import TapasClient, TapasError, TapasRateLimitError, TapasAuthError
import os
import time

tapas = TapasClient(api_key=os.environ["TAPAS_API_KEY"])

try:
    result = tapas.query("What is quantum entanglement?")
    print(result.answer)
except TapasAuthError:
    print("Invalid API key — check TAPAS_API_KEY env var")
except TapasRateLimitError as e:
    print(f"Rate limited — retrying after {e.retry_after_seconds}s")
    time.sleep(e.retry_after_seconds)
except TapasError as e:
    print(f"Tapas error {e.status_code}: {e.message}")
Error reference
Status | Error class           | Meaning
401    | TapasAuthError        | Invalid or revoked API key
429    | TapasRateLimitError   | Rate limit exceeded; check err.retryAfterSeconds
400    | TapasValidationError  | Invalid request: empty query or bad parameters
503    | TapasUnavailableError | Service temporarily unavailable; retry with backoff
5xx    | TapasError            | Generic server error; check err.statusCode
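For the retryable statuses (429 and 503), a generic exponential-backoff wrapper is often useful. This standalone sketch uses a stand-in `TransientError` in place of the SDK's error classes, so it runs without the SDK installed:

```python
# Generic exponential backoff with jitter for transient (429/503-style) failures.
# TransientError stands in for TapasRateLimitError / TapasUnavailableError.
import random
import time

class TransientError(Exception):
    pass

def with_backoff(fn, max_attempts=4, base_delay=0.5):
    """Call fn(); on TransientError, retry with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

When the server supplies retryAfterSeconds (as TapasRateLimitError does), prefer that value over the computed delay.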
Step 7: Stream responses to your UI (2 minutes)

Use tapas.stream() for prose responses that benefit from progressive rendering. Streaming uses the tapas-llm model; cache hits return the full answer in a single chunk.

Streaming — TypeScript
typescript
// Streaming — use tapas-llm model for prose responses
const stream = await tapas.stream("Explain how mRNA vaccines work.");

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);  // stream text to UI
}
// chunk.delta — incremental text
// chunk.done  — true on final chunk
// chunk.energyWhUsed — available on final chunk
Streaming — Python async
python
# Async streaming with AsyncTapasClient
import asyncio
import os
from tapas_ai import AsyncTapasClient

async def main():
    tapas = AsyncTapasClient(api_key=os.environ["TAPAS_API_KEY"])
    async for chunk in tapas.stream("Explain how mRNA vaccines work."):
        print(chunk.delta, end="", flush=True)
    print()

asyncio.run(main())
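The delta-assembly pattern itself can be exercised without the SDK. Here `fake_stream` is a hypothetical stand-in generator for `tapas.stream()`; a real UI would append each delta to the rendered message instead of a string:

```python
# Toy stand-in for a delta stream: chunks arrive incrementally, the client
# accumulates them into the full answer.
def fake_stream():
    for part in ["mRNA vaccines ", "teach cells ", "to make a protein."]:
        yield part

full = ""
for delta in fake_stream():
    full += delta  # in a UI, append to the displayed message here
print(full)
# → mRNA vaccines teach cells to make a protein.
```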

SDK Comparison

All SDKs at a glance
SDK                   | Install                                  | Types
TypeScript / Node.js  | npm install @tapas-ai/sdk                | Full
Python                | pip install tapas-ai                     | Full
OpenAI SDK (any lang) | npm install openai / pip install openai  | Partial
Go                    | go get tapas.one/sdk                     | Struct
Rust                  | tapas-sdk = "0.1"                        | Struct
REST / cURL           | No install                               | —

Next Steps

Ready to integrate?

Generate your API key and make your first energy-efficient query in under 5 minutes.