
OpenAI-Compatible API

Tapas implements the OpenAI Chat Completions API. If you already use the OpenAI SDK — in Python, Node.js, or via LangChain — you only need to point base_url at https://tapas.one/v1, swap in your Tapas API key, and pass a Tapas model name. No other code changes are required.

The two-line swap

Before (OpenAI)

api_key = "sk-..."
base_url = "https://api.openai.com/v1"

After (Tapas)

api_key = "tapas-..."
base_url = "https://tapas.one/v1"

Overview

Endpoint: https://tapas.one/v1
Auth header: Authorization: Bearer <key>
Supported method: POST /chat/completions

Supported features: chat completions, streaming (SSE), system/user/assistant message roles, temperature (ignored — Tapas is deterministic), and max_tokens. Function calling and embeddings are not yet supported.
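For clients without an OpenAI SDK, the endpoint is a single authenticated POST. The sketch below builds that raw request from the fields listed above; the key and question are placeholders, and the commented-out requests call shows how you would send it:

```python
# Raw HTTP sketch of POST /chat/completions -- key and question are placeholders.
API_KEY = "tapas-your-key-here"

url = "https://tapas.one/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",   # auth header from the table above
    "Content-Type": "application/json",
}
payload = {
    "model": "tapas-hybrid",
    "messages": [{"role": "user", "content": "What is photosynthesis?"}],
    "max_tokens": 256,
}

# To send it with the requests library:
#   import requests
#   r = requests.post(url, headers=headers, json=payload, timeout=30)
#   print(r.json()["choices"][0]["message"]["content"])
```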

Model Reference

Pass the model name in the model field of your chat completion request. Choose based on your latency and energy budget.

Available models
tapas-lem (Cache / LEM): ~0.001 Wh, ~40 ms. Returns bullet-point answers from the semantic cache; roughly 3000× cheaper than full inference.

tapas-llm (Full LLM): ~3.0 Wh, ~2000 ms. Full GPU inference; use when the cache misses or detailed prose is required.

tapas-hybrid (Auto): 0.001–3.0 Wh, 40–2000 ms. The Smart Router decides automatically: cache when confidence is above 0.85, LLM otherwise.
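The trade-offs above can be sketched as a small helper that picks a model name from a latency budget. The threshold is illustrative, not part of the API:

```python
def pick_model(max_latency_ms: float, need_prose: bool = False) -> str:
    """Choose a Tapas model from a latency budget (thresholds are illustrative)."""
    if need_prose:
        return "tapas-llm"      # full inference when detailed prose is required
    if max_latency_ms < 100:
        return "tapas-lem"      # only the cache (~40 ms) fits a tight budget
    return "tapas-hybrid"       # otherwise let the Smart Router decide

print(pick_model(50))                     # → tapas-lem
print(pick_model(500))                    # → tapas-hybrid
print(pick_model(500, need_prose=True))   # → tapas-llm
```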

Python — Basic usage

pip install openai

python_basic.py
from openai import OpenAI

# ── One-line swap: point the client at Tapas ─────────────────────
client = OpenAI(
    api_key="your-tapas-api-key",   # Your Tapas API key
    base_url="https://tapas.one/v1", # Tapas OpenAI-compatible endpoint
)

# ── Standard chat completion — identical to OpenAI usage ─────────
response = client.chat.completions.create(
    model="tapas-lem",   # "tapas-lem" for cache mode, "tapas-llm" for full inference
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "How does quantum computing work?"},
    ],
)

print(response.choices[0].message.content)
# → "• Quantum computers use qubits instead of classical bits
#    • Qubits exploit superposition to represent 0 and 1 simultaneously
#    ..."

# ── Energy metadata is returned in the response extras ───────────
extras = response.model_extra or {}
print(f"Mode:    {extras.get('tapas_mode', 'unknown')}")
print(f"Saved:   {extras.get('energy_wh_saved', 0):.3f} Wh")
print(f"Latency: {extras.get('response_time_ms', 0)} ms")

Python — Streaming

Streaming works identically to the OpenAI SDK. Use tapas-llm for streaming prose responses; tapas-lem returns the full cached answer in one chunk.

python_streaming.py
from openai import OpenAI

client = OpenAI(
    api_key="your-tapas-api-key",
    base_url="https://tapas.one/v1",
)

# ── Streaming response ────────────────────────────────────────────
stream = client.chat.completions.create(
    model="tapas-llm",  # Full inference mode for streaming prose
    messages=[{"role": "user", "content": "Explain CRISPR gene editing."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()  # newline at end

Node.js — Basic usage

npm install openai

node_basic.ts
import OpenAI from "openai";

// ── One-line swap: point the client at Tapas ─────────────────────
const client = new OpenAI({
  apiKey:  "your-tapas-api-key",
  baseURL: "https://tapas.one/v1",
});

// ── Standard chat completion ──────────────────────────────────────
const response = await client.chat.completions.create({
  model: "tapas-lem",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user",   content: "What is compound interest?" },
  ],
});

console.log(response.choices[0].message.content);
// → "• Compound interest is interest calculated on both the initial
//    principal and the accumulated interest from previous periods..."

Node.js — Streaming

node_streaming.ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey:  "your-tapas-api-key",
  baseURL: "https://tapas.one/v1",
});

// ── Streaming response ────────────────────────────────────────────
const stream = await client.chat.completions.create({
  model:    "tapas-llm",
  messages: [{ role: "user", content: "How does TCP/IP work?" }],
  stream:   true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
console.log();

LangChain — Python

Use ChatOpenAI from langchain_openai with Tapas as the backend. Works with chains, agents, RAG pipelines, and any other LangChain abstraction. Requires pip install langchain-openai.

langchain_python.py
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# ── Drop-in swap: replace ChatOpenAI base_url ─────────────────────
tapas = ChatOpenAI(
    model="tapas-lem",
    openai_api_key="your-tapas-api-key",
    openai_api_base="https://tapas.one/v1",
    temperature=0,
)

# ── Use exactly as you would ChatOpenAI ──────────────────────────
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What causes inflation?"),
]

response = tapas.invoke(messages)
print(response.content)

# ── Works with chains, agents, and RAG pipelines ─────────────────
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_template("{question}")
    | tapas
    | StrOutputParser()
)

result = chain.invoke({"question": "How does mRNA vaccine technology work?"})
print(result)

LangChain — JavaScript

Use ChatOpenAI from @langchain/openai with the configuration.baseURL override. Requires npm install @langchain/openai @langchain/core.

langchain_js.ts
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

// ── Drop-in swap: replace configuration ──────────────────────────
const tapas = new ChatOpenAI({
  modelName:    "tapas-lem",
  openAIApiKey: "your-tapas-api-key",
  configuration: {
    baseURL: "https://tapas.one/v1",
  },
  temperature: 0,
});

// ── Use exactly as you would ChatOpenAI ──────────────────────────
const response = await tapas.invoke([
  new SystemMessage("You are a helpful assistant."),
  new HumanMessage("What is the Higgs boson?"),
]);

console.log(response.content);

Migration Guide

Migrating from OpenAI to Tapas takes under 5 minutes. Here is the complete checklist.

Migration checklist
1. Generate a Tapas API key: go to Settings → API Keys → Generate new key.
2. Update base_url / baseURL: set it to https://tapas.one/v1 in your client config.
3. Replace the API key: swap OPENAI_API_KEY for your Tapas key in your env vars.
4. Change the model name: use tapas-lem (cache), tapas-llm (full), or tapas-hybrid (auto).
5. Test with a sample query: verify response.choices[0].message.content is populated.
6. Monitor energy savings: check the Energy Dashboard for live Wh-saved metrics.
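Steps 2 and 3 of the checklist boil down to configuration, which you can keep in environment variables. A sketch, assuming a TAPAS_API_KEY variable (the name is illustrative; any env var works):

```python
import os

# Illustrative: read the Tapas key from the environment instead of hardcoding it.
os.environ.setdefault("TAPAS_API_KEY", "tapas-demo-key")  # normally set in your shell

client_kwargs = {
    "api_key": os.environ["TAPAS_API_KEY"],  # step 3: was OPENAI_API_KEY
    "base_url": "https://tapas.one/v1",      # step 2: was https://api.openai.com/v1
}
# Then construct the client as usual: client = OpenAI(**client_kwargs)
```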

Note on unsupported features: Tapas does not yet support function calling, tool use, embeddings, or fine-tuning. If your app relies on these, use tapas-hybrid for knowledge queries and fall back to OpenAI for tool-calling flows.
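One way to implement that fallback is a tiny router that inspects the request body: anything carrying tools or functions goes to OpenAI, everything else to Tapas. A sketch; the helper name is ours, not part of either SDK:

```python
def pick_backend(request: dict) -> str:
    """Route a chat-completion request body to a base URL (illustrative helper)."""
    needs_tools = bool(request.get("tools") or request.get("functions"))
    return "https://api.openai.com/v1" if needs_tools else "https://tapas.one/v1"

print(pick_backend({"model": "tapas-hybrid", "messages": []}))
# → https://tapas.one/v1
print(pick_backend({"model": "gpt-4o", "messages": [], "tools": [{"type": "function"}]}))
# → https://api.openai.com/v1
```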

Ready to migrate?

Generate your API key, swap two lines, and start saving energy on every query.