OpenAI-Compatible API
Tapas implements the OpenAI Chat Completions API. If you already use the OpenAI SDK — in Python, Node.js, or via LangChain — you need to change exactly two lines: set base_url to https://tapas.one/v1 and swap your API key. No other code changes required.
The two-line swap
Before (OpenAI)

```python
api_key = "sk-..."
base_url = "https://api.openai.com/v1"
```

After (Tapas)

```python
api_key = "tapas-..."
base_url = "https://tapas.one/v1"
```

Overview
- Endpoint: `https://tapas.one/v1`
- Auth header: `Authorization: Bearer <key>`
- Supported method: `POST /chat/completions`

Supported features: chat completions, streaming (SSE), system/user/assistant message roles, temperature (ignored — Tapas is deterministic), and max_tokens. Function calling and embeddings are not yet supported.
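Because Tapas speaks the standard Chat Completions wire format, you can also call it without any SDK. A minimal sketch of the raw request using only the endpoint and auth header listed above — the helper function and its names are ours, not part of the Tapas API:

```python
import json

BASE_URL = "https://tapas.one/v1"

def build_chat_request(api_key: str, model: str, user_message: str):
    """Assemble the raw HTTP pieces for a Tapas chat completion call."""
    url = f"{BASE_URL}/chat/completions"       # the one supported method: POST
    headers = {
        "Authorization": f"Bearer {api_key}",  # auth header from the list above
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, headers, body

url, headers, body = build_chat_request("your-tapas-api-key", "tapas-lem", "What is DNS?")
print(url)  # → https://tapas.one/v1/chat/completions

# To actually send it (needs the `requests` package and a valid key):
#   import requests
#   resp = requests.post(url, headers=headers, data=body, timeout=30)
#   print(resp.json()["choices"][0]["message"]["content"])
```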
Model Reference
Pass the model name in the model field of your chat completion request. Choose based on your latency and energy budget.
| Model | Mode | Energy / request | Latency |
|---|---|---|---|
| `tapas-lem` | Cache / LEM | 0.001 Wh | ~40 ms |
| `tapas-llm` | Full LLM | ~3.0 Wh | ~2000 ms |
| `tapas-hybrid` | Auto | 0.001–3.0 Wh | 40–2000 ms |
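With `tapas-hybrid`, the service decides per request whether the cache or the full LLM answers, which is why the energy and latency columns become ranges. If you want to log which path served each request, the energy metadata fields used in the Python example on this page (`tapas_mode`, `energy_wh_saved`, `response_time_ms`) can drive a small logger. A sketch — the helper is ours, and the `"lem"` mode value below is illustrative, not a documented constant:

```python
def summarize_usage(extras: dict) -> str:
    """Format Tapas energy metadata into a one-line log entry.
    Missing keys fall back to neutral defaults, since these are
    response extras rather than guaranteed fields."""
    mode = extras.get("tapas_mode", "unknown")
    saved = extras.get("energy_wh_saved", 0.0)
    ms = extras.get("response_time_ms", 0)
    return f"mode={mode} saved={saved:.3f}Wh latency={ms}ms"

# In real code, extras would come from `response.model_extra or {}`:
print(summarize_usage({"tapas_mode": "lem",
                       "energy_wh_saved": 2.999,
                       "response_time_ms": 40}))
# → mode=lem saved=2.999Wh latency=40ms
```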
Python — Basic usage
```
pip install openai
```

```python
from openai import OpenAI

# ── One-line swap: point the client at Tapas ─────────────────────
client = OpenAI(
    api_key="your-tapas-api-key",     # Your Tapas API key
    base_url="https://tapas.one/v1",  # Tapas OpenAI-compatible endpoint
)

# ── Standard chat completion — identical to OpenAI usage ─────────
response = client.chat.completions.create(
    model="tapas-lem",  # "tapas-lem" for cache mode, "tapas-llm" for full inference
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How does quantum computing work?"},
    ],
)

print(response.choices[0].message.content)
# → "• Quantum computers use qubits instead of classical bits
#    • Qubits exploit superposition to represent 0 and 1 simultaneously
#    ..."

# ── Energy metadata is returned in the response extras ───────────
extras = response.model_extra or {}
print(f"Mode: {extras.get('tapas_mode', 'unknown')}")
print(f"Saved: {extras.get('energy_wh_saved', 0):.3f} Wh")
print(f"Latency: {extras.get('response_time_ms', 0)} ms")
```

Python — Streaming
Streaming works identically to the OpenAI SDK. Use tapas-llm for streaming prose responses; tapas-lem returns the full cached answer in one chunk.
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-tapas-api-key",
    base_url="https://tapas.one/v1",
)

# ── Streaming response ────────────────────────────────────────────
stream = client.chat.completions.create(
    model="tapas-llm",  # Full inference mode for streaming prose
    messages=[{"role": "user", "content": "Explain CRISPR gene editing."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()  # newline at end
```

Node.js — Basic usage
```
npm install openai
```

```javascript
import OpenAI from "openai";

// ── One-line swap: point the client at Tapas ─────────────────────
const client = new OpenAI({
  apiKey: "your-tapas-api-key",
  baseURL: "https://tapas.one/v1",
});

// ── Standard chat completion ──────────────────────────────────────
const response = await client.chat.completions.create({
  model: "tapas-lem",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is compound interest?" },
  ],
});

console.log(response.choices[0].message.content);
// → "• Compound interest is interest calculated on both the initial
//    principal and the accumulated interest from previous periods..."
```

Node.js — Streaming
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-tapas-api-key",
  baseURL: "https://tapas.one/v1",
});

// ── Streaming response ────────────────────────────────────────────
const stream = await client.chat.completions.create({
  model: "tapas-llm",
  messages: [{ role: "user", content: "How does TCP/IP work?" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
console.log();
```

LangChain — Python
Use ChatOpenAI from langchain_openai with Tapas as the backend. Works with chains, agents, RAG pipelines, and any other LangChain abstraction. Requires pip install langchain-openai.
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# ── Drop-in swap: replace ChatOpenAI base_url ─────────────────────
tapas = ChatOpenAI(
    model="tapas-lem",
    openai_api_key="your-tapas-api-key",
    openai_api_base="https://tapas.one/v1",
    temperature=0,
)

# ── Use exactly as you would ChatOpenAI ──────────────────────────
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What causes inflation?"),
]
response = tapas.invoke(messages)
print(response.content)

# ── Works with chains, agents, and RAG pipelines ─────────────────
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_template("{question}")
    | tapas
    | StrOutputParser()
)
result = chain.invoke({"question": "How does mRNA vaccine technology work?"})
print(result)
```

LangChain — JavaScript
Use ChatOpenAI from @langchain/openai with the configuration.baseURL override. Requires npm install @langchain/openai @langchain/core.
```javascript
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

// ── Drop-in swap: replace configuration ──────────────────────────
const tapas = new ChatOpenAI({
  modelName: "tapas-lem",
  openAIApiKey: "your-tapas-api-key",
  configuration: {
    baseURL: "https://tapas.one/v1",
  },
  temperature: 0,
});

// ── Use exactly as you would ChatOpenAI ──────────────────────────
const response = await tapas.invoke([
  new SystemMessage("You are a helpful assistant."),
  new HumanMessage("What is the Higgs boson?"),
]);
console.log(response.content);
```

Migration Guide
Migrating from OpenAI to Tapas takes under 5 minutes. Here is the complete checklist.
Note on unsupported features: Tapas does not yet support function calling, tool use, embeddings, or fine-tuning. If your app relies on these, use tapas-hybrid for knowledge queries and fall back to OpenAI for tool-calling flows.
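The fallback described above can live in a thin router: send any request that carries tools to OpenAI, and everything else to Tapas. A minimal sketch — the `route_request` helper and the stand-in callables are ours, not part of either API, and a real router would also swap the model name per backend:

```python
def route_request(params: dict, tapas_call, openai_call):
    """Dispatch a Chat Completions request to the right backend.
    `params` is the kwargs dict you would pass to
    client.chat.completions.create; `tapas_call` / `openai_call` are
    the two clients' create methods (or any compatible callables).
    Requests that use tools or legacy functions go to OpenAI, since
    Tapas does not support them yet; everything else goes to Tapas."""
    needs_tools = bool(params.get("tools") or params.get("functions"))
    return (openai_call if needs_tools else tapas_call)(**params)

# Illustration with stand-in callables (in real code these would be
# tapas_client.chat.completions.create and openai_client.chat.completions.create):
tapas_call = lambda **p: "answered by Tapas"
openai_call = lambda **p: "answered by OpenAI"

print(route_request(
    {"model": "tapas-hybrid",
     "messages": [{"role": "user", "content": "What is DNS?"}]},
    tapas_call, openai_call,
))
# → answered by Tapas
```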