Intelligent Prompt Classification — Scales with your index

Smart Router

As your category index grows beyond hundreds into thousands of topics, cosine similarity alone is no longer enough. The Smart Router is a lightweight LLM classification layer that reads the user's intent and routes queries with surgical precision, at 30,000× less energy than full inference.

  • Current Categories: 160 (340 until Smart Router)
  • Routing Mode: Cosine (fast, in-memory)
  • Classification Energy: 0.0001 Wh per query routed
  • Efficiency: 30,000× more efficient than full inference

Grow the Index

Each community Q&A submission expands the category index. At 200 categories the Hybrid Router activates; at 500 the Smart Router takes over.

  • Cosine Router (current) → Hybrid Router at 200: 160 / 200 categories (40 to go)
  • Hybrid Router → Smart Router at 500: 160 / 500 categories (340 to go)
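The activation thresholds above can be sketched as a simple mode picker; the function name and return values are hypothetical, but the 200- and 500-category cutoffs mirror the text.

```python
# Hypothetical sketch of the threshold logic described above.
def routing_mode(category_count: int) -> str:
    """Pick the routing mode for the current size of the category index."""
    if category_count >= 500:
        return "smart"   # Smart Router (LLM classifier) takes over
    if category_count >= 200:
        return "hybrid"  # cosine first, LLM only for ambiguous queries
    return "cosine"      # in-memory similarity only

print(routing_mode(160))  # cosine (current index size)
```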

Routing Modes

Hybrid Router

Cosine similarity runs first. If it returns a high-confidence match (> 85%), the Smart Router is skipped entirely. Only ambiguous or low-confidence queries are escalated to the LLM classifier. Best of both worlds.

0.0001–0.01 Wh
per classification

How it works

  1. Run cosine similarity (< 5 ms, free)
  2. If cosine confidence > 85% → use cache match directly
  3. If cosine confidence 60–85% → escalate to Smart Router LLM
  4. Smart Router returns structured classification
  5. If Smart Router confidence > 60% → cache lookup
  6. If all fail → full LLM inference (Anthropic Claude)
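The six-step fallthrough above can be sketched as follows. Every helper here (`classify_cosine`, `classify_smart_router`, `cache_lookup`, `full_inference`) is a hypothetical stub standing in for the real component, so the control flow is runnable end to end.

```python
def classify_cosine(query: str):
    # Stub: pretend one known query matches with high confidence.
    known = {"what is photosynthesis": ("environment_climate", 0.92)}
    cat, score = known.get(query.lower(), ("general_knowledge", 0.40))
    return score, cat

def classify_smart_router(query: str):
    # Stub: the real version calls the LLM classifier (Step 4).
    return {"category": "general_knowledge", "confidence": 0.75}

def cache_lookup(category: str):
    return f"cached:{category}"

def full_inference(query: str):
    return f"llm:{query}"

def route(query: str):
    score, match = classify_cosine(query)        # Step 1: < 5 ms, free
    if score > 0.85:                             # Step 2: use cache match
        return cache_lookup(match)
    if score >= 0.60:                            # Step 3: escalate to Smart Router
        result = classify_smart_router(query)    # Step 4: structured classification
        if result["confidence"] > 0.60:          # Step 5: cache lookup
            return cache_lookup(result["category"])
    return full_inference(query)                 # Step 6: full LLM inference
```

Each layer only runs when the one before it fails, which is what makes the degradation graceful.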

Advantages

  • Near-zero latency for well-known queries (cosine handles ~70% of traffic)
  • Smart Router only activates for the hard 30%
  • Scales to unlimited categories
  • Graceful degradation: each layer falls through to the next

Limitations

  • Slightly more complex routing logic
  • Requires Smart Router model to be available

Test the Router Live

Enter any question to see how Tapas classifies it — domain, category, confidence score, routing method used, and energy consumed.

Smart Router Model Options

When the Smart Router activates, choose the classification model that fits your scale, latency budget, and infrastructure.

  • Claude Haiku 3.5 (Anthropic) · Recommended. Latency ~80ms · Cost $0.25/1M tokens · Context 200K tokens. Available globally via the Anthropic API.
  • Gemini 2.0 Flash (Google Cloud). Latency ~60ms · Cost $0.10/1M tokens · Context 1M tokens. Vertex AI: us-central1, europe-west4, asia-southeast1.
  • Llama 3.2 1B, local (NVIDIA NIM / Ollama). Latency ~120ms · Cost $0 (self-hosted) · Context 128K tokens. NVIDIA NIM: US and select EU regions only; Ollama: no restrictions.
  • Mistral 7B, fine-tuned (Custom / HuggingFace). Latency ~200ms · Cost variable · Context 32K tokens. Deploy anywhere: HuggingFace Inference Endpoints, AWS, GCP, Azure.
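One way to act on these trade-offs is a small registry plus a picker that selects the cheapest hosted model inside a latency budget. The latency and cost figures are copied from the options above; the names, data shape, and selection policy are illustrative assumptions.

```python
# Hypothetical registry of the Smart Router model options; latency in ms,
# cost in USD per 1M tokens (None = variable / not comparable).
MODELS = [
    {"name": "claude-haiku-3.5", "latency_ms": 80,  "cost": 0.25},
    {"name": "gemini-2.0-flash", "latency_ms": 60,  "cost": 0.10},
    {"name": "llama-3.2-1b",     "latency_ms": 120, "cost": 0.0},
    {"name": "mistral-7b-ft",    "latency_ms": 200, "cost": None},
]

def pick_model(latency_budget_ms: int):
    """Cheapest model with a known cost that fits the latency budget."""
    fits = [m for m in MODELS
            if m["latency_ms"] <= latency_budget_ms and m["cost"] is not None]
    return min(fits, key=lambda m: m["cost"])["name"] if fits else None

print(pick_model(100))  # gemini-2.0-flash
```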

Smart Router System Prompt

The Smart Router uses a compact, structured system prompt that forces the model to return a JSON classification — not prose. This keeps token usage minimal and latency low.

Smart Router System Prompt (JSON Schema mode)
~120 tokens
You are a query classification engine for an AI energy-efficiency cache.
Given a user question, return ONLY valid JSON matching this schema:

{
  "domain": "<one of: science_technology | health_medicine | finance_economics | history_culture | programming_dev | environment_climate | general_knowledge>",
  "category": "<most specific matching category name>",
  "confidence": <float 0.0–1.0>,
  "subIntent": "<brief phrase describing what the user wants to know>",
  "ambiguous": <true if query could belong to multiple domains>
}

Rules:
- domain and category MUST match the provided taxonomy exactly
- confidence reflects how certain you are of the classification
- If confidence < 0.6, set ambiguous: true
- Return ONLY the JSON object, no explanation
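A caller can defensively validate the model's reply against this schema before trusting it. The sketch below copies the domain list and rules from the prompt; the function name and surrounding harness are hypothetical.

```python
import json

# Domain taxonomy copied from the system prompt above.
DOMAINS = {
    "science_technology", "health_medicine", "finance_economics",
    "history_culture", "programming_dev", "environment_climate",
    "general_knowledge",
}

def parse_classification(raw: str) -> dict:
    c = json.loads(raw)  # the model must return ONLY the JSON object
    assert c["domain"] in DOMAINS, "domain outside taxonomy"
    assert 0.0 <= c["confidence"] <= 1.0, "confidence out of range"
    # Rule from the prompt: low confidence implies ambiguous.
    if c["confidence"] < 0.6:
        assert c["ambiguous"] is True, "low confidence must set ambiguous"
    return c

reply = ('{"domain": "programming_dev", "category": "python", '
         '"confidence": 0.91, "subIntent": "list sorting", "ambiguous": false}')
print(parse_classification(reply)["domain"])  # programming_dev
```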

Scaling Roadmap

Cosine Similarity · 0–500 categories · Active now

In-memory TF-IDF vectors. No external dependencies. Handles all of Tapas's current 160 categories with sub-5ms classification.
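A dependency-free sketch of that in-memory classification might look like this; real TF-IDF weighting (and the actual 160-category index) is omitted in favor of raw term counts over a tiny hypothetical index.

```python
import math
from collections import Counter

# Hypothetical miniature category index: phrase -> domain.
INDEX = {
    "python sorting": "programming_dev",
    "blood pressure medication": "health_medicine",
    "solar panel efficiency": "environment_climate",
}

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cos(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query: str):
    """Return the best-matching domain and its cosine score."""
    q = _vec(query)
    best = max(INDEX, key=lambda k: _cos(q, _vec(k)))
    return INDEX[best], _cos(q, _vec(best))

print(classify("how efficient are solar panels"))
```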

Hybrid Router · 500–5,000 categories · v2.0

Cosine handles high-confidence queries. Smart Router (Claude Haiku or Gemini Flash) handles ambiguous ones. ~70% of traffic never touches the LLM classifier.

Fine-tuned Classifier · 5,000–50,000 categories · v3.0

A domain-specific fine-tuned model (Llama 3.2 1B or Mistral 7B) trained on the full Tapas category taxonomy. Sub-100ms, zero API cost, runs on a single NVIDIA T4.

Hierarchical Router · 50,000+ categories · Vision

Two-stage routing: first classify into a top-level domain (7 options, near-instant), then run a domain-specific sub-classifier. Scales to millions of categories with constant latency.
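The two-stage idea can be sketched as follows; both classifiers here are trivial keyword stubs standing in for the near-instant 7-way domain pick and the per-domain sub-classifiers.

```python
# Hypothetical stage-2 sub-classifiers, one per top-level domain.
SUB_CLASSIFIERS = {
    "programming_dev": lambda q: "python" if "python" in q else "general_dev",
    "health_medicine": lambda q: "cardiology" if "heart" in q else "general_health",
}

def classify_domain(query: str) -> str:
    # Stand-in for the near-instant 7-way top-level classifier.
    return "programming_dev" if "python" in query else "health_medicine"

def hierarchical_route(query: str):
    domain = classify_domain(query)            # stage 1: top-level domain
    category = SUB_CLASSIFIERS[domain](query)  # stage 2: domain-specific
    return domain, category

print(hierarchical_route("sort a list in python"))  # ('programming_dev', 'python')
```

Because stage 1 always chooses among a constant 7 domains and stage 2 only searches within one domain, total latency stays flat as the category count grows.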