As your category index grows from hundreds into thousands of topics, cosine similarity alone is no longer enough. The Smart Router is a lightweight LLM classification layer that reads the user's intent and routes queries with surgical precision, using roughly 1/300th of the energy of full inference.
Each community Q&A submission expands the category index. At 200 categories the Hybrid Router activates; at 500 the Smart Router takes over.
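The tier activation described above can be sketched as a simple selection function. Only the 200/500 thresholds come from the text; the function name and return labels are illustrative:

```python
def choose_router(num_categories: int) -> str:
    """Pick a routing tier as the category index grows."""
    if num_categories < 200:
        return "cosine_only"   # pure similarity search
    if num_categories < 500:
        return "hybrid"        # cosine first, LLM fallback for ambiguity
    return "smart_router"      # LLM classification layer takes over
```

At Tapas's current 160 categories, this stays on pure cosine similarity.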
Cosine similarity runs first. If it returns a high-confidence match (> 85%), the Smart Router is skipped entirely. Only ambiguous or low-confidence queries are escalated to the LLM classifier. Best of both worlds.
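The escalation logic above can be sketched as follows. `cosine_classify` and `llm_classify` are assumed callables standing in for the two stages; only the 85% threshold comes from the text:

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85  # skip the LLM above this score

def route(query: str,
          cosine_classify: Callable[[str], tuple],
          llm_classify: Callable[[str], str]) -> tuple:
    """Return (category, method). Cosine runs first; only ambiguous
    or low-confidence queries escalate to the LLM classifier."""
    category, confidence = cosine_classify(query)
    if confidence > CONFIDENCE_THRESHOLD:
        return category, "cosine"
    return llm_classify(query), "smart_router"
```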
Enter any question to see how Tapas classifies it — domain, category, confidence score, routing method used, and energy consumed.
When the Smart Router activates, choose the classification model that fits your scale, latency budget, and infrastructure.
The Smart Router uses a compact, structured system prompt that forces the model to return a JSON classification — not prose. This keeps token usage minimal and latency low.
You are a query classification engine for an AI energy-efficiency cache.
Given a user question, return ONLY valid JSON matching this schema:
{
"domain": "<one of: science_technology | health_medicine | finance_economics | history_culture | programming_dev | environment_climate | general_knowledge>",
"category": "<most specific matching category name>",
"confidence": <float 0.0–1.0>,
"subIntent": "<brief phrase describing what the user wants to know>",
"ambiguous": <true if query could belong to multiple domains>
}
Rules:
- domain and category MUST match the provided taxonomy exactly
- confidence reflects how certain you are of the classification
- If confidence < 0.6, set ambiguous: true
- Return ONLY the JSON object, no explanation

In-memory TF-IDF vectors. No external dependencies. Handles all of Tapas's current 160 categories with sub-5ms classification.
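A minimal pure-stdlib sketch of the in-memory TF-IDF approach, assuming each category is described by a short keyword string. The class name, tokenizer, and weighting details are illustrative, not Tapas's actual implementation:

```python
import math
from collections import Counter

class TfidfCosineClassifier:
    """In-memory TF-IDF vectors plus cosine similarity, no external
    dependencies: whitespace tokenization and log-IDF weighting."""

    def __init__(self, categories: dict):
        docs = {name: Counter(text.lower().split())
                for name, text in categories.items()}
        n = len(docs)
        df = Counter()
        for counts in docs.values():
            df.update(set(counts))
        self.idf = {t: math.log(n / df[t]) + 1.0 for t in df}
        self.vectors = {name: self._vector(c) for name, c in docs.items()}

    def _vector(self, counts: Counter) -> dict:
        vec = {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def classify(self, query: str) -> tuple:
        """Return (best_category, cosine_score) for the query."""
        q = self._vector(Counter(query.lower().split()))
        best, score = "", 0.0
        for name, vec in self.vectors.items():
            s = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if s > score:
                best, score = name, s
        return best, score
```

Because vectors are precomputed and held in memory, classification is a single pass over the category set, which is how sub-5ms latency at 160 categories is plausible.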
Cosine handles high-confidence queries. Smart Router (Claude Haiku or Gemini Flash) handles ambiguous ones. ~70% of traffic never touches the LLM classifier.
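The LLM-fallback path can be sketched as below, assuming `call_llm` is a callable that sends the system prompt plus the query to the chosen model (Claude Haiku or Gemini Flash) and returns raw text. The malformed-output fallback is an assumption, not documented behavior:

```python
import json

AMBIGUITY_THRESHOLD = 0.6  # from the prompt's rules

def smart_classify(query: str, call_llm) -> dict:
    """Parse the JSON classification the model is instructed to
    return; fall back to an ambiguous general_knowledge result
    if the output is not valid JSON."""
    raw = call_llm(query)
    try:
        result = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {"domain": "general_knowledge", "category": "unknown",
                "confidence": 0.0, "subIntent": "", "ambiguous": True}
    if result.get("confidence", 0.0) < AMBIGUITY_THRESHOLD:
        result["ambiguous"] = True
    return result
```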
A domain-specific fine-tuned model (Llama 3.2 1B or Mistral 7B) trained on the full Tapas category taxonomy. Sub-100ms, zero API cost, runs on a single NVIDIA T4.
Two-stage routing: first classify into a top-level domain (7 options, near-instant), then run a domain-specific sub-classifier. Scales to millions of categories with constant latency.
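The two-stage scheme can be sketched with injected classifiers; `domain_classifier` and `sub_classifiers` are hypothetical stand-ins for the 7-way top-level model and the per-domain sub-classifiers:

```python
def hierarchical_classify(query, domain_classifier, sub_classifiers):
    """Two-stage routing: pick one of the 7 top-level domains, then
    run only that domain's sub-classifier. Each stage sees a small
    candidate set, so per-query cost stays flat as categories grow."""
    domain = domain_classifier(query)          # 7-way, near-instant
    category = sub_classifiers[domain](query)  # domain-local search
    return domain, category
```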