LLM Router
Route by capability — reasoning-fast, reasoning-deep, vision, code — not by model name. Switch providers without touching app code. Per-tenant overrides. Vendor-outage resilient.
The problem
Every model name in your code is technical debt.
Most agentic codebases scatter model names everywhere:
"gpt-4o" here, "claude-sonnet-4-5" there,
"gemini-1.5-pro" in the experimentation branch. Each reference
becomes a code change when the model is deprecated, when a better one ships,
or when an enterprise customer demands a different vendor.
The deeper problem: model selection is the wrong thing for the application to be aware of. The app cares about what kind of reasoning it needs: fast summary, deep analysis, vision, code completion. The vendor and model are an infrastructure concern.
How it works
Capability-token routing with per-tenant overrides and outage graduation.
Cerebe's LLM Router exposes capability tokens instead of model
names. The application asks for reasoning-fast or reasoning-deep
or vision; the router selects the best available model per capability,
factoring in latency targets, cost ceilings, and per-tenant overrides.
When a new model ships and improves a capability, the router upgrades automatically, with no application code changes. When a vendor degrades or goes down entirely, the graduation policy falls back to the next-best model for that capability. Per-tenant overrides handle the regulated-customer and multi-cloud cases. Every routing decision is logged for audit.
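The selection and fallback behaviour described above can be sketched in a few lines. This is an illustrative model only, not Cerebe's implementation: `RouterSketch`, `CANDIDATES`, `set_override`, `mark_down`, and the ranking order are hypothetical names and values chosen for the example. It also assumes a pinned model is returned even while its vendor is marked down; whether a pin should fail or fall back during an outage is a policy choice the text above does not specify.

```python
from dataclasses import dataclass, field

# Hypothetical ranked candidates per capability, best first.
# The ordering is illustrative, not Cerebe's actual ranking.
CANDIDATES = {
    "reasoning-fast": [("anthropic", "claude-haiku-4.5"), ("openai", "gpt-4o")],
    "reasoning-deep": [("anthropic", "claude-opus-4-7"), ("openai", "gpt-5")],
}


@dataclass
class RouterSketch:
    # (tenant_id, capability) -> (vendor, model)
    overrides: dict = field(default_factory=dict)
    # Vendors currently considered unavailable.
    down_vendors: set = field(default_factory=set)

    def set_override(self, tenant_id, capability, vendor, model):
        """Pin one tenant's capability to a specific model."""
        self.overrides[(tenant_id, capability)] = (vendor, model)

    def mark_down(self, vendor):
        """Record a vendor outage so its models are skipped."""
        self.down_vendors.add(vendor)

    def select(self, capability, tenant_id=None):
        """Return the model name to use for this capability."""
        # 1. A tenant pin wins over the default ranking (assumption: strictly).
        pinned = self.overrides.get((tenant_id, capability))
        if pinned is not None:
            return pinned[1]
        # 2. Otherwise walk the ranking and take the first healthy vendor.
        for vendor, model in CANDIDATES[capability]:
            if vendor not in self.down_vendors:
                return model
        raise RuntimeError(f"no healthy model for capability {capability!r}")


router = RouterSketch()
print(router.select("reasoning-deep"))   # claude-opus-4-7
router.mark_down("anthropic")            # simulate a vendor outage
print(router.select("reasoning-deep"))   # gpt-5
```

The application only ever passes a capability string, so swapping the ranking table or adding a vendor changes nothing on the calling side.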
- Capability tokens: reasoning-fast, reasoning-deep, vision, code, long-context, structured-output
- Router picks the best available model per capability — updated as new models ship
- Per-tenant overrides: regulated customers can pin models; multi-cloud customers can split by region
- Vendor outage handling: capability graduates to next-best when primary degrades
- Routing decisions logged with model, latency, tokens, cost — full audit trail
- OpenAI, Anthropic, Google, Mistral, open-weights — same SDK call across all
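As a rough illustration of the audit trail mentioned in the list above, one routing decision could be captured as a flat record and written as a JSON line. The `RoutingDecision` class, its field names, and `to_audit_line` are assumptions made for this sketch; the actual log schema is not documented here. The sample values are taken from the example response shown on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical audit record for a single routed request."""
    tenant_id: str
    capability: str
    model: str
    latency_ms: int
    tokens_in: int
    tokens_out: int
    cost_usd: float


def to_audit_line(decision: RoutingDecision) -> str:
    """Serialize one decision as a single JSON line with sorted keys."""
    return json.dumps(asdict(decision), sort_keys=True)


line = to_audit_line(
    RoutingDecision(
        tenant_id="acme",
        capability="reasoning-fast",
        model="claude-haiku-4.5",
        latency_ms=320,
        tokens_in=1200,
        tokens_out=84,
        cost_usd=0.0034,
    )
)
print(line)
```

One record per line keeps the log append-only and easy to filter by tenant, capability, or model.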
# Route by capability, not model name
from cerebe import Cerebe

cb = Cerebe(api_key=...)

# capability:reasoning-fast: router picks the best fast-reasoning model
res = cb.llm.complete(
    prompt="Summarize this 3-page note in 2 bullets.",
    route="capability:reasoning-fast",
)
# {
#   "model": "claude-haiku-4.5",  # routed
#   "latency_ms": 320,
#   "tokens": { "in": 1200, "out": 84 },
#   "cost_usd": 0.0034,
# }

# capability:reasoning-deep: heavier work
res = cb.llm.complete(
    prompt="Critique this 12-page treatment plan for safety + efficacy.",
    route="capability:reasoning-deep",
)
# {
#   "model": "claude-opus-4-7",  # routed to deepest available
#   "latency_ms": 4100,
# }

# Per-tenant overrides: Acme Corp insists on GPT-5
cb.llm.set_routing_override(
    tenant_id="acme",
    capability="reasoning-deep",
    model="gpt-5",
)

# Vendor outage handling: graduation policy lives in router config
# Anthropic down → router falls back to next-best per capability spec
# No app code changes; routing decisions logged for audit
Get Started
Switch models without touching app code.
Capability-based routing. Per-tenant overrides. Vendor-outage graduation. The wedge into vendor-neutral agentic infrastructure. Self-signup at cerebe.ai.