Service 03

LLM Router

Route by capability — reasoning-fast, reasoning-deep, vision, code — not by model name. Switch providers without touching app code. Per-tenant overrides. Vendor-outage resilient.


The problem

Every model name in your code is technical debt.

Most agentic codebases scatter model names throughout: "gpt-4o" here, "claude-sonnet-4-5" there, "gemini-1.5-pro" in an experimental branch. When a model is deprecated, when a better one ships, or when an enterprise customer demands a different vendor, every one of those references becomes a code change.

The deeper problem: the application should not need to know which model it is calling. The app cares about the kind of reasoning it needs: fast summary, deep analysis, vision, code completion. The vendor and model are an infrastructure concern.


How it works

Capability-token routing with per-tenant overrides and outage graduation.

Cerebe's LLM Router exposes capability tokens instead of model names. The application asks for reasoning-fast, reasoning-deep, or vision; the router selects the best available model for that capability, factoring in latency targets, cost ceilings, and per-tenant overrides.
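
The selection step described above can be pictured with a small local sketch. This is not Cerebe's implementation: the catalog entries, the quality scores, and the names Candidate and select_model are all hypothetical, chosen only to show how a latency target, a cost ceiling, and a per-tenant override might combine.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    model: str                 # illustrative id, not a real catalog entry
    capability: str
    quality: float             # hypothetical score, higher is better
    p50_latency_ms: int
    cost_per_1k_tokens: float


# Hypothetical catalog; a hosted router would maintain its own.
CATALOG = [
    Candidate("fast-a", "reasoning-fast", 0.72, 300, 0.001),
    Candidate("fast-b", "reasoning-fast", 0.80, 900, 0.004),
    Candidate("deep-a", "reasoning-deep", 0.95, 4000, 0.030),
]


def select_model(
    capability: str,
    *,
    latency_target_ms: int,
    cost_ceiling_per_1k: float,
    tenant_id: Optional[str] = None,
    overrides: Optional[Dict[Tuple[str, str], str]] = None,
) -> str:
    """Return a model id for a capability under the given constraints."""
    # A per-tenant pin wins over any scoring.
    pinned = (overrides or {}).get((tenant_id, capability))
    if pinned is not None:
        return pinned

    # Keep only candidates that meet both the latency and cost limits.
    eligible = [
        c for c in CATALOG
        if c.capability == capability
        and c.p50_latency_ms <= latency_target_ms
        and c.cost_per_1k_tokens <= cost_ceiling_per_1k
    ]
    if not eligible:
        raise LookupError(f"no model satisfies constraints for {capability!r}")

    # Among eligible candidates, prefer the highest quality score.
    return max(eligible, key=lambda c: c.quality).model
```

With a 500 ms latency target only fast-a qualifies for reasoning-fast; relaxing the target to 1000 ms lets the higher-scoring fast-b win. An override keyed on ("acme", "reasoning-deep") bypasses scoring entirely.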

When a new model ships and improves a capability, the router upgrades automatically, with no application code changes. When a vendor degrades or goes down entirely, the graduation policy falls back to the next-best model for that capability. Per-tenant overrides handle regulated-customer and multi-cloud cases. Routing decisions are logged for a full audit trail.
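
One way to picture outage graduation is an ordered preference list per capability, walked until a healthy vendor is found. The sketch below is a simplification under that assumption; the actual graduation policy is hosted and not described here, and the vendor/model names, GRADUATION table, and graduate function are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical preference order per capability, best first.
# Ids use a "vendor/model" form purely for this illustration.
GRADUATION: Dict[str, List[str]] = {
    "reasoning-deep": ["vendor-a/deep", "vendor-b/deep", "open-weights/deep"],
    "reasoning-fast": ["vendor-b/fast", "vendor-a/fast"],
}


def vendor_of(model: str) -> str:
    """Extract the vendor prefix from a 'vendor/model' id."""
    return model.split("/", 1)[0]


def graduate(capability: str, degraded_vendors: Set[str]) -> str:
    """Return the first preferred model whose vendor is not degraded."""
    for model in GRADUATION[capability]:
        if vendor_of(model) not in degraded_vendors:
            return model
    raise RuntimeError(f"all vendors degraded for {capability!r}")
```

When no vendor is degraded the primary is returned; marking vendor-a as degraded moves reasoning-deep to vendor-b/deep, and the calling code never changes.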

  • Capability tokens: reasoning-fast, reasoning-deep, vision, code, long-context, structured-output
  • Router picks the best available model per capability — updated as new models ship
  • Per-tenant overrides: regulated customers can pin models; multi-cloud customers can split by region
  • Vendor outage handling: capability graduates to next-best when primary degrades
  • Routing decisions logged with model, latency, tokens, cost — full audit trail
  • OpenAI, Anthropic, Google, Mistral, open-weights — same SDK call across all

cerebe.llm: capability routing (Python)
# Route by capability — not model name
from cerebe import Cerebe

cb = Cerebe(api_key=...)

# capability:reasoning-fast — router picks best fast-reasoning model
res = cb.llm.complete(
    prompt="Summarize this 3-page note in 2 bullets.",
    route="capability:reasoning-fast",
)
# {
#   "model": "claude-haiku-4-5",      # routed
#   "latency_ms": 320,
#   "tokens": { "in": 1200, "out": 84 },
#   "cost_usd": 0.0034,
# }

# capability:reasoning-deep — heavier work
res = cb.llm.complete(
    prompt="Critique this 12-page treatment plan for safety + efficacy.",
    route="capability:reasoning-deep",
)
# {
#   "model": "claude-opus-4-7",       # routed to deepest available
#   "latency_ms": 4100,
# }

# Per-tenant overrides — Acme Corp insists on GPT-5
cb.llm.set_routing_override(
    tenant_id="acme",
    capability="reasoning-deep",
    model="gpt-5",
)

# Vendor outage handling — graduation policy in router config
# Anthropic down → router falls back to next-best per capability spec
# No app code changes; routing decisions logged for audit
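
The audit trail mentioned above could be assembled from the fields shown in the sample responses (model, latency_ms, tokens, cost_usd). The sketch below assumes the response is available as a plain dict with those keys, which the page does not confirm; audit_record is a hypothetical helper and not part of the Cerebe SDK.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def audit_record(route: str, response: Dict[str, Any], tenant_id: str) -> Dict[str, Any]:
    """Build one audit entry for a routed call (hypothetical record shape)."""
    # Routes in the examples look like "capability:<token>".
    prefix, _, capability = route.partition(":")
    if prefix != "capability" or not capability:
        raise ValueError(f"unexpected route format: {route!r}")

    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "capability": capability,
        "model": response["model"],
        "latency_ms": response["latency_ms"],
        "tokens": response["tokens"],
        "cost_usd": response["cost_usd"],
    }


# Example using the values from the reasoning-fast sample response.
sample = {
    "model": "claude-haiku-4-5",
    "latency_ms": 320,
    "tokens": {"in": 1200, "out": 84},
    "cost_usd": 0.0034,
}
line = json.dumps(audit_record("capability:reasoning-fast", sample, "acme"))
```

Serializing each record as one JSON line keeps the log easy to append to and to query later.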

Pricing relevance

LLM Router is part of the core API and is included in every plan. LLM pass-through cost flows through cerebe.ai (1.3× markup on Team/Business; BYOK pass-through on Enterprise). Full pricing at cerebe.ai.
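
As a worked example of the markup arithmetic, the sketch below applies the 1.3× multiplier to a provider cost. The lowercase plan identifiers, the rounding to six decimals, and the reading of BYOK pass-through as "no markup applied" are assumptions made for illustration; billed_cost is a hypothetical helper.

```python
MARKUP = 1.3  # multiplier stated for Team and Business plans


def billed_cost(provider_cost_usd: float, plan: str) -> float:
    """Estimate the billed LLM cost for a plan (illustrative only)."""
    if plan in ("team", "business"):
        # Round to avoid floating-point noise in the displayed amount.
        return round(provider_cost_usd * MARKUP, 6)
    if plan == "enterprise":
        # Assumed: BYOK pass-through means the provider cost is unchanged.
        return provider_cost_usd
    raise ValueError(f"unknown plan: {plan!r}")
```

Under these assumptions, $10.00 of provider usage on a Team plan is billed as $13.00, while the same usage on Enterprise stays at $10.00.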

Open-source posture

The capability token spec and router config schema are open source. Vendor-portfolio scoring, the outage-graduation policy, and per-model calibration are the hosted IP.

Get Started

Switch models without touching app code.

Capability-based routing. Per-tenant overrides. Vendor-outage graduation. The wedge into vendor-neutral agentic infrastructure. Self-signup at cerebe.ai.