Service 04

Prompt Versioning

YAML-managed prompts. Semver. A/B canary evaluation. Auto-rollback on quality regression. The prompt-as-code workflow that catches drift before it ships.


The problem

Prompt changes are silent. No version control, no eval, no rollback.

Most teams treat prompts as uncommitted source code: a senior engineer edits a Python string, ships the change, hopes for the best. There's no diff to review, no evaluation against historical data, no rollback when quality regresses. The team learns about the regression from user complaints two weeks later.

For agentic apps where prompt quality is the product, this is a fundamental governance gap. Prompts deserve the same version control and quality gating that application code gets.


How it works

YAML prompts + LangSmith eval + canary variants + auto-rollback.

Prompts live as YAML files versioned with semver. Each file declares its inputs, its system and user templates, and its eval suite. Each new version is canary-deployed with a weighted traffic split (e.g., 20% to the new variant, 80% to the prior baseline).
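One way to picture the weighted split is a deterministic hash bucket per user, so the same user always sees the same variant. This is a minimal sketch under stated assumptions, not the hosted canary logic: `pick_variant`, the `VARIANTS` table, and the choice of SHA-256 bucketing are all hypothetical names and choices made for illustration.

```python
import hashlib

# Hypothetical variant table mirroring the 20/80 split described above.
VARIANTS = [("2.1.0", 0.20), ("2.0.0", 0.80)]


def pick_variant(user_id: str, variants=VARIANTS) -> str:
    """Map a user to a variant deterministically, in proportion to the weights."""
    total = sum(weight for _, weight in variants)
    # Hash the user id into a stable point in [0, 1].
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, weight in variants:
        cumulative += weight / total
        if point < cumulative:
            return version
    return variants[-1][0]  # guard against floating-point rounding
```

Hashing rather than drawing a random number per request keeps a user's experience stable across sessions, which also makes per-variant eval results easier to attribute.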

LangSmith runs the eval suite on every variant against a held-out dataset. Metrics (tone, factuality, brevity, custom rubrics) are computed continuously. If any metric regresses by 5% or more against the prior baseline, auto-rollback fires immediately and the prior version takes 100% of traffic. App code never contains prompt text; it calls cerebe.prompts.render(name=..., inputs=...), referencing the prompt by name + version rather than by its text.
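The rollback rule can be sketched as a comparison of candidate scores against baseline scores. The actual decision policy is hosted-only, so everything here is an assumption: the "5%+" is read as a relative change, each metric is given a direction (for brevity, fewer tokens is treated as better), baseline scores are assumed to be nonzero, and the function names are hypothetical.

```python
REGRESSION_THRESHOLD = 0.05  # the "5%+" from the text, read here as a relative change

# Hypothetical direction table: True means higher scores are better.
HIGHER_IS_BETTER = {"tone-empathy": True, "factuality": True, "brevity": False}


def regressed_metrics(baseline: dict, candidate: dict,
                      threshold: float = REGRESSION_THRESHOLD) -> list:
    """Return the names of metrics where the candidate is worse by >= threshold."""
    failures = []
    for name, base in baseline.items():
        new = candidate[name]
        # Relative change oriented so that a positive value means "worse".
        # Assumes base is nonzero.
        if HIGHER_IS_BETTER[name]:
            change = (base - new) / base
        else:
            change = (new - base) / base
        if change >= threshold:
            failures.append(name)
    return failures


def should_rollback(baseline: dict, candidate: dict) -> bool:
    """Roll back if any single metric has regressed past the threshold."""
    return bool(regressed_metrics(baseline, candidate))
```

With a baseline tone-empathy of 0.90, a candidate score of 0.80 is a relative drop of about 11% and would trigger a rollback, while 0.89 would not.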

  • Prompts in YAML files with semver — versionable, diff-able, code-reviewable
  • Each prompt declares its eval suite + variants with weighted traffic split
  • LangSmith integration runs evals on every variant against a held-out dataset
  • Auto-rollback fires when a metric regresses 5%+ against the prior baseline
  • Canary deploys: 20% to the new variant, 80% to baseline, until eval confidence is high
  • App code references prompt by name + version — no prompt text in the codebase
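To illustrate the "diff-able" point in the list above, here is a small sketch that produces a reviewable diff between two versions of a system template using Python's standard `difflib`. The `old_system` text is invented for illustration (the original shows only version 2.1.0), and `prompt_diff` is a hypothetical helper, not part of the SDK.

```python
import difflib

# Invented prior text, for illustration only; the source does not show 2.0.0.
old_system = (
    "You are a supportive movement coach. Tone: warm, brief.\n"
    "Do not invent facts.\n"
)
# System template as it appears in version 2.1.0.
new_system = (
    "You are a supportive movement coach. Tone: warm, non-judgmental, brief.\n"
    "Reference the user's stated preferences from memory; do not invent facts.\n"
)


def prompt_diff(old: str, new: str, old_label: str, new_label: str) -> str:
    """Return a unified diff of two template strings, or '' if they are identical."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)


print(prompt_diff(old_system, new_system, "system@2.0.0", "system@2.1.0"))
```

In practice the same diff comes for free from the version control system once prompts are committed as YAML files; the sketch only shows what a reviewer would see.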
prompts/coaching/encouragement.yaml
# prompts/coaching/encouragement.yaml — versioned, semver, eval-gated
name: coaching/encouragement
version: 2.1.0
description: Empathetic encouragement after a missed-session signal.
inputs:
  - name: user_name
    type: string
  - name: missed_days
    type: integer

system: |
  You are a supportive movement coach. Tone: warm, non-judgmental, brief.
  Reference the user's stated preferences from memory; do not invent facts.

user: |
  {{ user_name }} missed their last {{ missed_days }} scheduled sessions.
  Encourage re-engagement.

evaluation:
  suite: coaching-tone
  variants:
    - id: v2.1.0
      weight: 0.20      # canary: the new variant under evaluation
    - id: v2.0.0
      weight: 0.80      # prior baseline; rollback target
  metrics:
    - tone-empathy >= 0.85
    - factuality   >= 0.95
    - brevity      <= 80_tokens
  auto_rollback: true   # if any metric regresses 5%+, revert to prior version

# Usage in app code (no model name, no prompt text):
from cerebe import Cerebe

cb = Cerebe(api_key=...)
# Render by prompt name; the template text stays in the YAML file.
res = cb.prompts.render(
    name="coaching/encouragement",
    inputs={"user_name": "Maria", "missed_days": 3},
)
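The `metrics` entries in the YAML above read as small threshold expressions. As a rough sketch of how such strings could be checked against eval scores, the following parses each entry and reports which gates fail. The grammar (a name, `>=` or `<=`, a number, and an optional unit suffix such as `_tokens`) is inferred from the three examples shown, and `parse_gate` and `failed_gates` are hypothetical helpers rather than SDK functions.

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le}
# Groups: metric name, comparator, numeric threshold, optional unit suffix.
_GATE = re.compile(r"^\s*([\w-]+)\s*(>=|<=)\s*(\d+(?:\.\d+)?)(?:_(\w+))?\s*$")


def parse_gate(expr: str):
    """Split a gate such as 'brevity <= 80_tokens' into its parts."""
    match = _GATE.match(expr)
    if match is None:
        raise ValueError(f"unrecognized metric gate: {expr!r}")
    name, op, value, unit = match.groups()
    return name, op, float(value), unit


def failed_gates(gates, scores):
    """Return the names of metrics whose score does not satisfy its gate."""
    failures = []
    for expr in gates:
        name, op, threshold, _unit = parse_gate(expr)
        if not _OPS[op](scores[name], threshold):
            failures.append(name)
    return failures


GATES = ["tone-empathy >= 0.85", "factuality   >= 0.95", "brevity      <= 80_tokens"]
```

These absolute gates are a separate check from the relative 5% regression rule: a variant could clear every gate and still regress against the baseline, or the reverse.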

Pricing relevance

Prompt Versioning is part of the core API and is included in every plan. The LangSmith integration uses the customer's own LangSmith account. Hosted dataset and eval runs are billed by usage at cerebe.ai.

Open-source posture

The YAML prompt schema and the SDK are open source. The eval orchestration, canary deploy logic, auto-rollback decision policy, and quality-regression detection algorithms are available only in the hosted service.

Get Started

Stop shipping prompts as silent changes.

YAML-managed. Semver. A/B canary. Auto-rollback on regression. The workflow that turns prompt changes into reviewable, measured artifacts. Self-signup at cerebe.ai.