Prompt Versioning
YAML-managed prompts. Semver. A/B canary evaluation. Auto-rollback on quality regression. The prompt-as-code workflow that catches drift before it ships.
The problem
Prompt changes are silent. No version control, no eval, no rollback.
Most teams treat prompts as uncommitted source code: a senior engineer edits a Python string, ships the change, hopes for the best. There's no diff to review, no evaluation against historical data, no rollback when quality regresses. The team learns about the regression from user complaints two weeks later.
For agentic apps where prompt quality is the product, this is a fundamental governance gap. Prompts deserve the same version control and quality gating that application code gets.
How it works
YAML prompts + LangSmith eval + canary variants + auto-rollback.
Prompts live as YAML files with semver. Each file declares its inputs, its system and user templates, and its eval suite. Each version is canary-deployed with a weighted traffic split (e.g., 20% to the new variant, 80% to the prior baseline). LangSmith runs the eval suite on every variant against a held-out dataset, and metrics (tone, factuality, brevity, custom rubrics) are computed continuously. If any metric regresses by 5% or more against the prior baseline, auto-rollback fires immediately and the prior version takes 100% of traffic. The app code never contains prompt text: it calls cerebe.prompts.render(name=..., inputs=...) and references the prompt by name + version.
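To make the weighted traffic split concrete, here is a minimal sketch of how a 20/80 canary assignment could work. It is an illustration, not Cerebe's actual routing: the `VARIANTS` table and `pick_variant` function are hypothetical names, and sticky per-user assignment via hashing is an assumption (the description above only says traffic is split by weight).

```python
import hashlib

# Hypothetical variant table: (version, traffic weight). Weights should sum to 1.0.
VARIANTS = [("2.0.0", 0.80), ("2.1.0", 0.20)]


def pick_variant(user_id: str, variants=VARIANTS) -> str:
    """Deterministically map a user to a variant according to the weights."""
    # Hash the user id to a stable number in [0, 1), so the same user
    # always lands on the same variant (an assumption, see lead-in).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64

    # Walk the cumulative weights until the bucket falls inside one.
    cumulative = 0.0
    for version, weight in variants:
        cumulative += weight
        if bucket < cumulative:
            return version
    # Guard against floating-point rounding in the weights.
    return variants[-1][0]


if __name__ == "__main__":
    print(pick_variant("maria"))
```

Hashing rather than drawing a random number per request keeps each user on one variant, which makes per-variant eval results easier to attribute. A per-request random draw would satisfy the same weights.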
- Prompts in YAML files with semver — versionable, diff-able, code-reviewable
- Each prompt declares its eval suite + variants with weighted traffic split
- LangSmith integration runs evals on every variant against a held-out dataset
- Auto-rollback fires when a metric regresses 5%+ against the prior baseline
- Canary deploys: 20% to the new variant, 80% to baseline, until eval confidence is high
- App code references prompt by name + version — no prompt text in the codebase
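The rollback rule can be sketched in a few lines. This is a hedged illustration rather than the product's implementation: `regressed_metrics`, `next_weights`, and the sample scores are hypothetical, and the 5% threshold is assumed to be a relative change against the baseline (the text does not say whether it is relative or absolute). Metrics such as brevity, where lower is better, are flipped so that a positive change always means "worse".

```python
def regressed_metrics(baseline, candidate, higher_is_better, threshold=0.05):
    """Return the metrics where the candidate is worse than baseline by >= threshold."""
    failed = []
    for name, base in baseline.items():
        if base == 0:
            continue  # relative change is undefined; skipped in this sketch
        cand = candidate[name]
        # Signed relative change: positive always means the candidate got worse.
        if higher_is_better[name]:
            change = (base - cand) / base
        else:
            change = (cand - base) / base
        if change >= threshold:
            failed.append(name)
    return failed


def next_weights(baseline_version, candidate_version, failed, canary_weight=0.20):
    """Send 100% of traffic to the baseline if anything regressed, else keep the canary split."""
    if failed:
        return {baseline_version: 1.0, candidate_version: 0.0}
    return {baseline_version: 1.0 - canary_weight, candidate_version: canary_weight}


# Made-up scores, for illustration only.
HIGHER_IS_BETTER = {"tone-empathy": True, "factuality": True, "brevity": False}
baseline = {"tone-empathy": 0.90, "factuality": 0.97, "brevity": 70.0}
candidate = {"tone-empathy": 0.84, "factuality": 0.97, "brevity": 72.0}

if __name__ == "__main__":
    failed = regressed_metrics(baseline, candidate, HIGHER_IS_BETTER)
    print(failed)  # ['tone-empathy']
    print(next_weights("2.0.0", "2.1.0", failed))
```

In this example tone-empathy drops by about 6.7% relative to the baseline, so the candidate is pulled back even though factuality is unchanged and brevity moves by less than 3%.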
# prompts/coaching/encouragement.yaml (versioned, semver, eval-gated)
name: coaching/encouragement
version: 2.1.0
description: Empathetic encouragement after a missed-session signal.
inputs:
  - name: user_name
    type: string
  - name: missed_days
    type: integer
system: |
  You are a supportive movement coach. Tone: warm, non-judgmental, brief.
  Reference the user's stated preferences from memory; do not invent facts.
user: |
  {{ user_name }} missed their last {{ missed_days }} scheduled sessions.
  Encourage re-engagement.
evaluation:
  suite: coaching-tone
  variants:
    - id: v2.1.0
      weight: 0.20  # canary (new variant)
    - id: v2.0.0
      weight: 0.80  # prior baseline; rollback target
  metrics:
    - tone-empathy >= 0.85
    - factuality >= 0.95
    - brevity <= 80_tokens
  auto_rollback: true  # if any metric regresses 5%+, revert to prior version
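Because prompts are plain data, a file like this can be checked before it ever reaches review. The sketch below shows the kind of pre-merge validation that fits the workflow. It is an assumption-laden illustration, not part of the Cerebe SDK: `validate_prompt` is a hypothetical helper, the dict stands in for what a YAML loader would return, and the semver pattern is simplified to plain MAJOR.MINOR.PATCH without pre-release tags.

```python
import re

# Simplified semver: MAJOR.MINOR.PATCH only (no pre-release or build metadata).
SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")
# Matches template placeholders such as {{ user_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def validate_prompt(spec: dict) -> list:
    """Return a list of human-readable problems; an empty list means the spec looks sane."""
    errors = []

    if not SEMVER.fullmatch(str(spec.get("version", ""))):
        errors.append("version is not MAJOR.MINOR.PATCH")

    # Every placeholder used in a template must be declared under inputs.
    declared = {item["name"] for item in spec.get("inputs", [])}
    templates = spec.get("system", "") + "\n" + spec.get("user", "")
    used = set(PLACEHOLDER.findall(templates))
    for name in sorted(used - declared):
        errors.append(f"template uses undeclared input: {name}")

    # Variant weights describe a traffic split, so they should sum to 1.0.
    variants = spec.get("evaluation", {}).get("variants", [])
    weights = [v["weight"] for v in variants]
    if weights and abs(sum(weights) - 1.0) > 1e-9:
        errors.append("variant weights do not sum to 1.0")

    return errors


# Stand-in for the parsed YAML file (hypothetical, abbreviated).
spec = {
    "name": "coaching/encouragement",
    "version": "2.1.0",
    "inputs": [
        {"name": "user_name", "type": "string"},
        {"name": "missed_days", "type": "integer"},
    ],
    "system": "You are a supportive movement coach.",
    "user": "{{ user_name }} missed their last {{ missed_days }} scheduled sessions.",
    "evaluation": {
        "variants": [
            {"id": "v2.1.0", "weight": 0.20},
            {"id": "v2.0.0", "weight": 0.80},
        ]
    },
}

if __name__ == "__main__":
    print(validate_prompt(spec))  # []
```

Running a check like this in CI is one way to make a prompt diff fail fast on structural mistakes, leaving the eval suite to judge quality.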
# Usage in app code (no model name, no prompt text):
from cerebe import Cerebe

cb = Cerebe(api_key=...)
res = cb.prompts.render(
    name="coaching/encouragement",
    inputs={"user_name": "Maria", "missed_days": 3},
)

Get Started
Stop shipping prompts as silent changes.
YAML-managed. Semver. A/B canary. Auto-rollback on regression. The workflow that turns prompt changes into reviewable, measured artifacts. Self-signup at cerebe.ai.