Service 01

Memory Fabric

Working. Episodic. Long-term. Hybrid vector + graph retrieval. The memory layer that turns stateless LLMs into agents that remember.


The problem

Plain LLM calls forget everything. Agents need continuity.

Plain LLM APIs are stateless: nothing carries over from one call to the next. The first turn establishes context; the second has to re-establish it. By turn ten, you're stuffing the entire conversation history into the prompt: burning tokens, hitting context limits, and losing the early turns to the sliding window.
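To make the sliding-window loss concrete, here is a minimal sketch. It assumes a whitespace word count as a stand-in for a real tokenizer; the helper name `sliding_window` and the sample conversation are hypothetical and not part of any SDK.

```python
def sliding_window(turns, max_tokens):
    """Keep the most recent turns that fit inside max_tokens.

    Token cost is approximated by whitespace-separated words, which is
    an assumption for illustration only.
    """
    kept = []
    used = 0
    # Walk backwards from the newest turn and stop at the first one that no longer fits.
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Made-up conversation for illustration.
history = [
    "user: my name is Maria and my left hip hurts on stairs",
    "assistant: noted, how long has it hurt",
    "user: about two weeks",
    "assistant: try gentle stretches each morning",
    "user: what was my name again",
]

prompt_turns = sliding_window(history, max_tokens=16)
```

With this budget only the last three turns survive, so the turn that introduced the user's name never reaches the model.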

The agentic moves teams want — "remember what the user said last week," "build a profile over time," "retrieve the relevant past interaction" — are out of reach without a real memory layer. Most teams build one badly: a Pinecone index, a Redis cache, and a lot of glue.


How it works

Hybrid vector + graph + temporal store. One API. Three memory tiers.

Cerebe's Memory Fabric exposes three tiers — working (in-context, short TTL), episodic (per-conversation, persists across turns), and long-term (cross-session, multi-tenant). Each tier is backed by both Qdrant (vector) for semantic similarity and Neo4j (graph) for entity relationships and causal chains.

Retrieval is hybrid: vector similarity + graph traversal + temporal context, all scored and merged. A query like "what does Maria prefer for morning workouts" pulls semantic matches, traverses the Maria-related entity graph, and surfaces the right memory regardless of which tier it lives in. Sub-50ms p95 on production workloads.
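One way to picture "scored and merged" is a weighted sum over the three signals. The sketch below is an assumption for illustration, not Cerebe's scoring algorithm, which is not published. The `Candidate` fields, the weights, the hop-based graph score, and the 30-day half-life are all hypothetical, and the sample memories are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    content: str
    vector_sim: float           # similarity in [0, 1] from the vector store
    graph_hops: Optional[int]   # hops from the query entity; None if unreachable
    age_days: float             # time since the memory was written


def merged_score(c, w_vector=0.6, w_graph=0.3, w_time=0.1, half_life_days=30.0):
    """Weighted sum of semantic, graph, and recency signals (hypothetical weights)."""
    # Closer graph neighbours score higher; unreachable nodes contribute nothing.
    graph = 0.0 if c.graph_hops is None else 1.0 / (1 + c.graph_hops)
    # Exponential decay: the recency signal halves every half_life_days.
    recency = 0.5 ** (c.age_days / half_life_days)
    return w_vector * c.vector_sim + w_graph * graph + w_time * recency


def rank(candidates, limit=5):
    """Return the top `limit` candidates by merged score, best first."""
    return sorted(candidates, key=merged_score, reverse=True)[:limit]


candidates = [
    Candidate("Morning workouts improve sleep quality", 0.85, None, 2.0),
    Candidate("Maria prefers morning workouts", 0.82, 1, 40.0),
    Candidate("Maria reports left-hip stiffness on stairs", 0.40, 1, 1.0),
]

ranked = rank(candidates)
```

In this toy data the generic sentence about morning workouts has the highest vector similarity, but the memory linked to Maria in the graph ranks first once the graph signal is added. That is the behaviour the hybrid approach is meant to produce.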

  • Three tiers: working (in-context), episodic (per-conversation), long-term (cross-session, multi-tenant)
  • Vector backend: Qdrant for semantic search; embedding model is configurable
  • Graph backend: Neo4j for entity relationships, causal chains, temporal sequences
  • Hybrid retrieval: vector similarity + graph traversal + temporal context, scored and merged
  • COPPA-grade entity lifecycle: deletion propagates across stores; data residency by config
  • Sub-50ms p95 retrieval latency on production workloads — wired for the agentic critical path
cerebe-sdk: memory API (Python)
# Cerebe Python SDK — memory fabric
from cerebe import Cerebe

cb = Cerebe(api_key=...)

# Working memory (in-context, current turn)
cb.memory.add("patient asked about hip pain", session="sess_123",
              type="working", ttl_minutes=30)

# Episodic memory (this conversation, persists across turns)
cb.memory.add("Maria reports left-hip stiffness on stairs",
              session="sess_123", type="episodic", importance=0.7)

# Long-term memory (cross-session, multi-tenant), stored with type="semantic"
cb.memory.add("Maria prefers morning workouts",
              session="sess_123", type="semantic", importance=0.9)

# Hybrid retrieval — vector + graph + temporal
results = cb.memory.retrieve(
    query="What time of day does Maria prefer to exercise?",
    session="sess_123",
    types=["semantic", "episodic"],
    limit=5,
)
# {
#   "memories": [
#     { "content": "Maria prefers morning workouts",
#       "type": "semantic", "score": 0.94, "source": "vector+graph" },
#     ...
#   ],
#   "trace_id": "otel_a89..."
# }
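
The SDK example covers writes and reads. The entity lifecycle guarantee listed earlier, where deletion propagates across stores, can be sketched with in-memory stand-ins. Nothing here calls Qdrant, Neo4j, or the Cerebe SDK: the classes, the `forget_entity` helper, and the sample data are hypothetical and show only the shape of a fan-out delete.

```python
class InMemoryVectorStore:
    """Stand-in for a vector store: points tagged with the entity they describe."""

    def __init__(self):
        self.points = {}  # point_id -> {"entity_id": ..., "content": ...}

    def upsert(self, point_id, entity_id, content):
        self.points[point_id] = {"entity_id": entity_id, "content": content}

    def delete_entity(self, entity_id):
        doomed = [pid for pid, p in self.points.items() if p["entity_id"] == entity_id]
        for pid in doomed:
            del self.points[pid]
        return len(doomed)


class InMemoryGraphStore:
    """Stand-in for a graph store: nodes plus (src, relation, dst) edges."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_edge(self, src, relation, dst):
        self.nodes.update((src, dst))
        self.edges.add((src, relation, dst))

    def delete_entity(self, entity_id):
        # Remove the node and every edge that touches it.
        doomed = {e for e in self.edges if entity_id in (e[0], e[2])}
        self.edges -= doomed
        self.nodes.discard(entity_id)
        return len(doomed)


def forget_entity(entity_id, stores):
    """Delete an entity from every store and return per-store counts for auditing."""
    return {name: store.delete_entity(entity_id) for name, store in stores.items()}


vectors = InMemoryVectorStore()
graph = InMemoryGraphStore()
vectors.upsert("p1", "maria", "Maria prefers morning workouts")
vectors.upsert("p2", "maria", "Maria reports left-hip stiffness on stairs")
vectors.upsert("p3", "sam", "Sam prefers evening runs")
graph.add_edge("maria", "PREFERS", "morning_workouts")
graph.add_edge("sam", "PREFERS", "evening_runs")

report = forget_entity("maria", {"vector": vectors, "graph": graph})
```

Returning a per-store count gives the caller something to log as evidence that the deletion reached every backend. A production version would also need to handle partial failure, which this sketch does not attempt.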

Pricing

Memory Fabric is the core Cerebe API — included in every plan. Storage costs are usage-based on cerebe.ai; on-prem and VPC deployments are flat-rate. Full pricing at cerebe.ai.

Open-source posture

Memory schemas and the SDK are open source (Apache-2.0). The hybrid scoring algorithm, memory synthesis, and temporal-decay heuristics are proprietary IP, hosted at cerebe.ai.

Get Started

Stop re-prompting. Start remembering.

Persistent memory across sessions. Hybrid vector + graph retrieval. Sub-50ms p95. Self-signup and full docs at cerebe.ai.