Memory Fabric
Working. Episodic. Long-term. Hybrid vector + graph retrieval. The memory layer that turns stateless LLMs into agents that remember.
The problem
Plain LLM calls forget everything. Agents need continuity.
Plain LLM APIs are stateless: nothing carries over from one call to the next. The first turn establishes context; the second re-establishes it. By turn ten, you're stuffing the entire conversation history into the prompt, burning tokens, hitting context limits, and losing the earliest turns once the history outgrows the sliding window.
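To make the sliding-window failure concrete, here is a minimal sketch of a naive prompt builder that keeps only the most recent turns that fit a token budget. The function names and the word-count token estimate are illustrative assumptions, not part of any Cerebe API.

```python
# Hypothetical sketch: a naive sliding-window prompt builder.
# Tokens are approximated by whitespace-separated words (an assumption;
# real tokenizers count differently).
from typing import List


def count_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())


def build_prompt(history: List[str], budget: int) -> List[str]:
    """Keep the most recent turns that fit within `budget` tokens.

    Walks the history newest-first and stops at the first turn that would
    overflow, so the oldest turns are always the first to be dropped.
    """
    kept: List[str] = []
    used = 0
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "turn 1: my name is Maria and I prefer morning workouts",
    "turn 2: my left hip is stiff on stairs",
    "turn 3: what stretches help",
    "turn 4: how often should I do them",
]
window = build_prompt(history, budget=20)
print(window)  # turns 1 and 2 no longer fit
```

With a 20-token budget only turns 3 and 4 fit (13 of the 33 estimated tokens), so the turn carrying Maria's workout preference is the first thing lost.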
The agent behaviors teams actually want ("remember what the user said last week," "build a profile over time," "retrieve the relevant past interaction") are out of reach without a real memory layer. Most teams build one badly: a Pinecone index, a Redis cache, and a lot of glue.
How it works
Hybrid vector + graph + temporal store. One API. Three memory tiers.
Cerebe's Memory Fabric exposes three tiers — working (in-context, short TTL), episodic (per-conversation, persists across turns), and long-term (cross-session, multi-tenant). Each tier is backed by both Qdrant (vector) for semantic similarity and Neo4j (graph) for entity relationships and causal chains.
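A minimal sketch of how the three tiers might differ in scope and lifetime, assuming a simple in-process store: working memory expires after a TTL, episodic memory stays within its session, and long-term memory is shared across sessions but never across tenants. The classes, field names, and tenant model are hypothetical illustrations, not Cerebe's implementation or SDK, and the sketch leaves out the vector and graph backends entirely.

```python
# Hypothetical sketch: three memory tiers with different scopes and lifetimes.
# Class, field, and tenant names are illustrative assumptions, not Cerebe's SDK.
import time
from dataclasses import dataclass, field
from typing import List, Optional

WORKING, EPISODIC, LONG_TERM = "working", "episodic", "long_term"


@dataclass
class Memory:
    content: str
    tier: str
    tenant: str
    session: str
    created_at: float
    ttl_seconds: Optional[float] = None  # in this sketch only working memory expires

    def expired(self, now: float) -> bool:
        return self.ttl_seconds is not None and now - self.created_at > self.ttl_seconds


@dataclass
class TieredStore:
    items: List[Memory] = field(default_factory=list)

    def add(self, content: str, tier: str, tenant: str, session: str,
            ttl_seconds: Optional[float] = None, now: Optional[float] = None) -> None:
        if tier == WORKING and ttl_seconds is None:
            raise ValueError("working memory needs a TTL")
        created = time.time() if now is None else now
        self.items.append(Memory(content, tier, tenant, session, created, ttl_seconds))

    def visible(self, tenant: str, session: str, now: Optional[float] = None) -> List[str]:
        """Return the contents a given session can see, in insertion order.

        Working and episodic memories are scoped to their own session;
        long-term memories are shared by every session of the same tenant.
        Expired working memories and other tenants' memories are skipped.
        """
        now = time.time() if now is None else now
        return [
            m.content
            for m in self.items
            if m.tenant == tenant
            and not m.expired(now)
            and (m.tier == LONG_TERM or m.session == session)
        ]


store = TieredStore()
store.add("patient asked about hip pain", WORKING, "clinic_a", "sess_123",
          ttl_seconds=30 * 60, now=0)
store.add("Maria reports left-hip stiffness on stairs", EPISODIC, "clinic_a",
          "sess_123", now=0)
store.add("Maria prefers morning workouts", LONG_TERM, "clinic_a", "sess_123", now=0)

print(store.visible("clinic_a", "sess_123", now=60))    # all three
print(store.visible("clinic_a", "sess_123", now=3600))  # working memory expired
print(store.visible("clinic_a", "sess_456", now=60))    # long-term only
print(store.visible("clinic_b", "sess_123", now=60))    # other tenant: nothing
```

A second session of the same tenant sees only the long-term memory, and a different tenant sees nothing at all.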
Retrieval is hybrid: vector similarity + graph traversal + temporal context, all scored and merged. A query like "what does Maria prefer for morning workouts" pulls semantic matches, traverses the Maria-related entity graph, and surfaces the right memory regardless of which tier it lives in. Sub-50ms p95 on production workloads.
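The scoring-and-merge step can be sketched as follows. This is one plausible approach, not Cerebe's algorithm: cosine similarity stands in for the vector score, breadth-first hop distance from the query's entities stands in for graph traversal, and an exponential half-life decay stands in for temporal context. The weights, thresholds, half-life, toy embeddings, and the hybrid_retrieve name are all assumptions; the "source" label only mimics the "vector+graph" field in the SDK response further down.

```python
# Hypothetical sketch of hybrid retrieval: score candidates on vector
# similarity, graph proximity, and recency, then merge into one ranking.
# Weights, thresholds, and half-life are illustrative assumptions.
import math
from collections import deque

DAY = 86_400  # seconds


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hop_distances(graph, seeds, max_hops):
    """Breadth-first search: fewest hops from any seed entity, up to max_hops."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def hybrid_retrieve(query_vec, query_entities, memories, graph, now, limit=5,
                    min_sim=0.5, max_hops=2, half_life=7 * DAY,
                    weights=(0.6, 0.3, 0.1)):
    w_vec, w_graph, w_time = weights
    dist = hop_distances(graph, query_entities, max_hops)
    results = []
    for m in memories:  # each memory is scored once, so the union is deduplicated
        sim = cosine(query_vec, m["vec"])
        hops = [dist[e] for e in m["entities"] if e in dist]
        sources = []
        if sim >= min_sim:
            sources.append("vector")  # surfaced by the vector retriever
        if hops:
            sources.append("graph")   # reachable from a query entity
        if not sources:
            continue                  # neither retriever surfaced it
        graph_score = 1.0 / (1 + min(hops)) if hops else 0.0
        recency = 0.5 ** ((now - m["ts"]) / half_life)  # halves every half_life
        score = w_vec * sim + w_graph * graph_score + w_time * recency
        results.append({"content": m["content"], "score": round(score, 3),
                        "source": "+".join(sources)})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:limit]


graph = {"maria": ["morning_workouts", "left_hip"], "morning_workouts": ["maria"],
         "left_hip": ["maria", "stairs"], "stairs": ["left_hip"]}
now = 1_000 * DAY
memories = [
    {"content": "Maria prefers morning workouts", "vec": [0.9, 0.1, 0.0],
     "entities": ["maria", "morning_workouts"], "ts": now - 30 * DAY},
    {"content": "Maria reports left-hip stiffness on stairs", "vec": [0.1, 0.9, 0.2],
     "entities": ["maria", "left_hip", "stairs"], "ts": now - 1 * DAY},
    {"content": "Office closes at 6pm", "vec": [0.0, 0.2, 0.9],
     "entities": ["office"], "ts": now},
]
results = hybrid_retrieve([1.0, 0.0, 0.1], ["maria"], memories, graph, now)
for r in results:
    print(r)
```

In this toy data the workout-preference memory is surfaced by both retrievers and ranks first. The hip-stiffness memory is reached only through the graph, because its embedding is dissimilar to the query, and the unrelated memory is dropped because neither retriever surfaces it.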
- Three tiers: working (in-context), episodic (per-conversation), long-term (cross-session, multi-tenant; addressed as type="semantic" in the SDK)
- Vector backend: Qdrant for semantic search; embedding model is configurable
- Graph backend: Neo4j for entity relationships, causal chains, temporal sequences
- Hybrid retrieval: vector similarity + graph traversal + temporal context, scored and merged
- COPPA-grade entity lifecycle: deletion propagates across stores; data residency by config
- Sub-50ms p95 retrieval latency on production workloads — wired for the agentic critical path
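The deletion-propagation bullet can be illustrated with a minimal sketch: deleting an entity removes its points from the vector store and its node and edges from the graph store, then checks that nothing referencing the entity survives. Both stores are hypothetical in-memory stand-ins, and this is not Cerebe's implementation; against the real backends, the comparable operations would be something like a filtered point delete in Qdrant and a Cypher DETACH DELETE in Neo4j.

```python
# Hypothetical sketch: propagate an entity deletion across a vector store and
# a graph store. Both stores are in-memory stand-ins, not Qdrant/Neo4j clients.
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VectorStore:
    points: Dict[str, dict] = field(default_factory=dict)  # id -> {"vec", "entity"}

    def delete_by_entity(self, entity_id: str) -> int:
        """Remove every point whose payload references the entity."""
        doomed = [pid for pid, p in self.points.items() if p["entity"] == entity_id]
        for pid in doomed:
            del self.points[pid]
        return len(doomed)


@dataclass
class GraphStore:
    edges: Dict[str, Set[str]] = field(default_factory=dict)  # node -> neighbours

    def detach_delete(self, node: str) -> bool:
        """Remove a node, its outgoing edges, and every edge pointing to it."""
        existed = self.edges.pop(node, None) is not None
        for nbrs in self.edges.values():
            nbrs.discard(node)
        return existed


def delete_entity(entity_id: str, vectors: VectorStore, graph: GraphStore) -> dict:
    """Delete an entity from both stores, then verify nothing references it."""
    report = {
        "vector_points": vectors.delete_by_entity(entity_id),
        "graph_node": graph.detach_delete(entity_id),
    }
    residue = (
        any(p["entity"] == entity_id for p in vectors.points.values())
        or entity_id in graph.edges
        or any(entity_id in nbrs for nbrs in graph.edges.values())
    )
    if residue:
        raise RuntimeError(f"deletion of {entity_id!r} did not propagate")
    return report


vectors = VectorStore({
    "p1": {"vec": [0.1, 0.2], "entity": "maria"},
    "p2": {"vec": [0.3, 0.4], "entity": "maria"},
    "p3": {"vec": [0.5, 0.6], "entity": "sam"},
})
graph = GraphStore({"maria": {"left_hip", "sam"}, "left_hip": {"maria"}, "sam": {"maria"}})
report = delete_entity("maria", vectors, graph)
print(report)  # {'vector_points': 2, 'graph_node': True}
```

Both deletes are idempotent, so a run interrupted between the two stores can simply be retried.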
```python
# Cerebe Python SDK: memory fabric
from cerebe import Cerebe

cb = Cerebe(api_key=...)  # placeholder: your Cerebe API key

# Working memory (in-context, current turn)
cb.memory.add("patient asked about hip pain", session="sess_123",
              type="working", ttl_minutes=30)

# Episodic memory (this conversation, persists across turns)
cb.memory.add("Maria reports left-hip stiffness on stairs",
              session="sess_123", type="episodic", importance=0.7)

# Long-term semantic memory (cross-session, multi-tenant)
cb.memory.add("Maria prefers morning workouts",
              session="sess_123", type="semantic", importance=0.9)

# Hybrid retrieval: vector + graph + temporal
results = cb.memory.retrieve(
    query="What time of day does Maria prefer to exercise?",
    session="sess_123",
    types=["semantic", "episodic"],
    limit=5,
)
# {
#   "memories": [
#     { "content": "Maria prefers morning workouts",
#       "type": "semantic", "score": 0.94, "source": "vector+graph" },
#     ...
#   ],
#   "trace_id": "otel_a89..."
# }
```
Get Started
Stop re-prompting. Start remembering.
Persistent memory across sessions. Hybrid vector + graph retrieval. Sub-50ms p95. Self-signup and full docs at cerebe.ai.