
Knowledge Graph

Sibyl stores knowledge in a graph database, enabling rich relationships between entities and semantic search. This guide explains how the graph works.

Architecture Overview

Sibyl uses a two-database architecture:

| Database | Purpose | Technology |
| --- | --- | --- |
| Graph DB | Entities, relationships, embeddings | FalkorDB |
| Relational DB | Users, sessions, API keys, crawled docs | PostgreSQL |

FalkorDB

FalkorDB is a Redis-compatible graph database that Sibyl uses via Graphiti, a Graph RAG framework. It provides:

  • Cypher queries - Graph query language
  • Vector search - Native embedding support
  • High performance - Redis-speed operations

Graphiti Integration

Graphiti is the Graph RAG layer that powers Sibyl's knowledge operations:

python
# Sibyl uses Graphiti's node APIs
from graphiti_core.nodes import EntityNode, EpisodicNode
from graphiti_core.search.search_config_recipes import NODE_HYBRID_SEARCH_RRF

Node Types

Graphiti creates two fundamental node types in the graph:

Episodic Nodes

Created by add_episode() - temporal knowledge snapshots:

bash
# When you add knowledge via the CLI or MCP
sibyl add "Redis insight" "Connection pool must be >= concurrent requests"
# Creates an EpisodicNode with entity_type property

Entity Nodes

Extracted entities or directly inserted nodes:

python
# Created via EntityNode.save() for structured data
# Tasks, projects, patterns with specific schemas

Query Both Node Types

When querying the graph, you must handle both node types:

cypher
MATCH (n)
WHERE (n:Episodic OR n:Entity) AND n.entity_type = $type
RETURN n
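
As a minimal sketch, the two-label filter can be wrapped in a small helper so that callers cannot forget one of the labels. The helper name `build_type_query` is hypothetical and not part of Sibyl's API; only the Cypher pattern itself comes from this guide.

```python
# Hypothetical helper (not part of Sibyl): builds the both-labels query shown above.
NODE_LABELS = ("Episodic", "Entity")  # the two node labels Graphiti creates


def build_type_query(entity_type: str) -> tuple[str, dict[str, str]]:
    """Return (cypher, params) matching one entity type across both labels."""
    if not entity_type:
        raise ValueError("entity_type must be a non-empty string")
    label_filter = " OR ".join(f"n:{label}" for label in NODE_LABELS)
    cypher = (
        "MATCH (n)\n"
        f"WHERE ({label_filter}) AND n.entity_type = $type\n"
        "RETURN n"
    )
    # The type is passed as a query parameter rather than interpolated.
    return cypher, {"type": entity_type}


query, params = build_type_query("pattern")
print(query)
```

Passing the type as `$type` keeps the query text constant across calls and avoids string interpolation of user input.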

Entity Types

Sibyl supports many entity types (see Entity Types for full details):

| Type | Description |
| --- | --- |
| episode | Temporal learnings, discoveries |
| pattern | Reusable coding patterns |
| rule | Sacred constraints, invariants |
| task | Work items with workflow |
| project | Container for tasks/epics |
| epic | Feature-level grouping |
| document | Crawled content |
| source | Documentation sources |

Relationships

Entities connect through typed relationships:

Knowledge Relationships

| Type | Usage |
| --- | --- |
| APPLIES_TO | Pattern applies to topic |
| REQUIRES | A requires B |
| CONFLICTS_WITH | Mutual exclusion |
| SUPERSEDES | A replaces B |
| RELATED_TO | Generic relationship |
| ENABLES | A enables B |
| BREAKS | A breaks B |

Task Relationships

| Type | Usage |
| --- | --- |
| BELONGS_TO | Task -> Project, Epic -> Project |
| DEPENDS_ON | Task -> Task (blocking) |
| BLOCKS | Task -> Task (inverse) |
| ASSIGNED_TO | Task -> Person |
| REFERENCES | Task -> Pattern/Rule |

Document Relationships

| Type | Usage |
| --- | --- |
| CRAWLED_FROM | Document -> Source |
| CHILD_OF | Document hierarchy |
| MENTIONS | Document -> Entity |

Multi-Tenancy

Each organization gets its own isolated graph:

python
# Graph named by organization UUID
graph_name = str(org.id)  # e.g., "550e8400-e29b-41d4-a716-446655440000"

# All operations require org context
manager = EntityManager(client, group_id=str(org.id))

Always Scope by Organization

Never query without org scope: it will hit the wrong graph or break isolation.

Write Concurrency

FalkorDB requires serialized writes to prevent corruption. Sibyl uses a semaphore:

python
async with client.write_lock:
    await client.execute_write_org(org_id, query, **params)

The EntityManager handles this automatically for all write operations.
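
To illustrate what serialized writes mean in practice, here is a minimal, self-contained sketch using a one-slot `asyncio.Semaphore`. The `SerializedWriter` class and `run_writes` function are hypothetical stand-ins, not Sibyl's client; the sketch only shows that concurrent callers never overlap inside the lock.

```python
import asyncio


class SerializedWriter:
    """Hypothetical stand-in for a graph client that allows one write at a time."""

    def __init__(self) -> None:
        self.write_lock = asyncio.Semaphore(1)  # one slot: writes are serialized
        self.active = 0        # writes currently inside the lock
        self.max_active = 0    # highest concurrency observed
        self.log: list[str] = []

    async def write(self, query: str) -> None:
        async with self.write_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            await asyncio.sleep(0)  # yield control, as a real network call would
            self.log.append(query)
            self.active -= 1


async def run_writes(count: int) -> SerializedWriter:
    """Launch `count` concurrent writes and return the writer for inspection."""
    writer = SerializedWriter()
    queries = [f"CREATE (:Entity {{uuid: 'e{i}'}})" for i in range(count)]
    await asyncio.gather(*(writer.write(q) for q in queries))
    return writer


writer = asyncio.run(run_writes(5))
print(writer.max_active, len(writer.log))
```

Even though five coroutines are scheduled together, `max_active` stays at 1 because each write must acquire the single slot before touching the graph.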

Search

Search combines multiple techniques:

Embeddings generated by OpenAI's embedding model enable semantic similarity:

python
# Search uses NODE_HYBRID_SEARCH_RRF recipe
results = await client.search_(
    query=query,
    config=NODE_HYBRID_SEARCH_RRF,
    group_ids=[org_id],
)

Keyword-based scoring for exact matches:

python
# RediSearch provides BM25 scoring
# Combined with vector search via RRF fusion

Reciprocal Rank Fusion (RRF)

Combines vector and keyword results:

RRF_score = sum(1 / (k + rank_i)) for each ranking
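
The formula above can be sketched in a few lines of Python. This is an illustrative implementation, not Graphiti's own code; the function name `rrf_fuse` is hypothetical, and `k = 60` is a commonly used constant from the RRF literature rather than a value this guide specifies.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank_i)) over the rankings it appears in,
    where rank_i is its 1-based position in ranking i.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a deterministic order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


vector_hits = ["a", "b", "c"]    # e.g. results ordered by embedding similarity
keyword_hits = ["b", "d", "a"]   # e.g. results ordered by BM25 score
for item, score in rrf_fuse([vector_hits, keyword_hits]):
    print(item, round(score, 5))
```

Items that appear near the top of both lists (here `b` and `a`) outrank items that appear in only one, which is the point of the fusion step.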

Entity Storage

Entities store metadata as JSON in the metadata property:

python
# Core properties stored directly
n.uuid          # Entity ID
n.name          # Display name
n.entity_type   # Type enum value
n.content       # Full content
n.description   # Summary

# Extended properties in metadata JSON
n.metadata = {
    "status": "doing",
    "priority": "high",
    "project_id": "proj_abc",
    "tags": ["backend", "auth"],
    ...
}
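
One way to picture this layout is a pair of functions that split an entity into core properties plus a JSON-encoded `metadata` string, and then reverse the split. This is a sketch under assumptions: the function names are hypothetical, and Sibyl's actual serialization code is not shown in this guide.

```python
import json

# Core properties listed above; everything else goes into the metadata JSON.
CORE_FIELDS = ("uuid", "name", "entity_type", "content", "description")


def to_node_properties(entity: dict) -> dict:
    """Split an entity dict into core properties and a JSON metadata string."""
    props = {key: entity[key] for key in CORE_FIELDS if key in entity}
    extended = {key: value for key, value in entity.items() if key not in CORE_FIELDS}
    props["metadata"] = json.dumps(extended, sort_keys=True)
    return props


def from_node_properties(props: dict) -> dict:
    """Rebuild the flat entity dict from stored node properties."""
    entity = {key: value for key, value in props.items() if key != "metadata"}
    entity.update(json.loads(props.get("metadata") or "{}"))
    return entity


task = {
    "uuid": "task_1",
    "name": "Add login",
    "entity_type": "task",
    "status": "doing",
    "tags": ["backend", "auth"],
}
stored = to_node_properties(task)
print(stored["metadata"])
```

A consequence of this layout is that extended fields such as `status` live inside a string, so filtering on them requires decoding the JSON rather than comparing a plain node property.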

Graph Creation Paths

LLM-Powered (Slower, Richer)

python
# Uses Graphiti's entity extraction
await manager.create(entity)

  • Analyzes content with LLM
  • Extracts additional entities
  • Creates relationships automatically
  • Best for unstructured knowledge

Direct Insertion (Faster)

python
# Bypasses LLM, uses EntityNode directly
await manager.create_direct(entity)

  • Creates node immediately
  • Generates embedding inline
  • Best for structured entities (tasks, projects)
  • Used for bulk operations
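
The choice between the two paths can be summarized as a small decision function. This is only a sketch of the guidance above: `choose_creation_path` and `STRUCTURED_TYPES` are hypothetical names, and the set of structured types is limited to the two examples this guide mentions.

```python
# Hypothetical helper: encodes the guidance above, not Sibyl's actual dispatch logic.
STRUCTURED_TYPES = frozenset({"task", "project"})


def choose_creation_path(entity_type: str, bulk: bool = False) -> str:
    """Return the name of the EntityManager method suggested for this entity."""
    if bulk or entity_type in STRUCTURED_TYPES:
        return "create_direct"  # faster: no LLM extraction
    return "create"             # slower, richer: LLM extracts entities and relationships


print(choose_creation_path("task"))
print(choose_creation_path("episode"))
```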

Querying the Graph

Using EntityManager

python
from sibyl_core.graph import GraphClient, EntityManager

client = await GraphClient.create()
manager = EntityManager(client, group_id=str(org_id))

# Search
results = await manager.search("OAuth patterns", limit=10)

# Get by ID
entity = await manager.get("entity_abc")

# List by type
tasks = await manager.list_by_type(
    EntityType.TASK,
    status="todo",
    project_id="proj_123"
)

Using RelationshipManager

python
from sibyl_core.graph import RelationshipManager

rel_manager = RelationshipManager(client, group_id=str(org_id))

# Get related entities
related = await rel_manager.get_related_entities(
    entity_id="pattern_abc",
    relationship_types=[RelationshipType.APPLIES_TO],
    max_depth=2
)
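
To make the effect of `relationship_types` and `max_depth` concrete, here is a self-contained sketch of a depth-limited traversal over an in-memory edge list. It is not Sibyl's implementation: the function name is hypothetical, and it follows outgoing edges only, whereas this guide does not state which direction `get_related_entities` traverses.

```python
from collections import deque


def related_within_depth(
    edges: list[tuple[str, str, str]],  # (source_id, relationship_type, target_id)
    start: str,
    allowed_types: set[str],
    max_depth: int,
) -> dict[str, int]:
    """Breadth-first search returning {entity_id: hops from start}."""
    adjacency: dict[str, list[str]] = {}
    for source, rel_type, target in edges:
        if rel_type in allowed_types:  # ignore relationships of other types
            adjacency.setdefault(source, []).append(target)

    depths: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                depths[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return depths


edges = [
    ("pattern_abc", "APPLIES_TO", "topic_auth"),
    ("topic_auth", "APPLIES_TO", "topic_security"),
    ("topic_security", "APPLIES_TO", "topic_far"),
    ("pattern_abc", "REQUIRES", "lib_x"),
]
print(related_within_depth(edges, "pattern_abc", {"APPLIES_TO"}, max_depth=2))
```

With `max_depth=2`, `topic_far` (three hops away) is excluded, and `lib_x` is excluded because `REQUIRES` is not in the allowed types.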

Direct Cypher Queries

For complex queries, use Cypher directly:

python
# `driver` is assumed to be a graph driver already bound to the organization's graph
result = await driver.execute_query(
    """
    MATCH (t:Entity)-[:BELONGS_TO]->(p:Entity)
    WHERE t.entity_type = 'task'
      AND p.uuid = $project_id
      AND t.status = 'doing'
    RETURN t.uuid, t.name, t.status
    """,
    project_id="proj_abc"
)

Best Practices

1. Always Use Org Context

python
# WRONG
manager = EntityManager(client, group_id="")

# RIGHT
manager = EntityManager(client, group_id=str(org.id))
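
A small guard can catch an empty or malformed organization ID before it reaches the manager. This is a sketch, not part of Sibyl: `require_group_id` is a hypothetical helper, and it relies on the statement above that graphs are named by organization UUID.

```python
from uuid import UUID


def require_group_id(org_id: object) -> str:
    """Return a normalized group_id, or raise if org_id is not a valid UUID."""
    text = str(org_id).strip()
    try:
        return str(UUID(text))
    except ValueError as exc:
        raise ValueError(f"invalid organization id: {text!r}") from exc


print(require_group_id("550e8400-e29b-41d4-a716-446655440000"))
```

Calling `require_group_id("")` or `require_group_id(None)` raises `ValueError`, so a missing org context fails loudly instead of silently targeting the wrong graph.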

2. Handle Both Node Labels

cypher
// WRONG (misses Episodic nodes)
MATCH (n:Entity) WHERE n.entity_type = 'pattern'

// RIGHT
MATCH (n) WHERE (n:Episodic OR n:Entity) AND n.entity_type = 'pattern'

3. Use Write Lock for Mutations

python
# EntityManager handles this, but for direct queries:
async with client.write_lock:
    await driver.execute_query("CREATE ...")

4. Filter Early in Queries

cypher
// WRONG (fetches all, filters in Python)
MATCH (n) RETURN n

// RIGHT (filters in DB)
MATCH (n)
WHERE n.entity_type = $type AND n.group_id = $org_id
RETURN n
LIMIT 100

Troubleshooting

Graph Corruption

If you encounter corruption errors:

bash
# Connect to FalkorDB
redis-cli -p 6380

# Inside the redis-cli session, delete the corrupted graph.
# This removes all data stored in that organization's graph.
GRAPH.DELETE <org-uuid>

Slow Queries

  1. Add indexes for frequently queried properties
  2. Limit result sets
  3. Use specific node labels when possible

Missing Results

  1. Check both Episodic and Entity labels
  2. Verify that the org_id used in the query matches the organization that owns the entity
  3. Check if entity_type filter is correct


Released under the MIT License.