Semantic Search

Sibyl's search goes beyond keyword matching to understand the meaning of your queries. This guide explains how semantic search works and how to use it effectively.

How It Works

Vector Embeddings

When you add knowledge to Sibyl, the content is converted into a vector embedding - a high-dimensional numerical representation of meaning:

"OAuth refresh token implementation"
    -> [0.023, -0.041, 0.089, ..., 0.012]  (1536 dimensions)

Similar concepts produce similar vectors, enabling meaning-based search.

Embedding Model

Sibyl uses OpenAI's embedding model (configurable):

bash

# Default model
SIBYL_EMBEDDING_MODEL=text-embedding-3-small

# Higher quality (more expensive)
SIBYL_EMBEDDING_MODEL=text-embedding-3-large

Hybrid Search

Search combines two techniques:

Vector Search - Cosine similarity between query and entity embeddings
BM25 Search - Traditional keyword scoring for exact matches

Results are merged using Reciprocal Rank Fusion (RRF):

RRF_score = sum(1 / (k + rank)) for each ranking system

This ensures you get results that are either semantically similar OR keyword matches.

Using Search

Basic Search

bash

# Search all entities
sibyl search "authentication patterns"

# The search finds related concepts even with different words
sibyl search "OAuth implementation"  # Finds "authentication patterns"
sibyl search "login security"         # Also matches

Filtering by Type

bash

# Search only patterns
sibyl search "error handling" --type pattern

# Search tasks
sibyl search "OAuth" --type task

# Multiple types
sibyl search "database" --type pattern,episode

Filtering by Status

For tasks, filter by workflow status:

bash

sibyl search "auth" --type task --status todo
sibyl search "" --type task --status doing  # Empty query = list all

Filtering by Project

Scope task searches to a project:

bash

sibyl search "OAuth" --type task --project proj_abc123

Other Filters

bash

# By language
sibyl search "async patterns" --language python

# By category
sibyl search "debugging" --category database

# By assignee
sibyl search "OAuth" --assignee alice

Search vs Explore

Sibyl offers two ways to find entities:

Feature	`search`	`explore`
Purpose	Find by meaning	Browse structure
Input	Natural language query	Filters
Uses embeddings	Yes	No
Good for	"Find related to X"	"List all Y"

When to Use Search

bash

# Finding related knowledge
sibyl search "how to handle rate limiting"

# Discovering relevant patterns
sibyl search "retry logic best practices"

# Finding past solutions
sibyl search "Redis connection timeout"

When to Use Entity List vs Explore

bash

# List entities by type
sibyl entity list --type project
sibyl entity list --type task

# Explore graph relationships from a specific entity
sibyl explore related entity_xyz
sibyl explore traverse entity_xyz --depth 2

Search in Code

MCP Tool

python

# Using the search MCP tool
result = await search(
    query="OAuth implementation patterns",
    types=["pattern", "episode"],
    language="python",
    limit=10
)

# Results include score
for item in result.results:
    print(f"{item.name}: {item.score:.3f}")

EntityManager

python

from sibyl_core.graph import EntityManager

manager = EntityManager(client, group_id=org_id)

# Semantic search
results = await manager.search(
    query="authentication patterns",
    entity_types=[EntityType.PATTERN, EntityType.EPISODE],
    limit=20
)

# Returns (entity, score) tuples
for entity, score in results:
    print(f"{entity.name}: {score:.3f}")

Hybrid Search Module

For more control, use the hybrid search module directly:

python

from sibyl_core.retrieval import hybrid_search, HybridConfig

config = HybridConfig(
    apply_temporal=True,       # Boost recent results
    temporal_decay_days=365,   # Decay constant
    graph_depth=2,             # Relationship traversal
)

result = await hybrid_search(
    query="OAuth patterns",
    client=client,
    entity_manager=manager,
    entity_types=[EntityType.PATTERN],
    limit=20,
    config=config,
    group_id=org_id,
)

# result.results contains (entity, score) tuples

Temporal Boosting

By default, search boosts recent results:

python

# Recent entities get higher scores
# Decay formula: score * exp(-days_old / decay_days)

from sibyl_core.retrieval import temporal_boost

boosted = temporal_boost(results, decay_days=365.0)

This helps surface fresh knowledge while keeping older relevant results.

Search Tips

1. Use Natural Language

Search works best with natural language queries:

bash

# GOOD - natural question
sibyl search "how to handle database connection timeouts"

# LESS GOOD - keyword style
sibyl search "database timeout handler"

2. Be Specific

More context helps find better matches:

bash

# GOOD - specific context
sibyl search "Python asyncio task cancellation handling"

# LESS GOOD - too broad
sibyl search "async tasks"

3. Combine with Filters

Use type and project filters to narrow results:

bash

# Find patterns about auth in a specific project
sibyl search "authentication" --type pattern --project proj_auth

4. Empty Query for Listing

Use empty query with filters to list entities:

bash

# List all patterns
sibyl search "" --type pattern

# List all todo tasks in project
sibyl search "" --type task --status todo --project proj_abc

Document Search

Sibyl can also search crawled documentation:

bash

# Search documentation
sibyl search "Next.js middleware" --source-name "next-docs"

# Include documents in search
sibyl search "routing" --include-documents

Document search uses the same hybrid approach, with pgvector for document chunks.

Understanding Results

Search results include:

Field	Description
`id`	Entity ID
`type`	Entity type
`name`	Entity name
`content`	Content preview (truncated)
`score`	Relevance score (0-1)
`source`	Source file or URL
`result_origin`	"graph" or "document"

Get Full Content Search results show previews. Use sibyl entity show <id> to get full

content. :::

Performance Considerations

Result Limits

Always set reasonable limits:

bash

sibyl search "query" --limit 20

Index Efficiency

Vector search is fast, but filtering is applied post-search. For large graphs:

Use type filters to narrow the search space
Use project filters for task searches
Consider temporal filters for recent knowledge

Caching

Embeddings are generated once when entities are created. Search queries generate a new embedding for the query.

Troubleshooting

No Results

Check query isn't too specific
Verify entity type filter
Try broader search terms
Check organization context

Irrelevant Results

Add more context to query
Use type filters
Try different wording

Slow Search

Reduce limit
Add type filters
Check FalkorDB connection

Next Steps

Capturing Knowledge - Add searchable content
Entity Types - Understand what to search for
Task Management - Search for tasks

Semantic Search ​

How It Works ​

Vector Embeddings ​

Embedding Model ​

Hybrid Search ​

Using Search ​

Basic Search ​

Filtering by Type ​

Filtering by Status ​

Filtering by Project ​

Other Filters ​

Search vs Explore ​

When to Use Search ​

When to Use Entity List vs Explore ​

Search in Code ​

MCP Tool ​

EntityManager ​

Hybrid Search Module ​

Temporal Boosting ​

Search Tips ​

1. Use Natural Language ​

2. Be Specific ​

3. Combine with Filters ​

4. Empty Query for Listing ​

Document Search ​

Understanding Results ​

Performance Considerations ​

Result Limits ​

Index Efficiency ​

Caching ​

Troubleshooting ​

No Results ​

Irrelevant Results ​

Slow Search ​

Next Steps ​

Semantic Search

How It Works

Vector Embeddings

Embedding Model

Hybrid Search

Using Search

Basic Search

Filtering by Type

Filtering by Status

Filtering by Project

Other Filters

Search vs Explore

When to Use Search

When to Use Entity List vs Explore

Search in Code

MCP Tool

EntityManager

Hybrid Search Module

Temporal Boosting

Search Tips

1. Use Natural Language

2. Be Specific

3. Combine with Filters

4. Empty Query for Listing

Document Search

Understanding Results

Performance Considerations

Result Limits

Index Efficiency

Caching

Troubleshooting

No Results

Irrelevant Results

Slow Search

Next Steps