Semantic Search
Sibyl's search goes beyond keyword matching to understand the meaning of your queries. This guide explains how semantic search works and how to use it effectively.
How It Works
Vector Embeddings
When you add knowledge to Sibyl, the content is converted into a vector embedding - a high-dimensional numerical representation of meaning:
"OAuth refresh token implementation"
-> [0.023, -0.041, 0.089, ..., 0.012] (1536 dimensions)Similar concepts produce similar vectors, enabling meaning-based search.
Embedding Model
Sibyl uses OpenAI's embedding model (configurable):
# Default model
SIBYL_EMBEDDING_MODEL=text-embedding-3-small
# Higher quality (more expensive)
SIBYL_EMBEDDING_MODEL=text-embedding-3-largeHybrid Search
Search combines two techniques:
- Vector Search - Cosine similarity between query and entity embeddings
- BM25 Search - Traditional keyword scoring for exact matches
Results are merged using Reciprocal Rank Fusion (RRF):
RRF_score = sum(1 / (k + rank)) for each ranking systemThis ensures you get results that are either semantically similar OR keyword matches.
Using Search
Basic Search
# Search all entities
sibyl search "authentication patterns"
# The search finds related concepts even with different words
sibyl search "OAuth implementation" # Finds "authentication patterns"
sibyl search "login security" # Also matchesFiltering by Type
# Search only patterns
sibyl search "error handling" --type pattern
# Search tasks
sibyl search "OAuth" --type task
# Multiple types
sibyl search "database" --type pattern,episodeFiltering by Status
For tasks, filter by workflow status:
sibyl search "auth" --type task --status todo
sibyl search "" --type task --status doing # Empty query = list allFiltering by Project
Scope task searches to a project:
sibyl search "OAuth" --type task --project proj_abc123Other Filters
# By language
sibyl search "async patterns" --language python
# By category
sibyl search "debugging" --category database
# By assignee
sibyl search "OAuth" --assignee aliceSearch vs Explore
Sibyl offers two ways to find entities:
| Feature | search | explore |
|---|---|---|
| Purpose | Find by meaning | Browse structure |
| Input | Natural language query | Filters |
| Uses embeddings | Yes | No |
| Good for | "Find related to X" | "List all Y" |
When to Use Search
# Finding related knowledge
sibyl search "how to handle rate limiting"
# Discovering relevant patterns
sibyl search "retry logic best practices"
# Finding past solutions
sibyl search "Redis connection timeout"When to Use Entity List vs Explore
# List entities by type
sibyl entity list --type project
sibyl entity list --type task
# Explore graph relationships from a specific entity
sibyl explore related entity_xyz
sibyl explore traverse entity_xyz --depth 2Search in Code
MCP Tool
# Using the search MCP tool
result = await search(
query="OAuth implementation patterns",
types=["pattern", "episode"],
language="python",
limit=10
)
# Results include score
for item in result.results:
print(f"{item.name}: {item.score:.3f}")EntityManager
from sibyl_core.graph import EntityManager
manager = EntityManager(client, group_id=org_id)
# Semantic search
results = await manager.search(
query="authentication patterns",
entity_types=[EntityType.PATTERN, EntityType.EPISODE],
limit=20
)
# Returns (entity, score) tuples
for entity, score in results:
print(f"{entity.name}: {score:.3f}")Hybrid Search Module
For more control, use the hybrid search module directly:
from sibyl_core.retrieval import hybrid_search, HybridConfig
config = HybridConfig(
apply_temporal=True, # Boost recent results
temporal_decay_days=365, # Decay constant
graph_depth=2, # Relationship traversal
)
result = await hybrid_search(
query="OAuth patterns",
client=client,
entity_manager=manager,
entity_types=[EntityType.PATTERN],
limit=20,
config=config,
group_id=org_id,
)
# result.results contains (entity, score) tuplesTemporal Boosting
By default, search boosts recent results:
# Recent entities get higher scores
# Decay formula: score * exp(-days_old / decay_days)
from sibyl_core.retrieval import temporal_boost
boosted = temporal_boost(results, decay_days=365.0)This helps surface fresh knowledge while keeping older relevant results.
Search Tips
1. Use Natural Language
Search works best with natural language queries:
# GOOD - natural question
sibyl search "how to handle database connection timeouts"
# LESS GOOD - keyword style
sibyl search "database timeout handler"2. Be Specific
More context helps find better matches:
# GOOD - specific context
sibyl search "Python asyncio task cancellation handling"
# LESS GOOD - too broad
sibyl search "async tasks"3. Combine with Filters
Use type and project filters to narrow results:
# Find patterns about auth in a specific project
sibyl search "authentication" --type pattern --project proj_auth4. Empty Query for Listing
Use empty query with filters to list entities:
# List all patterns
sibyl search "" --type pattern
# List all todo tasks in project
sibyl search "" --type task --status todo --project proj_abcDocument Search
Sibyl can also search crawled documentation:
# Search documentation
sibyl search "Next.js middleware" --source-name "next-docs"
# Include documents in search
sibyl search "routing" --include-documentsDocument search uses the same hybrid approach, with pgvector for document chunks.
Understanding Results
Search results include:
| Field | Description |
|---|---|
id | Entity ID |
type | Entity type |
name | Entity name |
content | Content preview (truncated) |
score | Relevance score (0-1) |
source | Source file or URL |
result_origin | "graph" or "document" |
Get Full Content Search results show previews. Use sibyl entity show <id> to get full
content. :::
Performance Considerations
Result Limits
Always set reasonable limits:
sibyl search "query" --limit 20Index Efficiency
Vector search is fast, but filtering is applied post-search. For large graphs:
- Use type filters to narrow the search space
- Use project filters for task searches
- Consider temporal filters for recent knowledge
Caching
Embeddings are generated once when entities are created. Search queries generate a new embedding for the query.
Troubleshooting
No Results
- Check query isn't too specific
- Verify entity type filter
- Try broader search terms
- Check organization context
Irrelevant Results
- Add more context to query
- Use type filters
- Try different wording
Slow Search
- Reduce limit
- Add type filters
- Check FalkorDB connection
Next Steps
- Capturing Knowledge - Add searchable content
- Entity Types - Understand what to search for
- Task Management - Search for tasks
