Troubleshooting Guide
Common issues and solutions for Sibyl deployments.
Connection Issues
Cannot Connect to API
Symptoms:
- Connection refused on port 3334
- Frontend shows "Cannot connect to server"
Solutions:
Check if server is running:
    # Docker Compose
    docker compose ps
    # Kubernetes
    kubectl get pods -n sibyl
Check server logs:
    # Docker Compose
    docker compose logs backend
    # Kubernetes
    kubectl logs -n sibyl -l app.kubernetes.io/component=backend
Verify port binding:
    # Check what's listening
    lsof -i :3334
    netstat -tlnp | grep 3334
Check firewall rules (if applicable)
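When lsof or netstat is not installed, a connection attempt from the client side answers the same question. The sketch below is a hypothetical helper (port_open is not part of Sibyl) and assumes bash, because it relies on bash's /dev/tcp redirection.

```bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# Hypothetical helper; requires bash (uses its /dev/tcp redirection feature).
port_open() {
  local host=$1 port=$2
  # /dev/tcp/HOST/PORT is handled by bash itself, it is not a real file.
  # The subshell closes the descriptor again as soon as it exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

For example, `port_open localhost 3334 && echo "API port is reachable"`. A refused connection returns immediately, but a host that silently drops packets can block until the TCP timeout expires.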
Database Connection Errors
PostgreSQL Connection Refused:
    sqlalchemy.exc.OperationalError: connection refused
Solutions:
Verify PostgreSQL is running:
    docker compose ps postgres
    kubectl get pods -n sibyl -l app=postgres
Check connection settings:
    # Environment variables
    echo $SIBYL_POSTGRES_HOST
    echo $SIBYL_POSTGRES_PORT  # Should be 5433 for local dev
Test direct connection:
    # Docker Compose
    docker exec -it sibyl-postgres psql -U sibyl sibyl
    # From host
    psql -h localhost -p 5433 -U sibyl sibyl
FalkorDB Connection Refused:
    redis.exceptions.ConnectionError: Connection refused
Solutions:
Verify FalkorDB is running:
    docker compose ps falkordb
    kubectl get pods -n sibyl -l app=falkordb
Check connection settings:
    echo $SIBYL_FALKORDB_HOST
    echo $SIBYL_FALKORDB_PORT  # Should be 6380 for local dev
Test direct connection:
    # Docker Compose
    docker exec -it sibyl-falkordb redis-cli -a conventions PING
    # From host
    redis-cli -h localhost -p 6380 -a conventions PING
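The connection settings for both databases can be checked in one pass. This is a minimal sketch: check_env is a hypothetical helper, not a Sibyl command, and it only reports whether each variable is set, without printing its value.

```bash
# check_env NAME...: report which of the named variables are unset or empty.
# Hypothetical helper. Returns non-zero if any variable is missing and never
# prints the values themselves.
check_env() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names must be trusted).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    else
      echo "set: $name"
    fi
  done
  return "$missing"
}
```

A typical call would be `check_env SIBYL_POSTGRES_HOST SIBYL_POSTGRES_PORT SIBYL_FALKORDB_HOST SIBYL_FALKORDB_PORT`. Because values are never echoed, the same helper is safe to use for secrets.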
Graph Corruption
FalkorDB can occasionally become corrupted, especially after crashes or improper shutdowns.
Symptoms
- Queries returning unexpected errors
- "Graph does not exist" errors
- Timeout errors on simple queries
- Server crash when accessing graph
Recovery Options
Option 1: Delete and Recreate Graph
    # Connect to FalkorDB
    redis-cli -h localhost -p 6380 -a conventions
    # List all graphs
    GRAPH.LIST
    # Delete corrupted graph (USE CAUTION - DATA LOSS)
    GRAPH.DELETE <org-uuid>
After deletion, the graph will be recreated on first use, but all data is lost.
Option 2: Restore from Backup
If you have a backup (created via /api/admin/backup):
    # Via API
    curl -X POST https://sibyl.local/api/admin/restore \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d @backup.json
Option 3: Rebuild from PostgreSQL
If graph data is lost but PostgreSQL data exists:
- Re-run document ingestion from crawl sources
- Entities will be re-extracted from documents
Prevention
Enable FalkorDB persistence:
    # In compose or k8s, ensure data volume is mounted
    volumes:
      - falkordb_data:/data
Graceful shutdowns:
    docker compose stop  # Not docker compose kill
    kubectl delete pod <pod> --grace-period=30
Authentication Issues
JWT Token Invalid
Symptoms:
- 401 Unauthorized responses
- "Invalid token" errors
Solutions:
Check JWT secret is set:
    # Should be non-empty
    echo $SIBYL_JWT_SECRET
Verify token hasn't expired:
- Default access token TTL: 60 minutes
- Use refresh token to get new access token
Check clock synchronization:
- JWT validation requires synchronized clocks
- Server and client should have correct time
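To tell an expired token from a bad secret or a clock problem, the expiry can be read directly from the token. The sketch below assumes the access token is a standard three-part JWT with a numeric exp claim; jwt_exp and jwt_expired are hypothetical helpers, and the signature is not verified.

```bash
# jwt_exp TOKEN: print the "exp" claim (seconds since the epoch) of a JWT.
# Hypothetical helper. The signature is NOT verified; this only reads the payload.
jwt_exp() {
  local payload
  # Take the second dot-separated field and convert base64url to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that JWT encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d 2>/dev/null |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# jwt_expired TOKEN: succeed if the token has an exp claim that is in the past.
# A token without an exp claim is reported as not expired.
jwt_expired() {
  local exp
  exp=$(jwt_exp "$1")
  [ -n "$exp" ] && [ "$exp" -le "$(date +%s)" ]
}
```

For example, `jwt_expired "$TOKEN" && echo "token expired, use the refresh token"`. Comparing `jwt_exp "$TOKEN"` with `date +%s` on both server and client also gives a rough view of clock drift. Older macOS versions of base64 use `-D` instead of `-d`.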
GitHub OAuth Failing
Symptoms:
- "OAuth error" after GitHub redirect
- Missing callback URL
Solutions:
Verify OAuth credentials:
    echo $SIBYL_GITHUB_CLIENT_ID
    echo $SIBYL_GITHUB_CLIENT_SECRET
Check callback URL in GitHub:
- Must match SIBYL_PUBLIC_URL/api/auth/github/callback
- Example: https://sibyl.local/api/auth/github/callback
Verify public URL:
    echo $SIBYL_PUBLIC_URL
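A mismatch is often just a trailing slash in SIBYL_PUBLIC_URL. As a small sketch, the hypothetical helper below (github_callback_url is not part of Sibyl) prints the callback URL in the form given above so it can be compared character by character with the value registered in GitHub.

```bash
# github_callback_url [BASE_URL]: print the expected OAuth callback URL.
# Hypothetical helper; BASE_URL defaults to $SIBYL_PUBLIC_URL.
github_callback_url() {
  local base=${1:-${SIBYL_PUBLIC_URL:-}}
  # Drop one trailing slash so the result never contains "//api".
  base=${base%/}
  printf '%s/api/auth/github/callback\n' "$base"
}
```

Running `github_callback_url` with SIBYL_PUBLIC_URL set to https://sibyl.local prints https://sibyl.local/api/auth/github/callback.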
Performance Issues
Slow Queries
Symptoms:
- API responses taking > 5 seconds
- Timeouts on graph operations
Solutions:
Check graph size:
    curl -H "Authorization: Bearer $TOKEN" \
      https://sibyl.local/api/admin/stats
Limit query results:
- Use the limit parameter on list endpoints
- Paginate large result sets
Check FalkorDB memory:
    redis-cli -h localhost -p 6380 -a conventions INFO memory
Increase Graphiti semaphore:
    SIBYL_GRAPHITI_SEMAPHORE_LIMIT=20  # Default is 10
High Memory Usage
Symptoms:
- OOMKilled pods in Kubernetes
- Container restarts
Solutions:
Increase resource limits:
    backend:
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 512Mi
Check for memory leaks:
    kubectl top pods -n sibyl
Reduce connection pool sizes:
    SIBYL_POSTGRES_POOL_SIZE=5
    SIBYL_POSTGRES_MAX_OVERFLOW=10
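To spot pods that are approaching their memory limit, the output of kubectl top can be filtered. This is a sketch under assumptions: pods_over_mem is a hypothetical helper, and it expects the default three-column layout (NAME, CPU, MEMORY) with a header row and memory values ending in Ki, Mi, or Gi.

```bash
# pods_over_mem LIMIT_MI: read `kubectl top pods` output on stdin and print
# the pods whose memory usage is at or above LIMIT_MI mebibytes.
# Hypothetical helper; assumes the columns NAME, CPU(cores), MEMORY(bytes).
pods_over_mem() {
  awk -v limit="$1" '
    NR == 1 { next }    # skip the header row
    {
      mem = $3
      if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem *= 1024 }
      else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem) }
      else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); mem /= 1024 }
      else next         # unknown unit: ignore the row
      if (mem + 0 >= limit + 0) printf "%s %dMi\n", $1, mem
    }
  '
}
```

For example, `kubectl top pods -n sibyl | pods_over_mem 1536` lists pods using at least 1.5 GiB, which is a reasonable warning threshold for the 2Gi limit shown above. A pod whose usage climbs steadily between runs is a leak candidate.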
Worker Queue Backlog
Symptoms:
- Jobs not completing
- Crawl tasks stuck
Solutions:
Check worker status:
    kubectl logs -n sibyl -l app.kubernetes.io/component=worker -f
Scale workers:
    kubectl scale deployment sibyl-worker -n sibyl --replicas=3
Check Redis (job queue):
    redis-cli -h localhost -p 6380 -a conventions -n 1 KEYS "*"
Port Conflicts
FalkorDB Port Conflict
Error: Port 6380 already in use
This commonly happens if you have Redis running locally.
Solutions:
Stop conflicting service:
    # macOS
    brew services stop redis
    # Linux
    sudo systemctl stop redis
Change port in compose:
    falkordb:
      ports:
        - "6381:6379"  # Use different host port
Update environment:
    SIBYL_FALKORDB_PORT=6381
PostgreSQL Port Conflict
Error: Port 5433 already in use
Solutions:
Find conflicting process:
    lsof -i :5433
Change port in compose:
    postgres:
      ports:
        - "5434:5432"
Update environment:
    SIBYL_POSTGRES_PORT=5434
Kubernetes-Specific Issues
Pods Stuck in Pending
Solutions:
Check node resources:
    kubectl describe nodes
    kubectl top nodes
Check resource requests:
    kubectl describe pod <pod-name> -n sibyl
Reduce resource requests:
    resources:
      requests:
        cpu: 50m       # Reduce from 100m
        memory: 128Mi  # Reduce from 256Mi
Pods CrashLoopBackOff
Solutions:
Check logs:
    kubectl logs <pod-name> -n sibyl --previous
Check events:
    kubectl get events -n sibyl --sort-by='.lastTimestamp'
Common causes:
- Missing secrets
- Database not ready
- Port already bound
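For the "database not ready" case, a startup script or init container can wait for the dependency instead of crashing on the first failed connection. The sketch below is a generic retry loop; retry is a hypothetical helper, and whether Sibyl's own images already do something similar is not stated here.

```bash
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, waiting
# DELAY seconds between tries. Hypothetical helper; returns non-zero once
# ATTEMPTS tries have failed.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $i attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 pg_isready -h "$SIBYL_POSTGRES_HOST" -p "$SIBYL_POSTGRES_PORT"` waits up to about a minute for PostgreSQL, assuming the pg_isready client tool is installed in the container.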
CNPG PostgreSQL Not Starting
Solutions:
Check CNPG operator:
    kubectl get pods -n cnpg-system
    kubectl logs -n cnpg-system deployment/cnpg-cloudnative-pg
Check cluster status:
    kubectl get cluster -n sibyl
    kubectl describe cluster sibyl-postgres -n sibyl
Check PVC:
    kubectl get pvc -n sibyl
Kong Gateway Issues
Solutions:
Check Kong operator:
    kubectl get pods -n kong-system
Check gateway:
    kubectl get gateway -n kong
    kubectl describe gateway sibyl-gateway -n kong
Check dataplane:
    kubectl get pods -n kong
    kubectl logs -n kong -l app=dataplane-sibyl-gateway
Check HTTPRoutes:
    kubectl get httproute -n sibyl
    kubectl describe httproute sibyl-api -n sibyl
Tilt-Specific Issues
Tilt Stuck on Resource
Solutions:
Check Tilt logs:
- Click on stuck resource in Tilt UI
- Look for error messages
Trigger manual rebuild:
    tilt trigger <resource-name>
Full reset:
    tilt down
    minikube delete
    minikube start --cpus=4 --memory=8192
    tilt up
Can't Access sibyl.local
Solutions:
Check /etc/hosts:
    grep sibyl.local /etc/hosts
    # Should show: 127.0.0.1 sibyl.local
Check Caddy is running:
- Look for caddy-proxy in Tilt UI
Check Kong port-forward:
- Look for kong-port-forward in Tilt UI
Trust Caddy CA (macOS):
    sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain \
      ~/.local/share/caddy/pki/authorities/local/root.crt
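A plain grep on /etc/hosts also matches commented-out lines and entries that point at another address. As a stricter sketch, the hypothetical helper below (hosts_has_entry is not part of Sibyl or Tilt) succeeds only when an active line maps the name to 127.0.0.1.

```bash
# hosts_has_entry HOSTNAME [FILE]: succeed if FILE (default /etc/hosts) has an
# active line mapping HOSTNAME to 127.0.0.1. Hypothetical helper.
hosts_has_entry() {
  local host=$1 file=${2:-/etc/hosts}
  awk -v h="$host" '
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break      # stop at a trailing comment
        if ($i == h) found = 1
      }
    }
    END { exit !found }
  ' "$file"
}
```

For example, `hosts_has_entry sibyl.local || echo "add '127.0.0.1 sibyl.local' to /etc/hosts"`.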
Getting Help
If you're still stuck:
Check logs thoroughly:
    # All logs with timestamps
    kubectl logs -n sibyl --all-containers --timestamps -l app.kubernetes.io/name=sibyl
Describe resources:
    kubectl describe pod <pod-name> -n sibyl
    kubectl describe deployment <deploy-name> -n sibyl
Check events:
    kubectl get events -n sibyl --sort-by='.lastTimestamp' | tail -50
File an issue:
- Include logs and error messages
- Describe reproduction steps
- Include environment details (OS, K8s version, etc.)
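The information requested above can be gathered in one step. The sketch below assumes a Kubernetes deployment in the sibyl namespace with the app.kubernetes.io/name=sibyl label used earlier; collect_diagnostics and the output file names are hypothetical.

```bash
# collect_diagnostics [OUTDIR]: save pod state, events, recent logs, and
# version details for the sibyl namespace into OUTDIR.
# Hypothetical helper; file names are arbitrary.
collect_diagnostics() {
  local ns=sibyl out=${1:-sibyl-diagnostics}
  mkdir -p "$out" || return 1
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out/events.txt" 2>&1
  # Last 500 lines per container keeps the bundle small enough to attach.
  kubectl logs -n "$ns" --all-containers --timestamps --tail=500 \
    -l app.kubernetes.io/name=sibyl > "$out/logs.txt" 2>&1
  kubectl version > "$out/version.txt" 2>&1
  echo "wrote diagnostics to $out"
}
```

Review the files before attaching them to an issue, since logs can contain tokens, hostnames, or other sensitive values.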
