Troubleshooting Guide

Common issues and solutions for Sibyl deployments.

Connection Issues

Cannot Connect to API

Symptoms:

Connection refused on port 3334
Frontend shows "Cannot connect to server"

Solutions:

Check if server is running:

bash

# Docker Compose
docker compose ps

# Kubernetes
kubectl get pods -n sibyl

Check server logs:

bash

# Docker Compose
docker compose logs backend

# Kubernetes
kubectl logs -n sibyl -l app.kubernetes.io/component=backend

Verify port binding:

bash

# Check what's listening
lsof -i :3334
netstat -tlnp | grep 3334

Check firewall rules (if applicable)

Database Connection Errors

PostgreSQL Connection Refused:

sqlalchemy.exc.OperationalError: connection refused

Solutions:

Verify PostgreSQL is running:

bash

docker compose ps postgres
kubectl get pods -n sibyl -l app=postgres

Check connection settings:

bash

# Environment variables
echo $SIBYL_POSTGRES_HOST
echo $SIBYL_POSTGRES_PORT  # Should be 5433 for local dev

Test direct connection:

bash

# Docker Compose
docker exec -it sibyl-postgres psql -U sibyl sibyl

# From host
psql -h localhost -p 5433 -U sibyl sibyl

FalkorDB Connection Refused:

redis.exceptions.ConnectionError: Connection refused

Solutions:

Verify FalkorDB is running:

bash

docker compose ps falkordb
kubectl get pods -n sibyl -l app=falkordb

Check connection settings:

bash

echo $SIBYL_FALKORDB_HOST
echo $SIBYL_FALKORDB_PORT  # Should be 6380 for local dev

Test direct connection:

bash

# Docker Compose
docker exec -it sibyl-falkordb redis-cli -a conventions PING

# From host
redis-cli -h localhost -p 6380 -a conventions PING

Graph Corruption

FalkorDB can occasionally become corrupted, especially after crashes or improper shutdowns.

Symptoms

Queries returning unexpected errors
"Graph does not exist" errors
Timeout errors on simple queries
Server crash when accessing graph

Recovery Options

Option 1: Delete and Recreate Graph

bash

# Connect to FalkorDB
redis-cli -h localhost -p 6380 -a conventions

# List all graphs
GRAPH.LIST

# Delete corrupted graph (USE CAUTION - DATA LOSS)
GRAPH.DELETE <org-uuid>

After deletion, the graph will be recreated on first use, but all data is lost.

Option 2: Restore from Backup

If you have a backup (created via /api/admin/backup):

bash

# Via API
curl -X POST https://sibyl.local/api/admin/restore \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @backup.json

Option 3: Rebuild from PostgreSQL

If graph data is lost but PostgreSQL data exists:

Re-run document ingestion from crawl sources
Entities will be re-extracted from documents

Prevention

Enable FalkorDB persistence:

bash

# In compose or k8s, ensure data volume is mounted
volumes:
  - falkordb_data:/data

Graceful shutdowns:

bash

docker compose stop  # Not docker compose kill
kubectl delete pod <pod> --grace-period=30

Authentication Issues

JWT Token Invalid

Symptoms:

401 Unauthorized responses
"Invalid token" errors

Solutions:

Check JWT secret is set:

bash

# Should be non-empty
echo $SIBYL_JWT_SECRET

Verify token hasn't expired:
- Default access token TTL: 60 minutes
- Use refresh token to get new access token
Check clock synchronization:
- JWT validation requires synchronized clocks
- Server and client should have correct time

GitHub OAuth Failing

Symptoms:

"OAuth error" after GitHub redirect
Missing callback URL

Solutions:

Verify OAuth credentials:

bash

echo $SIBYL_GITHUB_CLIENT_ID
echo $SIBYL_GITHUB_CLIENT_SECRET

Check callback URL in GitHub:
- Must match SIBYL_PUBLIC_URL/api/auth/github/callback
- Example: https://sibyl.local/api/auth/github/callback
Verify public URL:
bash
```
echo $SIBYL_PUBLIC_URL
```

Performance Issues

Slow Queries

Symptoms:

API responses taking > 5 seconds
Timeouts on graph operations

Solutions:

Check graph size:

bash

curl -H "Authorization: Bearer $TOKEN" \
  https://sibyl.local/api/admin/stats

Limit query results:
- Use limit parameter on list endpoints
- Paginate large result sets

Check FalkorDB memory:

bash

redis-cli -h localhost -p 6380 -a conventions INFO memory

Increase Graphiti semaphore:

bash

SIBYL_GRAPHITI_SEMAPHORE_LIMIT=20  # Default is 10

High Memory Usage

Symptoms:

OOMKilled pods in Kubernetes
Container restarts

Solutions:

Increase resource limits:

yaml

backend:
  resources:
    limits:
      memory: 2Gi
    requests:
      memory: 512Mi

Check for memory leaks:
bash
```
kubectl top pods -n sibyl
```

Reduce connection pool sizes:

bash

SIBYL_POSTGRES_POOL_SIZE=5
SIBYL_POSTGRES_MAX_OVERFLOW=10

Worker Queue Backlog

Symptoms:

Jobs not completing
Crawl tasks stuck

Solutions:

Check worker status:

bash

kubectl logs -n sibyl -l app.kubernetes.io/component=worker -f

Scale workers:

bash

kubectl scale deployment sibyl-worker -n sibyl --replicas=3

Check Redis (job queue):

bash

redis-cli -h localhost -p 6380 -a conventions -n 1 KEYS "*"

Port Conflicts

FalkorDB Port Conflict

Error: Port 6380 already in use

This commonly happens if you have Redis running locally.

Solutions:

Stop conflicting service:

bash

# macOS
brew services stop redis

# Linux
sudo systemctl stop redis

Change port in compose:

yaml

falkordb:
  ports:
    - "6381:6379" # Use different host port

Update environment:
bash
```
SIBYL_FALKORDB_PORT=6381
```

PostgreSQL Port Conflict

Error: Port 5433 already in use

Solutions:

Find conflicting process:
bash
```
lsof -i :5433
```
Change port in compose:
yaml
```
postgres:
  ports:
    - "5434:5432"
```
Update environment:
bash
```
SIBYL_POSTGRES_PORT=5434
```

Kubernetes-Specific Issues

Pods Stuck in Pending

Solutions:

Check node resources:

bash

kubectl describe nodes
kubectl top nodes

Check resource requests:

bash

kubectl describe pod <pod-name> -n sibyl

Reduce resource requests:

yaml

resources:
  requests:
    cpu: 50m # Reduce from 100m
    memory: 128Mi # Reduce from 256Mi

Pods CrashLoopBackOff

Solutions:

Check logs:

bash

kubectl logs <pod-name> -n sibyl --previous

Check events:

bash

kubectl get events -n sibyl --sort-by='.lastTimestamp'

Common causes:
- Missing secrets
- Database not ready
- Port already bound

CNPG PostgreSQL Not Starting

Solutions:

Check CNPG operator:

bash

kubectl get pods -n cnpg-system
kubectl logs -n cnpg-system deployment/cnpg-cloudnative-pg

Check cluster status:

bash

kubectl get cluster -n sibyl
kubectl describe cluster sibyl-postgres -n sibyl

Check PVC:
bash
```
kubectl get pvc -n sibyl
```

Kong Gateway Issues

Solutions:

Check Kong operator:
bash
```
kubectl get pods -n kong-system
```

Check gateway:

bash

kubectl get gateway -n kong
kubectl describe gateway sibyl-gateway -n kong

Check dataplane:

bash

kubectl get pods -n kong
kubectl logs -n kong -l app=dataplane-sibyl-gateway

Check HTTPRoutes:

bash

kubectl get httproute -n sibyl
kubectl describe httproute sibyl-api -n sibyl

Tilt-Specific Issues

Tilt Stuck on Resource

Solutions:

Check Tilt logs:
- Click on stuck resource in Tilt UI
- Look for error messages
Trigger manual rebuild:
bash
```
tilt trigger <resource-name>
```

Full reset:

bash

tilt down
minikube delete
minikube start --cpus=4 --memory=8192
tilt up

Can't Access sibyl.local

Solutions:

Check /etc/hosts:

bash

grep sibyl.local /etc/hosts
# Should show: 127.0.0.1 sibyl.local

Check Caddy is running:
- Look for caddy-proxy in Tilt UI
Check Kong port-forward:
- Look for kong-port-forward in Tilt UI

Trust Caddy CA (macOS):

bash

sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain \
  ~/.local/share/caddy/pki/authorities/local/root.crt

Getting Help

If you're still stuck:

Check logs thoroughly:

bash

# All logs with timestamps
kubectl logs -n sibyl --all-containers --timestamps -l app.kubernetes.io/name=sibyl

Describe resources:

bash

kubectl describe pod <pod-name> -n sibyl
kubectl describe deployment <deploy-name> -n sibyl

Check events:

bash

kubectl get events -n sibyl --sort-by='.lastTimestamp' | tail -50

File an issue:
- Include logs and error messages
- Describe reproduction steps
- Include environment details (OS, K8s version, etc.)

Troubleshooting Guide ​

Connection Issues ​

Cannot Connect to API ​

Database Connection Errors ​

Graph Corruption ​

Symptoms ​

Recovery Options ​

Prevention ​

Authentication Issues ​

JWT Token Invalid ​

GitHub OAuth Failing ​

Performance Issues ​

Slow Queries ​

High Memory Usage ​

Worker Queue Backlog ​

Port Conflicts ​

FalkorDB Port Conflict ​

PostgreSQL Port Conflict ​

Kubernetes-Specific Issues ​

Pods Stuck in Pending ​

Pods CrashLoopBackOff ​

CNPG PostgreSQL Not Starting ​

Kong Gateway Issues ​

Tilt-Specific Issues ​

Tilt Stuck on Resource ​

Can't Access sibyl.local ​

Getting Help ​

Troubleshooting Guide

Connection Issues

Cannot Connect to API

Database Connection Errors

Graph Corruption

Symptoms

Recovery Options

Prevention

Authentication Issues

JWT Token Invalid

GitHub OAuth Failing

Performance Issues

Slow Queries

High Memory Usage

Worker Queue Backlog

Port Conflicts

FalkorDB Port Conflict

PostgreSQL Port Conflict

Kubernetes-Specific Issues

Pods Stuck in Pending

Pods CrashLoopBackOff

CNPG PostgreSQL Not Starting

Kong Gateway Issues

Tilt-Specific Issues

Tilt Stuck on Resource

Can't Access sibyl.local

Getting Help