Sibyl Native LLM Substrate Plan
Status: complete v4, verified for Wave 1 handoff.
Author: Nova (Claude Opus 4.7)
Date: 2026-05-15
Target milestone: v0.10
Roadmap: SIBYL_1_0_ROADMAP.md
1.0 planning note: this substrate is required for automatic reflection, synthesis, and memory review. It should also replace remaining Graphiti-era extraction/model-provider seams as Graphiti is deleted from the supported runtime.
1. Goal
Stand up a native PydanticAI-based LLM substrate that owns every non-Graphiti structured-extraction and text-generation call site in Sibyl, offers three first-class providers (Anthropic, Google Gemini, OpenAI), and exposes operator-grade configuration through Settings API + web UI. The substrate keeps a strict package boundary so sibyl_core.ai stays pure (env + defaults), while apps/api/src/sibyl/ai/llm plugs in the DB-backed config source.
This is the surface that will outlive the Graphiti exit and host the v1.0 automatic reflection, synthesis, and memory-review features named in the roadmap.
2. Success criteria
- sibyl_core.ai defines ModelRegistry and shared AI primitives, while sibyl_core.ai.llm defines Extractor[T], Generator, LLMConfig, and an LLMConfigSource protocol. Core has no dependency on apps/api or any DB.
- apps/api/src/sibyl/ai/llm provides DBSettingsConfigSource that implements LLMConfigSource via SettingsService and is the resolver used by API + worker processes.
- First-class providers: Anthropic Haiku 4.5, Gemini 3 Flash, GPT-5.4 mini. Versioned model registry with capability flags, last_verified_at, deprecation date, and cost source.
- Crawler entity extraction (apps/api/src/sibyl/crawler/graph_integration.py) and the synthesis generator (apps/api/src/sibyl/generator/llm.py) run through the substrate. No direct anthropic.AsyncAnthropic, openai.AsyncOpenAI, or google.genai instantiation in those call sites.
- Settings API exposes provider, model, and key configuration per surface as instance-wide settings guarded by the existing require_settings_admin owner-or-admin policy. Env overrides DB and is surfaced as locked_by_env in API responses.
- apps/web/src/app/(main)/settings/admin/ai/page.tsx renders an LLM section with per-surface model selection, env-lock indicators, an explicit instance-wide banner, and a per-row "Test" affordance.
- Provider key validation hits each provider with a current, non-retired model via the substrate's check_provider_key. Setup and Settings routes drop retired Haiku 3 snapshot references.
- moon run :check is green across core, api, cli, and web.
- Crawler smoke harness gates on schema success rate, required-field coverage, and type-distribution drift versus the current Haiku 4.5 baseline (not just entity count).
3. Non-goals
- Replacing or refactoring code under packages/python/sibyl-core/src/sibyl_core/graph/. The Graphiti llm_provider/llm_model config in core/config.py and apps/api/src/sibyl/config.py remains untouched. It dies on its own schedule per the Graphiti exit inventory.
- Replacing the embedding pipeline. Embedding provider config and gemini_embedder.py stay as-is.
- Migrating tools/reflect.py. Reflect is heuristic today (regex-based, no LLM SDK import). Adding LLM-driven reflection is a separate scope item that consumes this substrate; it is not part of this plan. See §9.1.
- Migrating the user-prompt-submit hook. Its stdlib-HTTP design exists to avoid SDK and package coupling so the hook stays fast on cold starts. Migration would require the hook to call an API endpoint that uses the substrate server-side; defer to v0.11 (§9.1).
- Per-organization LLM configuration. v0.10 ships instance-wide config matching the existing system_settings model and settings-admin policy. Per-org overrides are tracked as a v0.11+ follow-up.
- Building a generic agent system. Tool use lands when a real consumer asks for it; Generator exposes streamed text only.
- Adding LiteLLM, OpenRouter, or other proxy layers.
4. Current state inventory
Non-Graphiti LLM-adjacent surfaces:
| Surface | File | Behavior | Model | In scope? |
|---|---|---|---|---|
| Crawler extract | apps/api/src/sibyl/crawler/graph_integration.py:189 | Anthropic SDK, JSON-from-prose parsing, retry-by-string | hardcoded claude-haiku-4-5 | Yes (v0.10) |
| Synthesis | apps/api/src/sibyl/generator/llm.py:45 | Anthropic SDK direct, blocking client | hardcoded | Yes (v0.10) |
| Reflect | packages/python/sibyl-core/src/sibyl_core/tools/reflect.py | Regex/heuristic; no LLM SDK import today | n/a | No (future consumer) |
| Prompt hook | apps/cli/src/sibyl_cli/data/hooks/user-prompt-submit.py:34 | Stdlib HTTPS POST to Anthropic; intentional no-SDK | claude-haiku-4-5-20251001 | No (defer) |
Settings/setup validation:
apps/api/src/sibyl/api/routes/setup.py:121 and apps/api/src/sibyl/api/routes/settings.py:135 both probe the retired Haiku 3 snapshot for Anthropic key validation. Replace them.
Web UI:
apps/web/src/app/(main)/settings/admin/ai/page.tsx exists with embedding provider/model selection and key entry for OpenAI, Anthropic, and Gemini. We extend it with LLM config sections.
Process surfaces that need to resolve LLM config:
- FastAPI app (apps/api/src/sibyl/main.py, apps/api/src/sibyl/api/app.py): multiple workers, multiple event loops over the lifetime of a process.
- arq worker (apps/api/src/sibyl/jobs/worker.py): separate process, separate event loop.
- CLI/MCP: typically env-only; should not import SettingsService.
5. Target architecture
5.1 Package layout
The boundary fix: sibyl_core.ai is pure (no apps/api import, no DB import). It exposes a LLMConfigSource protocol and an env-only default implementation. apps/api ships the DB-backed implementation and wires it at process startup.
The umbrella package is sibyl_core.ai/ — v0.10 ships the ai.llm/ submodule and reserves ai.embeddings/ for v0.11+ when embedding providers join the substrate. Shared primitives (provider key resolution, validation, client cache, registry, errors, observability) live directly under ai/ so the embedding submodule can reuse them without a future rename.
packages/python/sibyl-core/src/sibyl_core/ai/
├── __init__.py          # Public surface re-exports from llm/
├── registry.py          # ModelRegistry, ModelEntry (with kind: "llm" | "embedding")
├── providers.py         # provider-keyed factories shared by llm and (future) embeddings
├── clients.py           # scoped-cache of (loop_id, fingerprint) → Agent / Embedder
├── validation.py        # check_provider_key (kind-agnostic), check_model_availability
├── observability.py     # Token tracker, structlog spans (no Logfire dep)
├── errors.py            # AIError + classified subclasses (LLM and embedding share these)
├── _testing.py          # TestModel-based mocks for both kinds
└── llm/
    ├── __init__.py
    ├── config.py        # LLMConfig, LLMSurface, LLMConfigSource protocol, EnvConfigSource
    ├── extractor.py     # Extractor[T: BaseModel]: single-shot structured extraction
    └── generator.py     # Generator: streamed text generation
# ai/embeddings/ reserved for v0.11; not created in v0.10.

apps/api/src/sibyl/ai/
├── __init__.py
├── llm/
│   ├── __init__.py
│   ├── config_source.py # DBSettingsConfigSource: implements LLMConfigSource via SettingsService
│   ├── routes.py        # LLM settings endpoints (mounted under /api/settings/ai/llm)
│   └── service.py       # cache invalidation, worker startup reload, settings-admin guard
# apps/api/src/sibyl/ai/embeddings/ reserved for v0.11.

Reasoning: core stays usable in any process that has env vars. API processes call set_config_source(DBSettingsConfigSource(...)) once at startup. Worker processes do the same after load_runtime_settings_from_db(). The ai/ umbrella is structural-only in v0.10: there is no abstract AIConfigSource base trying to unify LLM and embedding configs at the protocol level (their fields differ: temperature/max_tokens vs dimensions). What we share is the surrounding plumbing: keys, registry, cache, validation, error taxonomy, settings page layout.
5.2 Key types
# sibyl_core/ai/llm/config.py
class LLMSurface(StrEnum):
    DEFAULT = "default"
    CRAWLER = "crawler"
    SYNTHESIS = "synthesis"
    # Future consumers reserved here; added when their feature lands:
    # REFLECT = "reflect"          # v0.11 LLM-assisted reflection
    # PROMPT_HOOK = "prompt_hook"  # v0.11 hook-via-api migration

class LLMConfig(BaseModel):
    """Raw shape passed to provider factories. Flat values, no metadata."""
    provider: Literal["anthropic", "gemini", "openai"]
    model: str  # alias or snapshot from the registry; custom allowed with warning
    temperature: float = 0.0
    max_tokens: int | None = None
    timeout_seconds: float = 60.0
    api_key: SecretStr | None = None  # resolved at call time; never serialized to API responses

class ConfigField[T](BaseModel):
    """Per-field resolution metadata. Different fields can have different sources."""
    value: T
    source: Literal["env", "db", "default"]
    locked_by_env: bool = False
    env_var: str | None = None  # populated when source = "env"

class ResolvedLLMConfig(BaseModel):
    """Output of LLMConfigSource.resolve. Per-field metadata so the API + UI
    can render source badges and reject env-locked writes correctly."""
    surface: LLMSurface
    provider: ConfigField[Literal["anthropic", "gemini", "openai"]]
    model: ConfigField[str]
    temperature: ConfigField[float]
    max_tokens: ConfigField[int | None]
    timeout_seconds: ConfigField[float]
    api_key: ConfigField[SecretStr]  # never serialized in API responses; UI sees only `source` + presence
    cached_at: datetime | None = None  # populated by DB-backed sources; used for worker TTL refresh

    def to_llm_config(self) -> LLMConfig: ...

class LLMConfigSource(Protocol):
    async def resolve(self, surface: LLMSurface) -> ResolvedLLMConfig: ...
    async def invalidate(self, surface: LLMSurface | None = None) -> None: ...

class EnvConfigSource:
    """Pure-env resolver; the default when no DB-backed source is installed.
    Reads from os.environ at resolution time; never mutates it."""
    async def resolve(self, surface: LLMSurface) -> ResolvedLLMConfig: ...

# sibyl_core/ai/llm/extractor.py
class Extractor[T: BaseModel]:
    def __init__(
        self,
        schema: type[T],
        *,
        surface: LLMSurface = LLMSurface.DEFAULT,
        system_prompt: str,
        model_override: str | None = None,
    ) -> None: ...

    async def extract(self, content: str, *, retries: int = 2) -> T: ...

    async def extract_many(
        self,
        chunks: Sequence[str],
        *,
        max_concurrency: int = 4,
    ) -> list[T | LLMError]: ...

# sibyl_core/ai/llm/generator.py
class Generator:
    async def generate(self, prompt: str) -> str: ...
    async def stream(self, prompt: str) -> AsyncIterator[str]: ...

# apps/api/src/sibyl/ai/llm/config_source.py
class DBSettingsConfigSource:
    """LLMConfigSource backed by SettingsService. Env wins over DB (see §5.3)."""
    def __init__(self, get_settings_service: Callable[[], SettingsService]) -> None: ...

5.3 Configuration hierarchy
Contract: env overrides DB. Deployment-time env beats UI-set values so operators can hot-fix without web access. The UI exposes an env-lock indicator and the API rejects writes to env-locked fields with 409 LOCKED_BY_ENV (the lock state is per field, e.g. model may be env-locked while temperature is not).
Resolution order (most-specific wins):
1. Explicit model_override on the call site (rarely used; tests/scripts).
2. Env override: SIBYL_LLM_<SURFACE>_MODEL, SIBYL_LLM_<SURFACE>_PROVIDER, SIBYL_LLM_<SURFACE>_TEMPERATURE, SIBYL_LLM_<SURFACE>_MAX_TOKENS.
3. Database setting: llm.<surface>.<field> written via Settings API.
4. Global env: SIBYL_LLM_MODEL, SIBYL_LLM_PROVIDER, …
5. Global database setting: llm.default.<field>.
6. Compile-time default from registry.py: ("anthropic", "claude-haiku-4-5").
API keys follow the same precedence using existing key vars (SIBYL_ANTHROPIC_API_KEY, ANTHROPIC_API_KEY, …) and are resolved through LLMConfigSource.resolve(), not through os.environ. The legacy load_api_keys_from_db() function copies DB-stored keys into os.environ for non-substrate callers (e.g. anthropic.AsyncAnthropic() direct construction) and remains in place for those code paths during the Graphiti exit window. The substrate does not depend on it and reads the key from the resolver. This is what makes env-over-DB internally consistent: the substrate sees env-first because the resolver checks env first, not because load_api_keys_from_db ran. The UI shows env/db/default source per field and disables fields where env wins.
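The six-step order above can be sketched for a single field as follows. This is a minimal illustration, not the substrate's resolver: `resolve_field`, `ResolvedField`, and the `"override"` source label are hypothetical names, and the env and DB are passed in as plain mappings so the sketch stays free of side effects.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResolvedField:
    value: str
    source: str  # "override" (hypothetical label), "env", "db", or "default"
    env_var: Optional[str] = None  # set only when the value came from env


def resolve_field(
    surface: str,
    field: str,
    *,
    env: Mapping[str, str],
    db: Mapping[str, str],
    default: str,
    override: Optional[str] = None,
) -> ResolvedField:
    """Walk the six-step order for one field; the first hit wins."""
    if override is not None:  # 1. explicit call-site override
        return ResolvedField(override, "override")
    surface_var = f"SIBYL_LLM_{surface.upper()}_{field.upper()}"
    if surface_var in env:  # 2. surface-scoped env
        return ResolvedField(env[surface_var], "env", surface_var)
    surface_key = f"llm.{surface}.{field}"
    if surface_key in db:  # 3. surface-scoped DB setting
        return ResolvedField(db[surface_key], "db")
    global_var = f"SIBYL_LLM_{field.upper()}"
    if global_var in env:  # 4. global env
        return ResolvedField(env[global_var], "env", global_var)
    global_key = f"llm.default.{field}"
    if global_key in db:  # 5. global DB setting
        return ResolvedField(db[global_key], "db")
    return ResolvedField(default, "default")  # 6. compile-time default
```

Note that a surface-scoped DB setting (step 3) outranks a global env var (step 4), so "env overrides DB" holds within a scope rather than across scopes.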
5.4 Model registry
registry.py ships a versioned registry under sibyl_core.ai.registry. Each entry is a record (not a dict literal). The registry is the source of truth for what shows up in dropdowns and what validation accepts without an explicit "custom model" opt-in. v0.10 ships only kind="llm" entries; embedding entries land in v0.11 using the same shape.
class ModelKind(StrEnum):
    LLM = "llm"
    EMBEDDING = "embedding"

class ModelCapability(StrEnum):
    STRUCTURED_OUTPUT = "structured_output"
    STREAMING = "streaming"
    TOOL_USE = "tool_use"
    THINKING = "thinking"

class ModelEntry(BaseModel):
    alias: str                           # e.g. "claude-haiku-4-5"
    snapshot: str                        # e.g. "claude-haiku-4-5-20251001"
    kind: ModelKind                      # "llm" or "embedding"; v0.10 only ships llm entries
    provider: Literal["anthropic", "gemini", "openai", "cohere", "voyageai", "bedrock"]
    provider_model_id: str               # exact string passed to PydanticAI
    pydantic_ai_model_class: str         # e.g. "AnthropicModel" or "OpenAIResponsesModel"
    use_cases: list[str]                 # ["extraction", "synthesis", "bulk", "semantic-search", ...]
    capabilities: set[ModelCapability]   # empty for embedding entries
    max_output_tokens: int | None        # None for embedding entries
    embedding_dimensions: int | None     # set for embedding entries; None for llm
    default_temperature: float | None    # None for embedding entries
    input_cost_per_mtok_usd: float
    output_cost_per_mtok_usd: float | None  # None for embedding entries
    cost_source_url: str
    last_verified_at: datetime
    deprecated_after: datetime | None

Lookup helpers accept a kind filter so the LLM UI only sees LLM entries and the future embedding UI only sees embedding entries:
def llm_entries() -> list[ModelEntry]: ...        # filter kind=LLM
def embedding_entries() -> list[ModelEntry]: ...  # filter kind=EMBEDDING (empty in v0.10)
def recommended_for(use_case: str, kind: ModelKind) -> ModelEntry: ...

Initial entries (all last_verified_at = 2026-05-15):
| Provider | Alias | Provider model ID | Use cases | $/M in | $/M out |
|---|---|---|---|---|---|
| Anthropic | claude-haiku-4-5 | claude-haiku-4-5-20251001 | extraction, default | 1.00 | 5.00 |
| Anthropic | claude-sonnet-4-6 | claude-sonnet-4-6 | synthesis (quality tier) | 3.00 | 15.00 |
| Gemini | gemini-3-flash | gemini-3-flash-preview | cost-optimized extraction | 0.50 | 3.00 |
| Gemini | gemini-3-1-flash-lite | gemini-3.1-flash-lite | bulk crawling | 0.25 | 1.50 |
| OpenAI | gpt-5.4-mini | gpt-5.4-mini | OpenAI parity | 0.75 | 4.50 |
| OpenAI | gpt-5.4-nano | gpt-5.4-nano | budget extraction | 0.20 | 1.25 |
Custom-model override: if an operator picks a model not in the registry, the API allows it but returns {"warning": "unverified_model"} and the UI shows a yellow badge. We do not silently trust an arbitrary string.
Pinning: SIBYL_LLM_PIN_SNAPSHOTS=true resolves aliases to snapshots at startup. CI uses this.
Registry refresh: §6 Wave 6 adds scripts/llm/verify_registry.py, which probes each entry's provider for availability and refreshes last_verified_at. It runs nightly (out of scope for v0.10 to schedule; tracked as v0.11 follow-up).
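The custom-model and pinning rules can be illustrated with a small lookup. This is a sketch under stated assumptions: `Entry`, `REGISTRY`, and `resolve_model` are hypothetical stand-ins for the real ModelEntry and ModelRegistry, and only one registry row from the table is reproduced.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    alias: str
    snapshot: str
    provider: str


# One row from the initial-entries table; the real registry holds all six.
REGISTRY = {
    "claude-haiku-4-5": Entry("claude-haiku-4-5", "claude-haiku-4-5-20251001", "anthropic"),
}


def resolve_model(name: str, *, pin_snapshots: bool = False) -> Tuple[str, Optional[str]]:
    """Return (model_id, warning). Unknown names pass through with a warning."""
    entry = REGISTRY.get(name)
    if entry is not None:
        # Known alias: optionally pin it to its dated snapshot.
        return (entry.snapshot if pin_snapshots else entry.alias), None
    if any(e.snapshot == name for e in REGISTRY.values()):
        # Already a known snapshot: nothing to pin, nothing to warn about.
        return name, None
    # Custom model: allowed, but never silently trusted.
    return name, "unverified_model"
```

With pinning on, `resolve_model("claude-haiku-4-5", pin_snapshots=True)` yields the dated snapshot, while an arbitrary string comes back unchanged alongside the `unverified_model` warning.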
5.5 Settings API surface
LLM settings are deployment-global and use the existing require_settings_admin policy (owner or admin role), matching how the rest of the AI settings page works today. They are not org-scoped in v0.10. The API returns this scope explicitly so the UI cannot misrepresent it.
Note on policy tightening: require_settings_admin allows any admin in any org to mutate instance-wide config. That is the existing settings-admin contract; the substrate inherits it rather than introducing a new policy break. If multi-tenant risk warrants tightening to owner-only, that is a separate policy decision tracked as a v0.11+ follow-up (§9.1).
New endpoints live under the AI umbrella at /api/settings/ai/*. v0.10 mounts /api/settings/ai/llm/* and reserves /api/settings/ai/embeddings/* for v0.11. All guarded by require_settings_admin:
- GET /api/settings/ai/llm: returns the resolved config for every surface, including source and locked_by_env per field. Response shape includes a top-level scope: "instance_wide" marker.
- PUT /api/settings/ai/llm/{surface}: updates provider/model/temperature/max_tokens for a surface. Validates the model against the registry (filtered to kind=llm); allows custom models with a warning. Returns 409 LOCKED_BY_ENV if any updated key is env-locked.
- POST /api/settings/ai/llm/{surface}/test: runs a representative extraction against the resolved config. Returns latency, token counts, parsed output, and LLMError classification on failure.
- POST /api/settings/ai/keys/{provider}/test: validates the configured API key for the provider using check_provider_key (text probe against the provider's cheapest current model). Path is kind-agnostic because the same key serves both LLM and embedding calls on most providers.
- POST /api/settings/ai/models/{model_alias}/test: probes a specific registry entry's availability using check_model_availability(provider, provider_model_id, key). Used by both the UI (when an operator picks an unfamiliar model) and verify_registry.py. Kind is derived from the registry entry, so this endpoint stays under /ai/ rather than /ai/llm/.
- GET /api/settings/ai/registry?kind=llm: exposes the registry to the web UI, with optional kind filter (defaults to all). Embedding UI in v0.11 will use ?kind=embedding.
Endpoints reject non-admin requests with 403. UI banner explicitly labels the section "Instance-wide settings (affects all organizations)."
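The per-field lock check behind the 409 response could look roughly like this. It is a sketch only: `check_update` and the response body shape are hypothetical, and the set of locked fields is taken as an input rather than derived here.

```python
from typing import Any, Dict, Set, Tuple


def check_update(update: Dict[str, Any], locked_fields: Set[str]) -> Tuple[int, Dict[str, Any]]:
    """Reject the whole write if any field being updated is env-locked."""
    blocked = sorted(name for name in update if name in locked_fields)
    if blocked:
        # Body shape is an assumption; the plan only fixes the status and code.
        return 409, {"error": "LOCKED_BY_ENV", "fields": blocked}
    return 200, {"updated": sorted(update)}
```

Because the lock is per field, a request that touches only unlocked fields still succeeds even when another field on the same surface is set by env.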
Existing _validate_anthropic_key, _validate_openai_key, _validate_gemini_key in apps/api/src/sibyl/api/routes/setup.py and apps/api/src/sibyl/api/routes/settings.py are replaced by calls to sibyl_core.ai.validation.check_provider_key. Setup-mode behavior is preserved. These legacy endpoints stay at their current paths (no /ai/ prefix) because they're referenced by the existing first-run UI.
5.6 Web UI
apps/web/src/app/(main)/settings/admin/ai/page.tsx already lives under the "AI Settings" parent. v0.10 reorganizes the page into two sibling sections: Language Models (new) and Embeddings (existing), with the existing embedding controls moved into the Embeddings section unchanged. The Language Models card has the following contract:
- Instance-wide banner at top of the card: "These settings apply to all organizations in this deployment. Per-organization overrides are planned for a future release."
- Per-row source badge (env/db/default). Env-locked rows are visually disabled with a tooltip naming the env var and explaining that env wins by design.
- Save returns 409 LOCKED_BY_ENV on env-locked fields; the UI surfaces this inline ("This field is set by an environment variable and can't be changed here"), not as a generic toast error.
- Test button per row calls POST /api/settings/ai/llm/{surface}/test and renders latency, token count, and parsed sample output. Failure renders the classified error, not just the message.
- Custom-model field is hidden behind an "Advanced" disclosure with a confirmation toggle: "I know this model isn't in the registry. Sibyl can't verify capabilities or cost." Yellow badge next to saved custom values.
- Recommended pill next to the curated default for each surface, sourced from use_cases in the registry.
Per-surface rows for v0.10: Default, Crawler, Synthesis. Reflect and Prompt-hook rows ship when those consumers migrate (v0.11+).
5.7 Provider client lifetime
Global async-client caches break under multi-loop, multi-process, settings-mutation conditions. Contract:
- clients.py keys cached agents by (asyncio.get_running_loop().__hash__(), config_fingerprint).
- config_fingerprint = sha256(json(provider, model, temperature, max_tokens, timeout_seconds, api_key_hash)).
- LLMConfigSource.invalidate(surface) clears every cache entry for surfaces resolving from the changed setting.
- PUT /api/settings/ai/llm/{surface} calls invalidate in-process after writing, before returning the response. This is the strong guarantee for the API process.
- Cross-process propagation to the arq worker uses a TTL-based refresh, not pubsub. The SettingsService already caches reads per _CacheEntry with a TTL; the substrate plumbs a cached_at timestamp into each ResolvedLLMConfig and the worker re-resolves when the cache TTL expires (default 60s) or when the worker explicitly calls invalidate() on shutdown of a long-running task. We do not invent a new settings-change pubsub for v0.10. The trade-off is named: worker-side changes can lag by up to one TTL.
- Worker startup (apps/api/src/sibyl/jobs/worker.py) calls install_db_config_source(SettingsService) after the existing load_api_keys_from_db() runs. The substrate does not rely on load_api_keys_from_db() having mutated os.environ; it reads keys directly from SettingsService via the resolver.
- No mutation of os.environ from substrate code. Env reads happen at resolution time only. Existing load_api_keys_from_db() continues to mutate os.environ for non-substrate callers during the Graphiti exit; that path is orthogonal.
- Verification (Task 4): a multi-loop test that creates two asyncio.new_event_loop() instances in the same process, builds clients on each, asserts isolated cache entries, then invalidates and asserts both clear.
5.8 Observability
observability.py wraps every call with structured logging: surface, provider, model, prompt_name, input_tokens, output_tokens, latency_ms, retries, error_class. Counters land on sibyl debug status so operators see usage per surface.
We do not install Logfire. The dependency strategy is pydantic-ai-slim[anthropic,google,openai] (see Task 1) and our existing structlog setup covers structured logging.
6. Decomposition
Wave 1: Foundation (parallel)
Task 1: Add pydantic-ai-slim and create ai/ package skeleton
- Files: packages/python/sibyl-core/pyproject.toml, packages/python/sibyl-core/src/sibyl_core/ai/__init__.py, packages/python/sibyl-core/src/sibyl_core/ai/errors.py, packages/python/sibyl-core/src/sibyl_core/ai/llm/__init__.py
- Parallel: Yes (with Tasks 2, 3)
- Implementation:
  - uv add 'pydantic-ai-slim[anthropic,google,openai]>=1.96.1,<1.97' at the sibyl-core package.
  - Avoid the full pydantic-ai distribution (it pulls Logfire and all model deps we don't use).
  - Create the ai/ umbrella with errors.py defining AIError (base), LLMError, LLMConfigError, LLMValidationError, LLMRateLimitError, LLMProviderError, LLMTimeoutError. Embedding-specific errors are reserved for v0.11 alongside their submodule.
  - Empty ai/llm/ submodule stub for Tasks 2 and 4-7 to populate.
  - Document the exact tested version (1.96.1 at time of writing) in pyproject.toml comments.
- Verify:
  - moon run core:typecheck passes.
  - uv tree -p sibyl-core | rg 'pydantic-ai-slim' shows the slim package, no logfire extra.
Task 2: LLM config types and EnvConfigSource
- Files: packages/python/sibyl-core/src/sibyl_core/ai/llm/config.py, tests/ai/llm/test_config.py
- Parallel: Yes
- Implementation:
  - Define LLMSurface, LLMConfig, ConfigField[T], ResolvedLLMConfig, LLMConfigSource protocol, and EnvConfigSource.
  - EnvConfigSource honors the env-only precedence (steps 1, 2, 4, 6 from §5.3).
  - Tests: precedence order; surface scoping; missing-key path; per-field locked_by_env marking.
- Verify:
  - moon run core:test -- tests/ai/llm/test_config.py passes.
  - moon run core:lint, moon run core:typecheck green.
Task 3: Shared registry
- Files: packages/python/sibyl-core/src/sibyl_core/ai/registry.py, tests/ai/test_registry.py
- Parallel: Yes
- Implementation:
  - ModelKind, ModelEntry Pydantic model, ModelRegistry with the six LLM entries from §5.4 (all kind=ModelKind.LLM in v0.10).
  - Lookup by alias and snapshot. Capability checks. Kind-filtered helpers: llm_entries(), embedding_entries(), recommended_for(use_case, kind).
  - Custom-model handling returns a ModelEntry-shaped record with capabilities = set() and a warning flag.
- Verify:
  - moon run core:test -- tests/ai/test_registry.py passes.
Wave 2: Provider clients and call surface (sequential after Wave 1)
Task 4: Provider factory and scoped client cache
- Files: packages/python/sibyl-core/src/sibyl_core/ai/providers.py, packages/python/sibyl-core/src/sibyl_core/ai/clients.py, tests/ai/test_clients.py
- Depends on: Tasks 1, 2, 3
- Implementation:
  - build_model(config) maps LLMConfig → pydantic_ai.models.AnthropicModel | GoogleModel | OpenAIResponsesModel.
  - clients.get_agent(surface, schema=None, system_prompt=...) returns a pydantic_ai.Agent keyed by (loop_id, fingerprint).
  - Per §5.7: no os.environ mutation; API keys are passed via pydantic_ai.providers.*Provider.
  - Tests: cache hit/miss; invalidate after config change; behavior across two event loops in the same process (use asyncio.new_event_loop to simulate).
- Verify:
  - moon run core:test -- tests/ai/test_clients.py passes.
Task 5: Extractor and structured-output retry
- Files: packages/python/sibyl-core/src/sibyl_core/ai/llm/extractor.py, tests/ai/llm/test_extractor.py
- Depends on: Task 4
- Implementation:
  - Extractor[T] wraps pydantic_ai.Agent with output_type=T.
  - extract returns the parsed Pydantic model. PydanticAI's ModelRetry is mapped to LLMValidationError after exhaustion. ModelHTTPError maps to provider/rate-limit/timeout error classes based on status code.
  - extract_many runs with bounded concurrency (asyncio.Semaphore) and returns list[T | LLMError] so callers can decide how to surface partial failures.
  - Tests via PydanticAI's TestModel; cover success, validation retry, rate-limit mapping, timeout.
- Verify:
  - moon run core:test -- tests/ai/llm/test_extractor.py passes.
  - Local live smoke against one Anthropic + one Gemini + one OpenAI key (skip in CI when keys are absent).
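The bounded-concurrency behavior of extract_many can be sketched with asyncio.Semaphore as below. This is an illustration under assumptions: the function is free-standing rather than a method, `extract_one` is a hypothetical callable standing in for the single-chunk extraction, and `LLMError` is redefined locally so the sketch runs on its own.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar, Union

T = TypeVar("T")


class LLMError(Exception):
    """Local stand-in for the substrate's classified error base."""


async def extract_many(
    chunks: Sequence[str],
    extract_one: Callable[[str], Awaitable[T]],
    *,
    max_concurrency: int = 4,
) -> List[Union[T, LLMError]]:
    """Run extract_one over all chunks, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _guarded(chunk: str) -> Union[T, LLMError]:
        async with semaphore:
            try:
                return await extract_one(chunk)
            except LLMError as exc:
                # Return the error in place so one bad chunk does not sink the batch.
                return exc

    # gather preserves input order, so results line up with chunks by index.
    return list(await asyncio.gather(*(_guarded(chunk) for chunk in chunks)))
```

Returning errors as values keeps the result list aligned with the input, so a caller can pair each failure with the chunk that produced it.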
Task 6: Generator and streamed text
- Files: packages/python/sibyl-core/src/sibyl_core/ai/llm/generator.py, tests/ai/llm/test_generator.py
- Depends on: Task 4
- Implementation:
  - Generator.generate (returns the complete text in a single result) and Generator.stream (AsyncIterator[str]).
  - System prompt and model_override plumbing identical to Extractor.
- Verify:
  - moon run core:test -- tests/ai/llm/test_generator.py passes.
  - Manual stream against Anthropic produces incremental text.
Task 7: Validation helpers — split key vs surface
- Files: packages/python/sibyl-core/src/sibyl_core/ai/validation.py, tests/ai/test_validation.py
- Depends on: Task 5
- Implementation:
  - check_provider_key(provider, key) -> KeyValidationResult issues a tiny non-structured text request to the provider's cheapest current model. Returns a tagged result distinguishing valid, invalid_key, network, rate_limited, model_not_found, permission_denied.
  - check_model_availability(provider, provider_model_id, key) -> ModelValidationResult issues a 1-token text request against a specific model. Used by verify_registry.py and by the UI when an operator picks a model the registry hasn't verified recently. Returns the same tagged classification plus the actual provider model string echoed back when available (catches silent provider aliasing).
  - test_surface_config(surface, source) -> SurfaceTestResult runs a representative extraction using the resolved config and returns parsed output, latency, token counts.
  - Classify pydantic_ai.exceptions.ModelHTTPError status codes (401/403/404/429/5xx).
- Verify:
  - moon run core:test -- tests/ai/test_validation.py passes with mocked transport.
  - Manual run against a known-good key and a junk key returns the right tag for each.
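One way the status-code classification could map onto the result tags is sketched below. `classify_status` is a hypothetical helper that works on a bare integer (in the substrate the code would come from the caught ModelHTTPError). Mapping 5xx to `network` and everything else to `valid` are assumptions; the plan lists the codes and the tags but not the pairing.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status from a provider probe to a validation tag."""
    if status_code == 401:
        return "invalid_key"
    if status_code == 403:
        return "permission_denied"
    if status_code == 404:
        return "model_not_found"
    if status_code == 429:
        return "rate_limited"
    if 500 <= status_code <= 599:
        # Assumption: provider-side failures are grouped with transport errors.
        return "network"
    # Assumption: any other status is treated as a successful probe.
    return "valid"
```

Keeping 404 distinct from 401 matters for the registry probe: a valid key against a retired model should surface as a deprecation candidate, not as a bad key.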
Wave 3: DB-backed config source and Settings API (sequential after Wave 2)
Task 8: DBSettingsConfigSource
- Files: apps/api/src/sibyl/ai/llm/__init__.py, apps/api/src/sibyl/ai/llm/config_source.py, apps/api/tests/ai/llm/test_config_source.py
- Depends on: Task 2
- Implementation:
  - DBSettingsConfigSource implements LLMConfigSource, reading from SettingsService using keys llm.<surface>.provider, llm.<surface>.model, etc.
  - Resolution order from §5.3: env wins over DB, so this resolver tracks both and returns locked_by_env per field.
  - API key resolution uses the existing setting keys and encrypted-secret storage but resolves env-first for substrate callers. Do not call get_anthropic_key, get_openai_key, or get_gemini_key directly for the final value because those helpers are DB-first for legacy callers.
- Verify:
  - moon run api:test -- tests/ai/llm/test_config_source.py passes.
  - Round-trip set→get with env unset returns DB values; env-set returns env value with lock flag.
Task 9: Settings service additions and cache invalidation
- Files: apps/api/src/sibyl/services/settings.py, apps/api/src/sibyl/ai/llm/service.py, apps/api/tests/test_settings_service.py
- Depends on: Task 8
- Implementation:
  - Add get_llm_setting(surface, field), set_llm_setting(surface, field, value) to SettingsService. Reuse the existing encrypted-secret path for keys; no new key storage.
  - apps/api/src/sibyl/ai/llm/service.py exposes install_db_config_source() for app startup and calls LLMConfigSource.invalidate(surface) in-process after settings writes. Worker processes rely on TTL-based refresh per §5.7; no settings-change pubsub is introduced in v0.10.
  - Worker startup (apps/api/src/sibyl/jobs/worker.py) calls install_db_config_source() after load_api_keys_from_db().
- Verify:
  - moon run api:test -- tests/test_settings_service.py passes.
  - Spin up the API + worker locally, set a value via API, observe API-side invalidation immediately and worker-side refresh after the configured TTL.
Task 10: Settings API routes
- Files: apps/api/src/sibyl/ai/llm/routes.py, apps/api/src/sibyl/api/routes/setup.py, apps/api/src/sibyl/api/routes/settings.py, apps/api/tests/ai/test_llm_settings_api.py
- Depends on: Tasks 7, 9
- Implementation:
  - Endpoints from §5.5 with require_settings_admin. Response schemas explicit about scope: "instance_wide" and per-field source + locked_by_env.
  - Replace the inline retired Haiku 3 probes in setup.py:121 and settings.py:135 with check_provider_key.
  - OpenAPI schemas + tests for: GET returns expected shape; PUT writes; PUT returns 409 when env-locked; POST /test runs end-to-end; non-admin returns 403.
- Verify:
  - moon run api:test -- tests/ai/test_llm_settings_api.py passes.
  - curl -X GET http://localhost:3334/api/settings/ai/llm returns expected shape with sources.
Wave 4: Call-site migration (parallel after Wave 3)
Task 11: Migrate crawler entity extraction
- Files: apps/api/src/sibyl/crawler/graph_integration.py, apps/api/tests/crawler/test_graph_integration.py, apps/api/tests/crawler/fixtures/extraction_baseline.json (new)
- Depends on: Tasks 5, 8
- Parallel: Yes (with Task 12)
- Implementation:
  - Define a container schema:

        from typing import Literal

        from pydantic import BaseModel, Field

        class ExtractedEntityPayload(BaseModel):
            name: str = Field(min_length=1, max_length=200)
            type: Literal["concept", "tool", "pattern", "person", "organization", "project"]
            description: str = Field(min_length=1, max_length=500)
            confidence: float = Field(ge=0.0, le=1.0)

        class ExtractedEntitiesPayload(BaseModel):
            # Cap the batch so one chunk cannot return an unbounded entity list.
            entities: list[ExtractedEntityPayload] = Field(max_length=50)

  - Replace EntityExtractor.extract_from_chunk with Extractor(ExtractedEntitiesPayload, ...) and convert results back to the existing ExtractedEntity dataclass with source_chunk_id and source_url re-attached.
  - Delete the markdown JSON-stripping fallback; PydanticAI handles structured output.
  - extract_batch uses Extractor.extract_many.
- Verify:
  - moon run api:test -- tests/crawler/test_graph_integration.py passes.
  - Schema-success gate: for the 20-chunk baseline fixture, ≥95% chunks return valid payloads.
  - Field-coverage gate: ≥98% of returned entities have non-empty description, valid type, confidence ∈ [0,1].
  - Type-distribution drift gate: per-type entity ratio drifts ≤ 25% vs. fixture baseline.
  - Entity-count gate: ±15% (loose, distribution drift catches the real regressions).
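The drift and count gates above could be computed along these lines. This is one reading, not the harness itself: the helper names are hypothetical, drift is interpreted as the relative change in each type's share of entities, and a type that appears only in the observed run is counted as full drift. The plan does not say whether the 25% threshold is relative or absolute.

```python
from collections import Counter
from typing import Dict, Iterable


def type_shares(types: Iterable[str]) -> Dict[str, float]:
    """Fraction of entities per type; empty input yields an empty mapping."""
    counts = Counter(types)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def max_relative_drift(baseline_types: Iterable[str], observed_types: Iterable[str]) -> float:
    """Largest relative change in any type's share versus the baseline."""
    base = type_shares(baseline_types)
    observed = type_shares(observed_types)
    drift = 0.0
    for name, share in base.items():
        drift = max(drift, abs(observed.get(name, 0.0) - share) / share)
    if any(name not in base for name in observed):
        # Assumption: a type absent from the baseline counts as 100% drift.
        drift = max(drift, 1.0)
    return drift


def entity_count_within(baseline_count: int, observed_count: int, tolerance: float = 0.15) -> bool:
    """True when the observed count is within ±tolerance of the baseline."""
    return abs(observed_count - baseline_count) <= tolerance * baseline_count
```

A run would pass the drift gate when `max_relative_drift(...) <= 0.25` and the count gate when `entity_count_within(...)` is true.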
Task 12: Migrate synthesis generator
- Files: apps/api/src/sibyl/generator/llm.py, apps/api/tests/generator/test_llm.py
- Depends on: Tasks 6, 8
- Parallel: Yes
- Implementation:
  - Replace anthropic.Anthropic() direct client with Generator(surface=LLMSurface.SYNTHESIS).
  - Drop the sync client path; synthesis is async throughout the chain anyway.
  - Carry over the system prompt verbatim.
- Verify:
  - moon run api:test -- tests/generator/test_llm.py passes.
  - Manual: run an existing synthesis preset, diff output structure.
Wave 5: Web UI (sequential after Wave 4)
Task 13: Web API client and hooks
- Files: `apps/web/src/lib/api.ts`, `apps/web/src/lib/hooks.ts`
- Depends on: Task 10
- Implementation:
  - `getLLMSettings`, `updateLLMSurface`, `testLLMSurface`, `testProviderKey`, `getLLMRegistry`.
  - React Query hooks mirroring the existing settings hooks.
  - Generated types from OpenAPI shapes added in Task 10.
- Verify:
  - `moon run web:typecheck` and `moon run web:lint` pass.
Task 14: Language Models settings card
- Files: `apps/web/src/app/(main)/settings/admin/ai/page.tsx`, `apps/web/src/components/settings/llm-config-card.tsx` (new), `apps/web/src/components/settings/llm-config-card.test.tsx`
- Depends on: Task 13
- Implementation:
- Card per §5.6: instance-wide banner, per-row provider+model selects, source badge, env-lock disabled state with tooltip, recommended pill, Test button with structured failure rendering.
- "Advanced — custom model" disclosure with confirmation toggle and yellow badge for saved custom values.
- Save renders inline 409 message on env-locked fields; toast only for unexpected errors.
- Verify:
  - `moon run web:test -- llm-config-card` passes (unit tests against mocked hooks).
  - Manual via `agent-browser`: change crawler to Gemini Flash, save, Test, see latency + tokens.
Wave 6: Verification, smoke, docs (sequential after Wave 5)
Task 15: Smoke harness and registry probe script
- Files: `scripts/llm/smoke.py` (new), `scripts/llm/verify_registry.py` (new), `docs/architecture/SIBYL_LLM_SUBSTRATE_PLAN.md` (append benchmark appendix)
- Depends on: Tasks 11, 12
- Implementation:
  - `smoke.py` runs the same 20-chunk fixture from Task 11 through `Extractor` for each (provider, model) pair. Reuses Task 11's gate functions: schema success ≥95%, field coverage ≥98%, type-distribution drift ≤25%, entity-count ±15%. Adds a name-quality gate (entity name length ≥2, non-whitespace, not in `{"UNKNOWN", "null", "None", ""}`). Records p50/p95 latency and per-chunk cost.
  - `verify_registry.py` probes each registry entry via `check_model_availability(entry.provider, entry.provider_model_id, key)` (not `check_provider_key`, which only validates the key against the cheapest model). Per-entry failures update `last_verified_at` accordingly and log deprecation candidates. Exit code reflects whether any first-class entry failed.
- Verify:
  - `python scripts/llm/smoke.py` runs end-to-end with each provider's key set.
  - `python scripts/llm/verify_registry.py` exits 0 with the registry intact.
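The name-quality gate and the latency summary that `smoke.py` records could look roughly like the sketch below. The helper names are hypothetical, "length ≥2, non-whitespace" is interpreted here as "at least two characters after stripping whitespace", and the percentile uses the nearest-rank method, which is an assumption since the plan does not specify one.

```python
import math

# Placeholder strings the gate rejects, taken from the Task 15 description.
PLACEHOLDER_NAMES = {"UNKNOWN", "null", "None", ""}


def is_quality_name(name: str) -> bool:
    """True when the stripped name has >= 2 characters and is not a placeholder."""
    stripped = name.strip()
    return len(stripped) >= 2 and stripped not in PLACEHOLDER_NAMES


def name_quality_rate(names: list[str]) -> float:
    """Share of entity names that pass the quality check."""
    if not names:
        return 0.0
    return sum(is_quality_name(n) for n in names) / len(names)


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [100.0, 200.0, 300.0, 400.0]
    print("p50:", percentile(latencies_ms, 50), "p95:", percentile(latencies_ms, 95))
    print("names:", name_quality_rate(["Redis", "None", "PydanticAI", "a"]))
```

With only 20 chunks per model, nearest-rank p95 is effectively the second-slowest or slowest sample, so a single slow call moves it noticeably.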
Task 16: Documentation
- Files: `packages/python/sibyl-core/README.md`, `apps/api/README.md`, `apps/web/README.md`, this doc.
- Depends on: Tasks 11, 12, 14
- Implementation:
- Document substrate, env contract, settings UI surface, custom-model behavior, how to add a provider. Cross-link from Northstar and Roadmap.
- Add the multi-tenant boundary explicitly: this is instance-wide config. Per-org override is a v0.11+ follow-up tracked separately.
- Verify: manual.
Task 17: Final gates
- Depends on: all prior.
- Implementation:
  - `moon run :check` green.
  - `agent-browser` smoke of the settings UI in dark mode.
  - `rg` for the retired Haiku 3 snapshot returns empty.
  - `rg "AsyncAnthropic|anthropic\.Anthropic|openai\.AsyncOpenAI" apps/api/src/sibyl/crawler/graph_integration.py apps/api/src/sibyl/generator/llm.py` is empty after Tasks 11-12.
  - `rg "google\.genai" apps/api/src packages/python/sibyl-core/src/sibyl_core` shows only embedding adapters or Graphiti-compat paths until the v0.11 embedding migration.
- Verify: all of the above succeed.
7. Verification gate summary
| Wave | Gate |
|---|---|
| 1 | moon run core:check green; slim install only |
| 2 | moon run core:check green; multi-loop client cache test green |
| 3 | API integration tests green; env-lock 409 verified; worker invalidates |
| 4 | Schema-success ≥95%, field coverage ≥98%, type-distribution drift ≤25% |
| 5 | Web typecheck + lint + unit + visual smoke green; 409 inline error verified |
| 6 | moon run :check green; smoke + registry-probe results recorded |
8. Risks and mitigations
- Package boundary regressions. Mitigation: add a `tests/test_core_no_apps_imports.py` that fails if `sibyl_core.ai` imports anything from `apps.*` or `sibyl.services.*`.
- PydanticAI minor-release churn. Mitigation: `>=1.96.1,<1.97` constraint; explicit upgrade task per minor; `scripts/llm/verify_registry.py` catches model deprecations.
- Multi-tenant footgun if anyone bolts org-scoping on later. Mitigation: the response carries `scope: "instance_wide"`. UI banner is explicit. Per-org work has a separate plan and migration story; do not pretend the v0.10 settings are tenant-safe.
- Stale clients after settings change. Mitigation: `LLMConfigSource.invalidate` is called by `set_llm_setting`; worker re-resolves on the existing settings-cache TTL; cache key includes `loop_id`.
- Custom-model rot. Mitigation: UI explicitly warns; API returns `unverified_model` warning; custom values still flow through the same validators.
- Validation conflation. Mitigation: `check_provider_key` and `test_surface_config` are separate endpoints with classified error types.
- Crawler extraction quality regression. Mitigation: schema, field coverage, and type-drift gates with a frozen 20-chunk fixture, not just entity count.
- Hook and reflect drift while out of scope. Mitigation: explicit non-goal note in §3; ADR-style comment in each file pointing to this plan and the v0.11 follow-up.
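For the package-boundary mitigation, one way to write the import check is a static scan with the standard-library `ast` module, so the test does not need to import the package at all. This is a sketch under assumptions: `forbidden_imports` and `scan_package` are hypothetical helper names, and the real test would point `scan_package` at wherever `sibyl_core/ai` lives in the checkout.

```python
import ast
from pathlib import Path

# Module prefixes that sibyl_core.ai must never import (from the risk above).
FORBIDDEN_PREFIXES = ("apps", "sibyl.services")


def forbidden_imports(source: str) -> list[str]:
    """Return absolute imports in `source` that fall under a forbidden prefix."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which stay inside the package.
            modules = [node.module]
        else:
            continue
        for module in modules:
            if any(module == p or module.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                found.append(module)
    return found


def scan_package(root: Path) -> dict[str, list[str]]:
    """Map each offending file under `root` to its forbidden imports."""
    offenders: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"))
        if hits:
            offenders[str(path)] = hits
    return offenders
```

A pytest case would then assert that `scan_package(...)` returns an empty dict. A static scan will not catch dynamic imports via `importlib`, which seems an acceptable gap for a boundary guard.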
9. Out of scope (v0.11+ follow-ups)
9.1 Future consumers (out of scope for this plan)
- LLM-assisted reflection. Today's `reflect.py` is heuristic. A future feature replaces or augments it with `Extractor[ReflectionCandidatesPayload]` on the substrate. Requires: golden fixtures, latency/cost gates, fallback to heuristic when LLM unavailable, policy-metadata preservation. Tracked separately.
- Prompt-submit hook migration. Hook stays on stdlib HTTP. Migration path is to add a small local API endpoint (`POST /api/llm/hook-query`) that uses the substrate server-side; the hook calls that instead of Anthropic directly. Loses the standalone fast path; needs benchmark.
- Per-organization LLM configuration. Requires extending `system_settings` keys with `organization_id`, migration of resolved-config callers, and UI per-org override panes.
- Streaming synthesis to the web UI. Backend support lands in `Generator.stream` (Task 6); the web surface is a separate UX project.
- Tool use in `Generator`. When a consumer needs retrieval-aware synthesis.
- Scheduled registry refresh. `verify_registry.py` exists; scheduling it (nightly job) is a v0.11 ops task.
9.2 Out of plan, not out of mind
- Token budgeting and cross-surface rate-limit coordination.
- Telemetry export to Logfire (kept off-path; we don't depend on it).
- Per-call-site latency SLOs.
9.3 Embedding migration to the AI substrate (v0.11+)
PydanticAI ships a real Embedder class with OpenAI, Google, Cohere, Bedrock, VoyageAI, and SentenceTransformers support. The v0.10 substrate is shaped to host embeddings without restructure: the ai/ umbrella, the kind-agnostic check_provider_key, the kind-tagged ModelEntry, the shared client cache, the /api/settings/ai/ route prefix, and the parent "AI Settings" UI page.
Migration (deferred): replace NativeEmbeddingProvider and the crawler's Embedder client with PydanticAI's Embedder under sibyl_core.ai.embeddings. The Graphiti-shaped sibyl_core/graph/gemini_embedder.py stays untouched until the Graphiti exit removes the path. Existing SIBYL_GRAPH_EMBEDDING_* env vars continue to work alongside new SIBYL_AI_EMBEDDING_* vars during a deprecation window.
Why we hold for v0.10: current embedding code works, has its own cache layer, and migrating it adds 3-4 call sites plus a Graphiti-compat coordination beat that does not earn the slip.
10. Recommendation
Ship v4. The plan is internally aligned around the sibyl_core.ai umbrella, DB-backed apps/api/src/sibyl/ai/llm config source, existing require_settings_admin policy, env-over-DB precedence, and TTL-based worker refresh. The v4 AI-umbrella reshape keeps v0.10 focused on LLMs while leaving an obvious v0.11 path for PydanticAI embeddings without renaming the substrate.
Wave 1 (Tasks 1-3) is parallel and unblocked. Stop iterating with cross-model review after this pass: the remaining surface is implementation choices, not architecture.
11. Verification receipts
11.1 Repo facts checked on 2026-05-15
- `apps/api/src/sibyl/crawler/graph_integration.py` directly imports `AsyncAnthropic` and hardcodes `claude-haiku-4-5`.
- `apps/api/src/sibyl/generator/llm.py` directly constructs `anthropic.Anthropic()`.
- `apps/cli/src/sibyl_cli/data/hooks/user-prompt-submit.py` uses stdlib HTTPS with `claude-haiku-4-5-20251001`, not an SDK.
- `packages/python/sibyl-core/src/sibyl_core/tools/reflect.py` is heuristic/regex-based and imports no LLM SDK.
- `apps/api/src/sibyl/persistence/surreal/setup.py` defines `require_settings_admin` as owner or admin, with setup-mode bootstrap access.
- `apps/api/src/sibyl/services/settings.py` has `_CACHE_TTL = 60`, DB-first legacy key helpers, and `load_api_keys_from_db()` for non-substrate callers.
11.2 Primary-source checks
- PydanticAI: PyPI lists `pydantic-ai-slim` 1.96.1 released May 15, 2026, with `anthropic`, `google`, and `openai` extras. PydanticAI install docs confirm slim extras omit Logfire by default: https://pypi.org/project/pydantic-ai-slim/ and https://pydantic.dev/docs/ai/overview/install/.
- PydanticAI models: docs confirm `AnthropicModel`, `GoogleModel`, `OpenAIResponsesModel`, and `Embedder` are current public surfaces: https://pydantic.dev/docs/ai/models/anthropic/, https://pydantic.dev/docs/ai/models/google/, https://pydantic.dev/docs/ai/models/openai/, and https://pydantic.dev/docs/ai/guides/embeddings/.
- Anthropic: Claude model overview lists `claude-haiku-4-5-20251001`, `claude-haiku-4-5`, `claude-sonnet-4-6`, and the $1/$5 and $3/$15 per-MTok pricing used in §5.4: https://platform.claude.com/docs/en/about-claude/models/overview.
- Google: Gemini API model and pricing docs list `gemini-3-flash-preview`, Gemini 3.1 Flash-Lite, structured output support, and the $0.50/$3.00 plus $0.25/$1.50 per-MTok rates used in §5.4: https://ai.google.dev/gemini-api/docs/models and https://ai.google.dev/gemini-api/docs/pricing.
- OpenAI: API pricing docs list `gpt-5.4-mini` and `gpt-5.4-nano` standard rates at $0.75/$4.50 and $0.20/$1.25 per MTok: https://developers.openai.com/api/docs/pricing.
Appendix A. Smoke and Registry Receipts
Implementation adds two live verification scripts:
```
uv run python scripts/llm/verify_registry.py --require-keys --json
uv run python scripts/llm/smoke.py --require-keys --json
```

`verify_registry.py` probes every first-class LLM registry entry with `check_model_availability(entry.provider, entry.provider_model_id, key)`. Missing keys are skipped by default for local docs work and become failures with `--require-keys`.
smoke.py runs a fixed 20-chunk crawler extraction fixture through Extractor for every selected registry model. Gates are schema success >=95%, required-field coverage >=98%, type-distribution drift <=25%, entity-count delta <=15%, and name-quality >=98%. It records p50/p95 latency and estimated per-chunk cost from registry prices.
Current implementation note: cost is estimated from prompt/output character counts because Extractor.extract() intentionally returns only the parsed schema. Token usage remains available on validation probes and can be promoted into extractor telemetry when the observability layer lands.
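A character-count cost estimate of the kind described above might look like the following. This is a rough sketch, not the code in `smoke.py`: the function name is hypothetical, and the 4-characters-per-token ratio is a common rule of thumb assumed here, not a value taken from the plan.

```python
# Assumed heuristic: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4.0


def estimate_cost_usd(
    prompt_chars: int,
    output_chars: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    chars_per_token: float = CHARS_PER_TOKEN,
) -> float:
    """Estimate one call's cost from character counts and per-MTok registry prices."""
    input_tokens = prompt_chars / chars_per_token
    output_tokens = output_chars / chars_per_token
    return (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000


if __name__ == "__main__":
    # Illustrative numbers only: a 4,000-char prompt and 800-char output
    # at $1 input / $5 output per million tokens.
    print(f"${estimate_cost_usd(4000, 800, 1.0, 5.0):.6f}")
```

Because tokenizers differ across providers, figures produced this way are better suited to comparing runs of the same model over time than to comparing absolute cost across providers.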
Latest local receipt on 2026-05-15 with Anthropic, Gemini, and OpenAI keys present:
| Model alias | Schema | Fields | Drift | Count delta | Names | p50 / p95 latency | Estimated cost |
|---|---|---|---|---|---|---|---|
| claude-haiku-4-5 | 100% | 100% | 15.0% | 0.0% | 100% | 1961 / 2904 ms | $0.019475 |
| claude-sonnet-4-6 | 100% | 100% | 11.7% | 0.0% | 100% | 3733 / 4351 ms | $0.061020 |
| gemini-3-flash | 100% | 100% | 11.7% | 0.0% | 100% | 4993 / 13708 ms | $0.009904 |
| gemini-3-1-flash-lite | 100% | 100% | 10.0% | 0.0% | 100% | 1163 / 1443 ms | $0.004535 |
| gpt-5.4-mini | 100% | 100% | 13.3% | 0.0% | 100% | 1466 / 1948 ms | $0.014437 |
| gpt-5.4-nano | 100% | 100% | 9.8% | 3.3% | 100% | 1690 / 2506 ms | $0.003926 |
verify_registry.py --require-keys --json also passed for all six entries. The Gemini crawler path uses prompted structured output to avoid the Google native response-schema rejection on nested Pydantic $defs schemas.
12. Changelog
Driven by Codex cross-model review at xhigh effort over two passes, then a Bliss-prompted AI-umbrella reshape.
v3 → v4 (AI umbrella for future embedding integration)
Triggered by the observation that PydanticAI ships a real Embedder class. The substrate is restructured to host embeddings as a future sibling without a v0.11 rename or re-layout. No implementation work added to v0.10; structural-only changes.
- §5.1 package layout: `sibyl_core.llm/` → `sibyl_core.ai/` with `ai.llm/` submodule. Shared primitives (registry, providers, clients, validation, observability, errors, _testing) live directly under `ai/`. `ai.embeddings/` reserved for v0.11. Same restructure on `apps/api/src/sibyl/ai/`.
- §5.4 registry: `ModelEntry` gains `kind: ModelKind` (LLM | EMBEDDING) plus `embedding_dimensions` and nullable LLM-only fields. v0.10 ships only `kind=LLM` entries. Lookup helpers accept a kind filter.
- §5.5 settings API: routes moved from `/api/settings/llm/*` to `/api/settings/ai/llm/*`. `/api/settings/ai/keys/{provider}/test` and `/api/settings/ai/models/{alias}/test` live under `/ai/` (kind-agnostic) because embeddings will reuse them. `GET /api/settings/ai/registry` takes an optional `kind` filter.
- §5.6 web UI: the existing `admin/ai/page.tsx` is reorganized into "Language Models" and "Embeddings" sibling sections under the existing "AI Settings" parent. Embedding controls move unchanged into the Embeddings section.
- §6 Wave 1 tasks: Task 1 creates `ai/` umbrella and `ai/llm/` submodule. Task 2 lives at `ai/llm/config.py`. Task 3 lives at `ai/registry.py` (renamed from `llm/models.py`).
- §9.3: new section reserving the embedding migration path with deprecation/rollout notes.
v2 → v3 (post second-pass Codex review)
Codex confirmed 7 of 10 prior findings fully resolved by v2; the v3 patches close the remaining 3 partial items and the 2 new HIGHs.
- §5.2 config shape: `LLMConfig` is the raw shape; new `ResolvedLLMConfig` carries per-field `ConfigField[T]` metadata (`value`, `source`, `locked_by_env`, `env_var`). API responses use the resolved DTO so different fields can be env-locked independently. (Addresses new HIGH 1.)
- §5.3 + §5.7 API key precedence: keys are resolved via `LLMConfigSource.resolve()` rather than through `os.environ` mutation. `load_api_keys_from_db()` stays for non-substrate callers but the substrate does not depend on it. Removes the inconsistency between "env wins" and the existing DB-into-env copy. (Tightens BLOCKER 2 + HIGH 8 partial resolutions.)
- §5.5 auth role: aligned with the existing `require_settings_admin` policy (owner or admin) instead of inventing an owner-only break. Tightening is named as a v0.11 policy decision. (Addresses HIGH 3 partial resolution.)
- §5.7 cross-process invalidation: named the actual mechanism: in-process invalidation on the API side; TTL-based refresh on the worker side (existing `SettingsService._CacheEntry` TTL). No invented pubsub. Trade-off (up to 60s worker lag) is stated. (Addresses HIGH 8 partial.)
- §5.5 + §6 Task 7 model availability: new `check_model_availability(provider, model_id, key)` helper and `POST /api/settings/ai/models/{model_alias}/test` endpoint (kind-agnostic; mounted under `/ai/` not `/ai/llm/` because embeddings will reuse it). `verify_registry.py` probes per-entry rather than per-provider. (Addresses new HIGH 2.)
- §6 Task 15 smoke gates: reuses Task 11 gate functions instead of redefining looser ones; adds a name-quality gate. (Addresses MEDIUM 10 partial.)
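To make the per-field resolution concrete, here is a minimal sketch of env-over-DB precedence producing the `value`, `source`, `locked_by_env`, `env_var` metadata named above. It is illustrative only: the `resolve_field` helper, the `"env"`/`"db"`/`"default"` source labels, and the example env var name are assumptions, not the shapes defined in §5.2.

```python
import os
from dataclasses import dataclass
from typing import Generic, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ConfigField(Generic[T]):
    """One resolved setting plus where it came from."""

    value: Optional[T]
    source: str  # assumed labels: "env", "db", or "default"
    locked_by_env: bool
    env_var: str


def resolve_field(
    env_var: str,
    db_value: Optional[str],
    default: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> ConfigField[str]:
    """Env wins over DB, DB wins over default; reads env without mutating it."""
    env = os.environ if env is None else env
    env_value = env.get(env_var)
    if env_value:
        # An env-provided value locks the field against Settings API writes.
        return ConfigField(env_value, "env", True, env_var)
    if db_value is not None:
        return ConfigField(db_value, "db", False, env_var)
    return ConfigField(default, "default", False, env_var)


if __name__ == "__main__":
    # Hypothetical env var name, used only for this example.
    fake_env = {"SIBYL_AI_LLM_CRAWLER_MODEL": "claude-haiku-4-5"}
    print(resolve_field("SIBYL_AI_LLM_CRAWLER_MODEL", "gpt-5.4-mini", None, env=fake_env))
```

Resolving each field separately is what lets one surface have an env-locked model while its provider remains editable from the UI.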
v1 → v2 (first-pass Codex review)
- §5.1 Package layout: split into `sibyl_core.llm` (pure) and `apps/api/src/sibyl/llm` (DB-backed). Added `LLMConfigSource` protocol + `EnvConfigSource` default. (Addresses BLOCKER 1.)
- §5.3 Precedence: changed to env wins over DB, with `locked_by_env` semantics and `409 LOCKED_BY_ENV` on write. UI lock state now matches the contract. (Addresses BLOCKER 2.)
- §5.5 Multi-tenancy: LLM settings are explicitly instance-wide and owner-only. UI banner names the scope. Per-org config moved to v0.11. (Addresses HIGH 3.)
- §3, §4, §6: dropped reflect from migration scope (it's heuristic, not an SDK call site). Dropped prompt hook from migration scope (stdlib-HTTP design is intentional). Both tracked as v0.11 follow-ups. (Addresses HIGH 4 + HIGH 5.)
- §6 Task 1: `pydantic-ai-slim[anthropic,google,openai]>=1.96.1,<1.97`. Avoids Logfire and full model deps. (Addresses HIGH 6.)
- §5.4 Registry: versioned `ModelEntry` records with `provider_model_id`, `pydantic_ai_model_class`, `last_verified_at`, `deprecated_after`, capability flags, cost source. Custom-model override allowed with explicit warning. (Addresses HIGH 7.)
- §5.7 Provider lifetime: scoped cache by `(loop_id, config_fingerprint)`; invalidation hook on settings writes; worker startup reload; no `os.environ` mutation. (Addresses HIGH 8.)
- §6 Task 7: split validation into `check_provider_key` (cheap text probe with classified results) and `test_surface_config` (full pipeline). (Addresses MEDIUM 9.)
- §6 Task 11: crawler schema is a container `ExtractedEntitiesPayload(entities: list[...])` with field validators. Smoke gates by schema success, field coverage, and type-distribution drift, not just entity count. (Addresses MEDIUM 10.)
- Added §9.1 explicit future-consumer reservations so the substrate is shaped for reflection/hook/per-org without preempting their migration plans.
