Capability System
Capabilities define what Iris can do. Each capability is a TOML file that specifies the task prompt and expected output format.
Location: src/agents/capabilities/*.toml
Design Philosophy
Separation of Concerns
- TOML files define the task and output type
- Agent handles execution and tool calling
- Types define the structured response schema
This separation allows:
- Non-programmers to modify task instructions
- Easy experimentation with different prompts
- Version control of prompt engineering
- Compile-time embedding for portability
LLM-Driven Structure
Capabilities guide the LLM rather than rigidly enforcing structure. For example:
- Commit messages: JSON with specific fields (emoji, title, message)
- Reviews: JSON matching the Review schema (summary, metadata, findings, stats)
- PRs: Markdown with flexibility for project-specific conventions
The LLM adapts to project needs while following general guidelines.
Capability Structure
A capability TOML has four fields:
name = "capability_name"
description = "Short description of what this capability does"
output_type = "OutputTypeName"
task_prompt = """
Multi-line prompt that instructs Iris...
"""
Output Types
Output types map to variants of the StructuredResponse enum in src/agents/iris.rs:
pub enum StructuredResponse {
CommitMessage(GeneratedMessage), // JSON: { emoji, title, message, completion_message }
PullRequest(MarkdownPullRequest), // Markdown wrapper: { content: String }
Changelog(MarkdownChangelog), // Markdown wrapper: { content: String }
ReleaseNotes(MarkdownReleaseNotes), // Markdown wrapper: { content: String }
Review(crate::types::Review), // Structured: { summary, metadata, findings[], stats }
SemanticBlame(String), // Plain text
PlainText(String), // Fallback
}
Strict structured types (GeneratedMessage, Review) define schemas Iris must populate field-by-field. Markdown wrappers carry a single content: String field and let Iris choose the layout. The internal verify capability returns a private Critique struct (see Critic Verification); it never appears in StructuredResponse.
Built-in Capabilities
1. Commit (commit.toml)
Purpose: Generate commit messages from staged changes
Output: GeneratedMessage
#[derive(Serialize, Deserialize, JsonSchema)]
pub struct GeneratedMessage {
pub emoji: Option<String>, // Single gitmoji or null
pub title: String, // Subject line (max 72 chars)
pub message: String, // Body (may be empty)
#[serde(default)]
pub completion_message: Option<String>, // Short UI status, e.g. "Auth refactor ready."
}
completion_message is required by commit.toml so the Studio TUI has a one-line status to show when generation finishes.
Key instructions:
- Start with git_diff() for change evidence
- Use project_docs(doc_type="context") when repository conventions or product framing matter
- Treat project_docs(doc_type="context") as a compact snapshot; use targeted doc types for full files
- Adapt context strategy based on changeset size
- Use parallel_analyze for very large changes
Style adaptation:
- Gitmoji mode: Set the emoji field from the gitmoji list
- Conventional mode: Set emoji to null, use conventional prefixes
- Presets: Apply personality while maintaining structure
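As an illustration of how the two modes differ downstream, here is a minimal sketch of turning a GeneratedMessage into commit text. The struct is a trimmed, hypothetical copy of the documented fields (serde derives omitted), and render_commit is an assumed helper, not a function from the Git-Iris codebase.

```rust
/// Hypothetical, trimmed copy of the documented GeneratedMessage fields.
struct GeneratedMessage {
    emoji: Option<String>,
    title: String,
    message: String,
}

/// Assumed helper: build the commit text as "<emoji> <title>", then a blank
/// line and the body when the body is non-empty.
fn render_commit(msg: &GeneratedMessage) -> String {
    let subject = match msg.emoji.as_deref() {
        // Gitmoji mode: prefix the subject with the emoji.
        Some(e) if !e.trim().is_empty() => format!("{} {}", e.trim(), msg.title.trim()),
        // Conventional mode: emoji is None, the title carries the prefix.
        _ => msg.title.trim().to_string(),
    };
    let body = msg.message.trim();
    if body.is_empty() {
        subject
    } else {
        format!("{subject}\n\n{body}")
    }
}

fn main() {
    let gitmoji = GeneratedMessage {
        emoji: Some("✨".to_string()),
        title: "Add OAuth login".to_string(),
        message: String::new(),
    };
    let conventional = GeneratedMessage {
        emoji: None,
        title: "feat(auth): add OAuth login".to_string(),
        message: "Adds the provider flow.".to_string(),
    };
    println!("{}", render_commit(&gitmoji));
    println!("{}", render_commit(&conventional));
}
```

Keeping the emoji in its own field means the same title text works in either mode; only the rendering step changes.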
2. Review (review.toml)
Purpose: Analyze code changes and emit a structured, parseable review
Output: Review — a strict JSON shape, not a markdown wrapper. The capability's output_type = "Review" (see src/agents/capabilities/review.toml:3).
pub struct Review {
pub summary: String,
pub metadata: ReviewMetadata, // risk_level, strategy, specialist_passes, coverage_notes
pub findings: Vec<Finding>, // id, severity, confidence, file, line range, category, body, suggested_fix, evidence
pub stats: ReviewStats, // files_reviewed, findings_count, critical/high/medium/low counts
pub parse_failed: bool, // set when from_unstructured() rescues raw text
}
Finding.confidence is an integer 0–100. Findings below DEFAULT_MIN_FINDING_CONFIDENCE = 70 are hidden from visible_findings() and from inline GitHub comments. Category is a 12-variant enum: security, performance, error_handling, complexity, abstraction, duplication, testing, style, api_contract, concurrency, documentation, other. Severity and RiskLevel accept critical/high/medium/low.
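The confidence cutoff can be sketched as a simple filter. The Finding struct below is a hypothetical two-field reduction of the real type, and the free function visible_findings only approximates the documented method of the same name; the threshold value is the one stated above.

```rust
/// Threshold named in the text above.
const DEFAULT_MIN_FINDING_CONFIDENCE: u8 = 70;

/// Hypothetical, trimmed-down Finding; the real struct carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    id: String,
    confidence: u8, // 0-100
}

/// Keep only findings at or above the given confidence threshold.
fn visible_findings(findings: &[Finding], min_confidence: u8) -> Vec<&Finding> {
    findings
        .iter()
        .filter(|f| f.confidence >= min_confidence)
        .collect()
}

fn main() {
    let findings = vec![
        Finding { id: "F1".to_string(), confidence: 92 },
        Finding { id: "F2".to_string(), confidence: 69 },
        Finding { id: "F3".to_string(), confidence: 70 },
    ];
    // F2 falls below the threshold and is hidden; F3 sits exactly on it and stays.
    for f in visible_findings(&findings, DEFAULT_MIN_FINDING_CONFIDENCE) {
        println!("{} ({})", f.id, f.confidence);
    }
}
```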
Key instructions (from review.toml):
- Use git_diff(detail="summary") first; escalate to repo_map, file_read, static_analysis, git_show, or parallel_analyze based on size and risk.
- Only report findings with confidence ≥ 70; do not duplicate issues a configured linter or type-checker already catches.
- Cite an exact file:start_line (and end_line) on a changed line; supply suggested_fix when feasible and evidence references for non-trivial claims.
- Set metadata.risk_level, name your strategy, list specialist_passes you ran (or delegated through parallel_analyze), and record coverage_notes.
- Return a JSON object matching the schema, never markdown. If there are no actionable issues, return findings: [] and zero counts in stats.
3. Pull Request (pr.toml)
Purpose: Generate PR descriptions from branch changes
Output: MarkdownPullRequest
Suggested sections:
- Summary
- Changes
- Test Plan
- Breaking Changes (if any)
- Screenshots/Demos (if applicable)
Key instructions:
- Use git_diff(from="<default-branch>", to="HEAD") for full branch context
- Analyze entire feature branch, not just latest commit
- Include migration/upgrade notes for breaking changes
- Suggest testing approach
4. Changelog (changelog.toml)
Purpose: Generate changelog entries in Keep a Changelog format
Output: MarkdownChangelog
Structure:
## [Version] - YYYY-MM-DD
### Added
- New features
### Changed
- Enhancements to existing features
### Deprecated
- Features marked for removal
### Removed
- Deleted features
### Fixed
- Bug fixes
### Security
- Security patches
Key instructions:
- Group changes by category
- Be specific about what changed
- Include migration notes if needed
- Focus on user-facing impact
5. Release Notes (release_notes.toml)
Purpose: Generate user-facing release documentation
Output: MarkdownReleaseNotes
Suggested sections:
- Highlights
- Breaking Changes
- New Features
- Improvements
- Bug Fixes
- Performance
- Upgrade Instructions
Key instructions:
- Write for end users, not developers
- Highlight impact and benefits
- Include version numbers and dates
- Provide upgrade path for breaking changes
6. Chat (chat.toml)
Purpose: Interactive conversation with Iris in Studio
Output: Varies (text or tool calls)
Special features:
- Access to content update tools (update_commit, update_pr, update_review)
- Can read and modify current Studio content
- Freeform conversation for exploration
7. Semantic Blame (semantic_blame.toml)
Purpose: Explain the history and reasoning behind code
Output: SemanticBlame (plain text)
Key instructions:
- Read git log for the file/region
- Analyze commit messages and diffs
- Explain why the code evolved this way
- Connect changes to broader project goals
8. Verify (verify.toml) — Critic Verification
Purpose: Internal critic pass that checks a generated artifact against repository evidence.
Output: Critique — a private struct in iris.rs with fields requires_revision: bool, issues: Vec<CritiqueIssue> (title, body, severity), revision_prompt: String, confidence: u8. Critique is not part of StructuredResponse; it's consumed inside verify_response_if_enabled and used to decide whether to regenerate the artifact.
How it runs. After execute_output_type produces a StructuredResponse, execute_task passes the result to verify_response_if_enabled. When the critic is enabled (default Config.critic_enabled = true) and the (capability, output_type) pair matches review / pr / changelog / release_notes, Iris loads verify.toml, runs execute_with_agent::<Critique> against the serialized artifact and the original user prompt, and:
- If requires_revision is false (or true but issues and revision_prompt are both empty), returns the original artifact.
- Otherwise builds a revision prompt with the critic's issues and instruction appended and calls execute_output_type exactly once more.
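The decision rule above can be sketched in a few lines. The structs mirror the documented Critique fields, but needs_revision and build_revision_prompt are hypothetical helpers, and the prompt layout is an assumption made for illustration rather than the format Iris actually uses.

```rust
/// Hypothetical mirrors of the documented Critique fields.
#[allow(dead_code)]
struct CritiqueIssue {
    title: String,
    body: String,
    severity: String,
}

#[allow(dead_code)]
struct Critique {
    requires_revision: bool,
    issues: Vec<CritiqueIssue>,
    revision_prompt: String,
    confidence: u8,
}

/// Revise only when the critic asks for it AND gives something actionable:
/// at least one issue or a non-empty revision prompt.
fn needs_revision(c: &Critique) -> bool {
    c.requires_revision && !(c.issues.is_empty() && c.revision_prompt.is_empty())
}

/// Assumed layout: original prompt, then the issues, then the critic's instruction.
fn build_revision_prompt(user_prompt: &str, c: &Critique) -> String {
    let mut prompt = String::from(user_prompt);
    prompt.push_str("\n\nCritic issues:\n");
    for issue in &c.issues {
        prompt.push_str(&format!("- [{}] {}: {}\n", issue.severity, issue.title, issue.body));
    }
    prompt.push_str(&c.revision_prompt);
    prompt
}

fn main() {
    let critique = Critique {
        requires_revision: true,
        issues: vec![CritiqueIssue {
            title: "Wrong line".to_string(),
            body: "Finding F2 cites an unchanged line.".to_string(),
            severity: "high".to_string(),
        }],
        revision_prompt: "Re-check the cited line range.".to_string(),
        confidence: 85,
    };
    if needs_revision(&critique) {
        // In Iris this prompt would feed a single regeneration call.
        println!("{}", build_revision_prompt("Review the staged changes.", &critique));
    }
}
```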
commit / GeneratedMessage uses the same mechanism only when the invocation explicitly opts in with gen --critic.
What the critic flags. Unsupported claims, asserted risks without code verification, review findings citing the wrong file or line, PR/changelog/release note text that overstates scope, and missing caveats when an inference is presented as fact. It deliberately skips wording preferences and style choices that match repository conventions.
The critic is a safety net: any error inside the pass (capability load failure, schema mismatch, network error) is logged as a warning and the original artifact is returned unchanged. To opt out, set critic_enabled = false in the Git-Iris config.
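The safety-net behavior amounts to "any failure keeps the original". A minimal sketch, assuming a hypothetical verify_or_keep wrapper (not a function in iris.rs) whose closure stands in for the whole critic pass:

```rust
/// Run a critic pass and fall back to the original artifact on any error.
/// `critic` returns Ok(Some(revised)) when a revision was produced,
/// Ok(None) when the artifact passed, and Err(_) when the pass itself failed.
fn verify_or_keep<T, E, F>(original: T, critic: F) -> T
where
    E: std::fmt::Display,
    F: FnOnce(&T) -> Result<Option<T>, E>,
{
    match critic(&original) {
        Ok(Some(revised)) => revised,
        Ok(None) => original,
        Err(err) => {
            // Log as a warning and keep the artifact unchanged.
            eprintln!("warning: critic pass failed, keeping original: {err}");
            original
        }
    }
}

fn main() {
    // A failing critic (e.g. a network error) leaves the draft untouched.
    let kept = verify_or_keep("draft".to_string(), |_| {
        Err::<Option<String>, _>("network error")
    });
    println!("{kept}");

    // A successful critic that requests a revision replaces the draft.
    let revised = verify_or_keep("draft".to_string(), |d| {
        Ok::<_, String>(Some(format!("{d} (revised)")))
    });
    println!("{revised}");
}
```

Because the error branch returns the input unchanged, the critic can only improve an artifact or leave it as it was.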
Creating a Custom Capability
Step 1: Create the TOML File
Create src/agents/capabilities/my_capability.toml:
name = "my_capability"
description = "What my capability does"
output_type = "MyOutputType"
task_prompt = """
Instructions for Iris on how to complete this task.
## Tools Available
- `git_diff()` - Get changes
- `file_read()` - Read files
- `code_search()` - Search for patterns
## Output Requirements
Describe the expected structure...
"""
Step 2: Define the Output Type
In src/types/my_output.rs:
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
pub struct MyOutputType {
pub summary: String,
pub details: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
// Metadata is your own type; it must also derive Serialize, Deserialize, and JsonSchema.
pub metadata: Option<Metadata>,
}
Step 3: Add to StructuredResponse Enum
In src/agents/iris.rs:
pub enum StructuredResponse {
// ... existing variants
MyOutput(MyOutputType),
}
impl fmt::Display for StructuredResponse {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
// ... existing matches
StructuredResponse::MyOutput(output) => {
write!(f, "{}", output.summary)
}
}
}
}
Step 4: Embed the Capability
In src/agents/iris.rs, add the constant:
const CAPABILITY_MY_CAPABILITY: &str = include_str!("capabilities/my_capability.toml");
And update the loader:
fn load_capability_config(&self, capability: &str) -> Result<(String, String)> {
let content = match capability {
// ... existing capabilities
"my_capability" => CAPABILITY_MY_CAPABILITY,
_ => { /* fallback */ }
};
// ...
}
Step 5: Handle Execution
In execute_output_type() (src/agents/iris.rs:1068-1129) — the inner dispatch function called by execute_task — add a match arm:
match output_type {
// ... existing types
"MyOutputType" => {
let response = self
.execute_with_agent::<MyOutputType>(system_prompt, user_prompt)
.await?;
Ok(StructuredResponse::MyOutput(response))
}
// ...
}
execute_task itself just loads the capability, injects style instructions, calls execute_output_type, then runs verify_response_if_enabled for the critic pass. You don't need to touch it for a new output type unless you want the critic to gate your new artifact; in that case, add the (capability, output_type) pair to should_run_critic.
Step 6: Test
cargo build
cargo run -- my-capability
Prompt Engineering Best Practices
1. Context Gathering
Instruct Iris to gather the highest-signal evidence first, then pull repo docs when they materially change the answer:
task_prompt = """
## Context Gathering
`project_docs(doc_type="context")` returns a compact snapshot of README and agent instructions.
Start with `git_diff()` for code evidence, then call `project_docs` when conventions, terminology, or workflow rules matter.
"""
2. Tool Guidance
List available tools with clear purposes:
## Tools Available
- `git_diff()` - Get staged changes with relevance scores
- `git_log(count=5)` - Recent commits for style reference
- `file_read(path, start_line, num_lines)` - Read file contents
3. Size-Based Strategy
Guide Iris on how to handle different changeset sizes:
## Context Strategy by Size
- **Small** (≤3 files): Consider all changes
- **Large** (>10 files): Focus on high-relevance files
- **Huge** (>20 files): Use `parallel_analyze`
4. Output Requirements
Be explicit about format:
## Output Requirements
- **Subject line**: Imperative mood, max 72 chars
- **Body**: Wrap at 72 chars, explain WHY not what
- **Plain text only**: No markdown, no code fences
5. Avoid Uncertainty
Instruct Iris to be definitive:
## Writing Guidelines
- **NEVER use speculative language**: Avoid "likely", "probably", "seems"
- If unsure, use tools to investigate
- State facts definitively
6. Style Flexibility
Allow preset injection:
## Style Adaptation
If STYLE INSTRUCTIONS are provided, prioritize that style.
A cosmic preset means cosmic language. Express the style!
This enables users to inject personality via presets.
Advanced Patterns
Conditional Tool Calls
Instruct Iris to adapt:
If the changeset is large (>20 files or >1000 lines):
- Use `parallel_analyze` to distribute analysis
- Example: parallel_analyze({ "tasks": ["Analyze auth/", "Review API/"] })
Otherwise:
- Use `git_diff()` and `file_read()` directly
Multi-Stage Analysis
Guide a workflow:
1. Call `git_diff()` to see what changed
2. Identify the primary affected subsystem
3. Call `code_search()` to find related patterns
4. Call `file_read()` for detailed context
5. Synthesize findings into a coherent summary
Project-Specific Adaptation
Use project docs:
When `project_docs(doc_type="context")` is relevant:
- Follow any commit conventions from AGENTS.md
- Use terminology from README
- Respect project style guide
Validation and Recovery
All JSON outputs go through schema validation:
- Schema generation: schemars::schema_for! creates a JSON schema from the Rust type
- Prompt injection: The schema is added to the prompt as a constraint
- Response parsing: extract_json_from_response() finds JSON in the response
- Sanitization: sanitize_json_response() fixes control characters
- Validation: validate_and_parse() attempts recovery if parsing fails
See Output Validation for details.
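To make the parsing and sanitization stages concrete, here is a std-only sketch. Both functions are hypothetical stand-ins written for illustration; they share a purpose with extract_json_from_response() and sanitize_json_response() but are not their actual implementations, and the brace-scanning strategy is an assumption.

```rust
/// Hypothetical stand-in for the response-parsing stage: take the span from the
/// first '{' to the last '}' so surrounding prose is dropped.
fn extract_json_object(response: &str) -> Option<&str> {
    let start = response.find('{')?;
    let end = response.rfind('}')?;
    if end < start {
        return None;
    }
    Some(&response[start..=end])
}

/// Hypothetical stand-in for the sanitization stage: escape raw control
/// characters that appear inside JSON string literals, where they are invalid.
fn escape_control_chars_in_strings(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for ch in json.chars() {
        if !in_string {
            if ch == '"' {
                in_string = true;
            }
            out.push(ch);
        } else if escaped {
            // Character following a backslash: copy it through untouched.
            escaped = false;
            out.push(ch);
        } else if ch == '\\' {
            escaped = true;
            out.push(ch);
        } else if ch == '"' {
            in_string = false;
            out.push(ch);
        } else if ch.is_control() {
            match ch {
                '\n' => out.push_str("\\n"),
                '\r' => out.push_str("\\r"),
                '\t' => out.push_str("\\t"),
                other => out.push_str(&format!("\\u{:04x}", other as u32)),
            }
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    // The model wrapped its JSON in prose and left a raw tab inside a string.
    let response = "Sure, here you go:\n{\"title\": \"Fix\tbug\"}\nLet me know if you need changes.";
    if let Some(raw) = extract_json_object(response) {
        println!("{}", escape_control_chars_in_strings(raw));
    }
}
```

A real pipeline would hand the sanitized string to a JSON parser and validate it against the generated schema; this sketch stops before that step.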
Debugging Capabilities
Run with --debug to see:
- Which capability is loaded
- The full prompt sent to the LLM
- Tool calls made by Iris
- JSON extraction and validation steps
- Token usage statistics
git-iris gen --debug
Color-coded output shows:
- 🔵 Blue — Phase transitions
- 🟢 Green — Successful operations
- 🟡 Yellow — Warnings
- 🔴 Red — Errors
Best Practices Summary
✅ DO:
- Start with git_diff() or the primary change evidence
- Use project_docs(doc_type="context") as a compact conventions snapshot
- Provide clear tool descriptions
- Guide size-based strategies
- Allow style flexibility
- Be explicit about output format
❌ DON'T:
- Hardcode project-specific details
- Over-constrain markdown structure
- Assume file locations
- Use speculative language
- Ignore relevance scores
Next Steps
- Tools — Building tools that capabilities can use
- Output Validation — Schema validation and error recovery
- Agent System — How capabilities are executed
