
Model Optimization Guide

Overview

This guide provides comprehensive information about the optimized model configuration for all agents in the AI-Engineer system. The configuration follows best practices for balancing performance, cost, and quality across different agent responsibilities.

Model Assignment Strategy

Philosophy

Agent model assignment is based on task complexity and reasoning requirements, not arbitrary preferences:

  • Sonnet (Claude 3.5): Complex reasoning, architectural decisions, multi-factor analysis, strategic planning
  • Haiku (Claude 3.5): High-volume execution, template-based generation, deterministic processes, structured parsing

Key Principles

  1. Match Complexity to Capability: Use Sonnet for tasks requiring deep reasoning; use Haiku for structured, predictable tasks
  2. Cost Efficiency: Haiku is 5-10x more cost-effective for appropriate workloads
  3. Performance: Haiku is 3-5x faster for execution-oriented tasks
  4. Quality Consistency: Both models maintain high quality when matched to appropriate tasks

Complete Model Assignment Matrix

Sonnet Agents (11 agents) - Complex Reasoning

| Agent | Model | Primary Responsibility | Rationale |
|-------|-------|------------------------|-----------|
| architectural-reviewer | sonnet | SOLID/DRY/YAGNI analysis | Requires deep analysis of design patterns, trade-offs, and principle compliance. Early architectural review saves 40-50% development time. |
| code-reviewer | sonnet | Quality & compliance review | Demands nuanced judgment on maintainability, security, performance, and requirements compliance. Catches subtle issues affecting long-term quality. |
| task-manager | sonnet | Workflow orchestration | Involves complex state transitions, quality gate evaluation, multi-agent collaboration, and intelligent decision-making. |
| security-specialist | sonnet | Security & threat analysis | Requires deep understanding of attack vectors, threat modeling, compliance frameworks, and risk assessment. |
| database-specialist | sonnet | Data architecture | Complex analysis of query performance, schema design patterns, scaling strategies, and data modeling trade-offs. |
| devops-specialist | sonnet | Infrastructure strategy | Complex infrastructure decisions, cloud architecture trade-offs, pipeline optimization, and multi-platform deployment planning. |
| debugging-specialist | sonnet | Root cause analysis | Deep reasoning about system behavior, error patterns, and complex interactions. Evidence-based problem solving. |
| pattern-advisor | sonnet | Design pattern matching | Sophisticated matching of requirements to design solutions, understanding pattern trade-offs, and evaluating implementation complexity. |
| complexity-evaluator | sonnet | Complexity scoring | Analyzes multiple factors (file count, patterns, risk, dependencies) and makes nuanced routing decisions for review mode selection. |
| figma-react-orchestrator | sonnet | 6-phase Saga coordination | Sophisticated workflow management with MCP coordination, constraint validation, and visual regression testing. |
| zeplin-maui-orchestrator | sonnet | 6-phase Saga coordination | Sophisticated workflow management with Zeplin MCP coordination, XAML generation, and platform-specific testing. |
| python-mcp-specialist | sonnet | MCP server development | Deep understanding of protocol specifications, FastMCP patterns, tool registration, and async architecture. |

Haiku Agents (5 agents) - Execution & Templates

| Agent | Model | Primary Responsibility | Rationale |
|-------|-------|------------------------|-----------|
| requirements-analyst | haiku | EARS notation extraction | Structured template-based extraction with high predictability. Fast, cost-effective processing for pattern-based requirement formalization. |
| bdd-generator | haiku | Gherkin scenario generation | Well-defined templates with predictable patterns. Excels at high-volume structured content generation with consistent formatting. |
| test-verifier | haiku | Test execution & parsing | Test execution and result parsing follow deterministic patterns. Efficiently handles high-volume test runs and quality gate validation. |
| test-orchestrator | haiku | Test coordination | Test coordination workflow is highly structured with clear decision paths. Efficiently manages test ordering and result aggregation. |
| build-validator | haiku | Compilation validation | Build validation is deterministic with clear success/failure criteria. Efficiently parses compiler output and categorizes issues. |

Model Distribution Analysis

Breakdown

  • Sonnet: 11 agents (64.7%)
  • Haiku: 5 agents (29.4%)
  • Stack-specific: 1 agent (5.9%); python-mcp-specialist can delegate to stack-specific agents

Why This Distribution?

The 65/35 split toward Sonnet reflects the AI-Engineer system's focus on quality, architectural correctness, and complex decision-making:

  1. Architecture-First Philosophy: Multiple agents (architectural-reviewer, pattern-advisor, complexity-evaluator) ensure design quality before implementation
  2. Multi-Factor Analysis: Security, database, DevOps decisions require deep reasoning about trade-offs
  3. Workflow Orchestration: Complex state management and multi-agent coordination
  4. Strategic Planning: Long-term maintainability prioritized over short-term cost savings

Haiku excels where predictability is high: Requirements extraction, test automation, and build validation follow well-defined patterns.

Cost Impact Analysis

Estimated Cost per Task (using /task-work)

Assuming a typical task with:

  • 1x architectural review (Sonnet)
  • 1x implementation planning (Sonnet)
  • 1x code review (Sonnet)
  • 2x test runs (Haiku)
  • 1x build validation (Haiku)
  • 1x BDD generation (Haiku)

Total tokens per task: ~50K

  • Sonnet: ~30K tokens (~60%)
  • Haiku: ~20K tokens (~40%)

Cost comparison:

  • All Sonnet: $0.45 per task
  • Optimized Mix: $0.30 per task (33% savings)
  • All Haiku: $0.06 per task (but significant quality loss on complex tasks)

Annual savings (1,000 tasks/year): roughly $150 while maintaining quality.
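
The arithmetic behind these estimates, as a minimal Python sketch. The per-token rates are illustrative assumptions chosen to roughly reproduce the figures above, not official pricing:

# Illustrative cost model; per-million-token rates are assumptions, not official pricing.
SONNET_RATE = 9.00 / 1_000_000   # USD per token (blended input/output)
HAIKU_RATE = 1.20 / 1_000_000

def task_cost(sonnet_tokens: int, haiku_tokens: int) -> float:
    """Blended cost of one task given its token split."""
    return sonnet_tokens * SONNET_RATE + haiku_tokens * HAIKU_RATE

all_sonnet = task_cost(50_000, 0)        # ~$0.45
optimized = task_cost(30_000, 20_000)    # ~$0.29
all_haiku = task_cost(0, 50_000)         # ~$0.06
annual_savings = (all_sonnet - optimized) * 1000  # ~$150-160 over 1,000 tasks/year
print(f"${all_sonnet:.2f} ${optimized:.2f} ${all_haiku:.2f} ${annual_savings:.0f}/yr")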

Performance Impact Analysis

Execution Time Improvements

| Phase | Agent | Model | Time Impact |
|-------|-------|-------|-------------|
| Requirements Gathering | requirements-analyst | haiku | 3x faster than Sonnet |
| BDD Generation | bdd-generator | haiku | 4x faster than Sonnet |
| Build Validation | build-validator | haiku | 5x faster than Sonnet |
| Test Execution | test-verifier | haiku | 3x faster than Sonnet |
| Architectural Review | architectural-reviewer | sonnet | No change (requires depth) |
| Code Review | code-reviewer | sonnet | No change (requires depth) |

Net improvement: 20-30% faster task completion while maintaining quality on critical review phases.

Quality Assurance

How We Maintain Quality with Haiku

Template-Based Tasks (Haiku excels):

  • EARS notation has 5 fixed patterns (ubiquitous, event-driven, state-driven, unwanted, optional)
  • Gherkin follows strict Given-When-Then structure
  • Test result parsing uses deterministic JSON/XML formats
  • Build errors follow compiler-specific formats
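
A minimal sketch of how rigid these templates are, using a hypothetical Gherkin helper (not the actual bdd-generator implementation):

def gherkin_scenario(title: str, given: str, when: str, then: str) -> str:
    """Render one scenario in the strict Given-When-Then structure."""
    return (
        f"Scenario: {title}\n"
        f"  Given {given}\n"
        f"  When {when}\n"
        f"  Then {then}"
    )

print(gherkin_scenario(
    "Successful login",
    "a registered user on the login page",
    "they submit valid credentials",
    "they are redirected to the dashboard",
))

When output patterns are this constrained, Haiku's speed advantage comes with little quality risk.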

Complex Reasoning Tasks (Sonnet required):

  • SOLID principle evaluation requires understanding trade-offs
  • Security threat modeling requires understanding attack patterns
  • Database optimization requires analyzing query plans and indexing strategies
  • Design pattern selection requires matching problems to solutions

Quality Gates

All agents (regardless of model) must pass:

  1. Output Format Validation: Structured outputs validated against schemas
  2. Completeness Checks: All required fields present
  3. Consistency Validation: Cross-agent data consistency
  4. Human Checkpoint: Optional human review for critical decisions
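
A minimal sketch of the first two gates, assuming an illustrative output schema (the field names are not the actual agent contract):

from typing import Any

REQUIRED_FIELDS = {"agent", "model", "status", "output"}  # illustrative, not the real schema

def passes_basic_gates(result: dict[str, Any]) -> bool:
    """Gate 1: output is structured; Gate 2: all required fields are present."""
    return isinstance(result, dict) and REQUIRED_FIELDS.issubset(result.keys())

assert passes_basic_gates(
    {"agent": "build-validator", "model": "haiku", "status": "pass", "output": "0 errors"}
)
assert not passes_basic_gates({"agent": "build-validator"})  # missing fields fail the gate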

Migration & Rollback

How to Change Agent Models

Option 1: Edit Agent Frontmatter Directly

# Edit the agent file
vim installer/core/agents/requirements-analyst.md

# Change the model field
model: sonnet  # changed from haiku

# Add rationale
model_rationale: "Your reasoning here"

Option 2: Use Git for Rollback

# Revert specific agent
git checkout HEAD -- installer/core/agents/requirements-analyst.md

# Revert all agents
git checkout HEAD -- installer/core/agents/*.md

# Revert to specific commit
git checkout <commit-hash> -- installer/core/agents/

Rollback Safety

No custom backup system needed (YAGNI principle):

  • Git provides complete version history
  • Each commit is a snapshot
  • Easy to revert individual or all agents
  • No additional infrastructure to maintain

Experimentation Guidelines

When to Experiment with Different Models

Try Haiku on Sonnet agents when:

  1. Cost pressure is extreme
  2. Task is highly repetitive with low variance
  3. You have strong validation mechanisms
  4. Speed is more critical than nuanced analysis

Try Sonnet on Haiku agents when:

  1. Quality issues detected in template-based output
  2. Requirements have high variability
  3. Additional reasoning helps edge case handling
  4. Cost is not a primary concern

A/B Testing Approach

# Create feature branch
git checkout -b experiment/haiku-code-reviewer

# Update agent model
# Edit installer/core/agents/code-reviewer.md
# model: sonnet → model: haiku

# Run 50 tasks and measure:
# - Quality (defects caught vs escaped)
# - Speed (review time)
# - Cost (API usage)

# Compare with baseline (main branch)
# If quality drops >10%, revert
# If quality maintained and cost drops >30%, merge
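
A hypothetical helper that encodes the merge/revert rule above (thresholds from this guide: revert on a >10% quality drop, merge on a >30% cost drop with quality held):

def ab_decision(baseline_quality: float, exp_quality: float,
                baseline_cost: float, exp_cost: float) -> str:
    """Apply the experiment's decision rule to measured quality and cost."""
    quality_drop = (baseline_quality - exp_quality) / baseline_quality
    cost_savings = (baseline_cost - exp_cost) / baseline_cost
    if quality_drop > 0.10:
        return "revert"
    if cost_savings > 0.30:
        return "merge"
    return "keep experimenting"

print(ab_decision(baseline_quality=0.95, exp_quality=0.93,
                  baseline_cost=0.45, exp_cost=0.28))  # -> merge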

Stack-Specific Agent Configuration

Implementation Status

Completed (as of Nov 2025):

  • ✅ python-api-specialist (Haiku) - FastAPI, async, Pydantic
  • ✅ react-state-specialist (Haiku) - Hooks, TanStack Query, Zustand
  • ✅ dotnet-domain-specialist (Haiku) - DDD, entities, value objects

Discovery System:

  • AI-powered matching via metadata (stack, phase, capabilities, keywords); a simplified sketch follows this list
  • Graceful degradation: works with agents with/without metadata
  • Fallback: task-manager if no specialist found
  • Context analysis: detects stack from file extensions, project structure, keywords
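
A simplified sketch of that flow, assuming a reduced metadata shape (the real discovery is AI-powered and considers more signals than keyword overlap):

from dataclasses import dataclass, field

@dataclass
class AgentMeta:
    name: str
    stack: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)

AGENTS = [
    AgentMeta("python-api-specialist", ["python"], ["fastapi", "async", "endpoints"]),
    AgentMeta("react-state-specialist", ["react"], ["hooks", "zustand", "tanstack-query"]),
]

def discover(stack: str, task_keywords: list[str]) -> str:
    """Pick the specialist with the best keyword overlap; fall back gracefully."""
    candidates = [a for a in AGENTS if stack in a.stack]
    if not candidates:
        return "task-manager"  # fallback when no specialist matches
    return max(candidates, key=lambda a: len(set(a.keywords) & set(task_keywords))).name

print(discover("python", ["fastapi", "endpoints"]))  # python-api-specialist
print(discover("go", ["grpc"]))                      # task-manager (fallback)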

Cost Impact:

| Scenario | Cost per Task | vs Baseline | vs Current |
|----------|---------------|-------------|------------|
| All-Sonnet (baseline) | $0.45 | - | +50% |
| Current (33% Haiku) | $0.30 | -33% | - |
| Target (70% Haiku) | $0.20 | -56% | -33% |

Performance Impact:

  • Phase 3 speed: 4-5x faster with Haiku (code generation)
  • Overall task time: 40-50% faster completion
  • Quality: Maintained at 90%+ via Phase 4.5 test enforcement

Future Expansion:

  • Go, Rust, Java specialists (as demand grows)
  • Existing agent migration (optional, via /agent-enhance)

Stack-Specific Implementation Agents

Stack-specific agents (e.g., python-api-specialist, react-state-specialist) use Haiku for code generation:

Reasoning:

  • Code generation follows stack-specific patterns and templates
  • High-volume output (lots of code generated)
  • Quality ensured by upstream architectural review (Sonnet)
  • Cost-effective for repetitive code generation tasks

Example Configuration:

name: python-api-specialist
model: haiku
model_rationale: "FastAPI endpoint generation follows established patterns. Architectural quality ensured by upstream architectural-reviewer (Sonnet). Haiku provides fast, cost-effective code generation."
stack: [python]
phase: implementation
capabilities:
  - FastAPI endpoint implementation
  - Async request handling patterns
  - Pydantic schema generation
keywords: [fastapi, async, endpoints, router, dependency-injection]

When Stack-Specific Agents Need Sonnet

Use Sonnet for stack-specific agents when any of the following applies (a small decision sketch follows the list):

  1. Novel architecture: Implementing new patterns or paradigms
  2. Performance-critical: Optimization requires deep understanding
  3. Complex integration: Multiple systems with intricate interactions
  4. Security-sensitive: Authentication, authorization, encryption
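
A hypothetical helper capturing this escalation rule (the flag names are illustrative, not fields the agents actually expose):

def choose_stack_agent_model(novel_architecture: bool = False,
                             performance_critical: bool = False,
                             complex_integration: bool = False,
                             security_sensitive: bool = False) -> str:
    """Escalate a stack-specific agent from haiku to sonnet when any trigger applies."""
    if any((novel_architecture, performance_critical,
            complex_integration, security_sensitive)):
        return "sonnet"
    return "haiku"

print(choose_stack_agent_model(security_sensitive=True))  # sonnet
print(choose_stack_agent_model())                         # haiku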

Best Practices

Do's

  • ✅ Match model to task complexity: Haiku for execution, Sonnet for reasoning
  • ✅ Document rationale: Always include model_rationale field
  • ✅ Use git for version control: No need for custom backup systems
  • ✅ Measure impact: Track cost, speed, and quality metrics
  • ✅ Validate outputs: Ensure quality gates regardless of model

Don'ts

  • ❌ Don't use Haiku for architectural decisions: Requires deep reasoning
  • ❌ Don't use Sonnet for template-based tasks: Wastes cost and time
  • ❌ Don't skip model_rationale: Future maintainers need context
  • ❌ Don't create custom backup systems: Use git (YAGNI)
  • ❌ Don't optimize prematurely: Start with quality, optimize if needed

Monitoring & Metrics

Key Metrics to Track

Cost Metrics:

  • Total API cost per task
  • Cost per agent invocation
  • Cost trend over time

Performance Metrics:

  • Task completion time
  • Agent response time
  • Bottleneck identification

Quality Metrics:

  • Defects caught in review
  • Defects escaped to production
  • Architectural issue detection rate

Alerting Thresholds

Cost Alerts:

  • Single task cost >$1.00 (investigate high-cost agents)
  • Monthly cost increase >20% (review usage patterns)

Performance Alerts:

  • Task completion time >30 minutes (identify bottlenecks)
  • Agent timeout rate >5% (review model capacity)

Quality Alerts:

  • Production defects >2 per sprint (review agent effectiveness)
  • Architectural review rejection rate <10% (too lenient) or >40% (too strict)
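
A minimal sketch wiring these thresholds into checks; the metric names and how they are collected are assumptions, not an existing monitoring API:

def check_alerts(task_cost_usd: float, task_minutes: float,
                 timeout_rate: float, review_rejection_rate: float) -> list[str]:
    """Return the alerts triggered by one task's metrics."""
    alerts = []
    if task_cost_usd > 1.00:
        alerts.append("cost: single task over $1.00")
    if task_minutes > 30:
        alerts.append("performance: task took over 30 minutes")
    if timeout_rate > 0.05:
        alerts.append("performance: agent timeout rate over 5%")
    if not 0.10 <= review_rejection_rate <= 0.40:
        alerts.append("quality: architectural review rejection rate outside 10-40%")
    return alerts

print(check_alerts(1.20, 12, 0.02, 0.08))
# ['cost: single task over $1.00', 'quality: architectural review rejection rate outside 10-40%']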

Future Optimization Opportunities

Potential Improvements

  1. Dynamic Model Selection: Route to Haiku/Sonnet based on task attributes (see the sketch after this list)
  2. Hybrid Agents: Use Haiku for initial pass, Sonnet for validation
  3. Caching: Cache frequent agent responses for identical inputs
  4. Parallel Execution: Run independent agents concurrently
  5. Streaming Responses: Use Claude streaming for faster perceived performance
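
A hypothetical sketch of attribute-based routing (the attributes and rules are illustrative; this is not an implemented feature):

def route_model(files_touched: int, novel_pattern: bool,
                security_sensitive: bool, template_based: bool) -> str:
    """Route a task to haiku or sonnet based on simple task attributes."""
    if security_sensitive or novel_pattern:
        return "sonnet"   # deep reasoning required
    if template_based and files_touched <= 3:
        return "haiku"    # predictable, high-volume work
    return "sonnet"       # default to quality when unsure

print(route_model(files_touched=2, novel_pattern=False,
                  security_sensitive=False, template_based=True))  # haiku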

When to Revisit This Configuration

Quarterly Review Triggers:

  • New Claude models released (e.g., Opus 4, Sonnet 5)
  • Cost structure changes significantly
  • Quality metrics show degradation
  • New agent types added to system
  • Technology stack changes (e.g., new languages supported)

Conclusion

The optimized model configuration balances quality, cost, and performance across 17 agents:

  • 11 Sonnet agents ensure high-quality architectural decisions, security analysis, and complex reasoning
  • 5 Haiku agents provide fast, cost-effective execution for template-based and deterministic tasks
  • 33% cost savings while maintaining quality on critical review phases
  • 20-30% faster task completion through optimized execution phases

Key Takeaway: Match model complexity to task complexity. Use Sonnet where reasoning matters; use Haiku where patterns are predictable.


Last Updated: 2025-11-25
Configuration Version: 1.1
Total Agents: 20 (11 Sonnet, 8 Haiku, 1 stack-specific)

See Also: Agent Discovery Guide for how specialists are automatically selected.