Model Optimization Guide¶
Overview¶
This guide describes the optimized model configuration for all agents in the AI-Engineer system. The configuration follows best practices for balancing performance, cost, and quality across different agent responsibilities.
Model Assignment Strategy¶
Philosophy¶
Agent model assignment is based on task complexity and reasoning requirements, not arbitrary preferences:
- Sonnet (Claude 3.5): Complex reasoning, architectural decisions, multi-factor analysis, strategic planning
- Haiku (Claude 3.5): High-volume execution, template-based generation, deterministic processes, structured parsing
Key Principles¶
- Match Complexity to Capability: Use Sonnet for tasks requiring deep reasoning; use Haiku for structured, predictable tasks
- Cost Efficiency: Haiku is 5-10x more cost-effective for appropriate workloads
- Performance: Haiku is 3-5x faster for execution-oriented tasks
- Quality Consistency: Both models maintain high quality when matched to appropriate tasks
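As a toy illustration of the first principle, routing a task to a model tier from a few coarse signals might look like the sketch below. The task names and thresholds are hypothetical, not the system's actual routing logic.

```python
# Toy illustration of "match complexity to capability": route a task
# to a model tier from a few coarse signals. The task-type sets and
# the file-count threshold are hypothetical assumptions.

REASONING_TASKS = {"architecture-review", "security-analysis", "root-cause-analysis"}
TEMPLATE_TASKS = {"bdd-generation", "build-validation", "test-parsing"}

def pick_model(task_type: str, files_touched: int = 1) -> str:
    if task_type in REASONING_TASKS or files_touched > 10:
        return "sonnet"   # deep reasoning or broad blast radius
    if task_type in TEMPLATE_TASKS:
        return "haiku"    # structured, predictable output
    return "sonnet"       # default to quality when unsure
```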
Complete Model Assignment Matrix¶
Sonnet Agents (11 agents, plus the stack-specific python-mcp-specialist) - Complex Reasoning¶
| Agent | Model | Primary Responsibility | Rationale |
|---|---|---|---|
| architectural-reviewer | sonnet | SOLID/DRY/YAGNI analysis | Requires deep analysis of design patterns, trade-offs, and principle compliance. Early architectural review saves 40-50% development time. |
| code-reviewer | sonnet | Quality & compliance review | Demands nuanced judgment on maintainability, security, performance, and requirements compliance. Catches subtle issues affecting long-term quality. |
| task-manager | sonnet | Workflow orchestration | Involves complex state transitions, quality gate evaluation, multi-agent collaboration, and intelligent decision-making. |
| security-specialist | sonnet | Security & threat analysis | Requires deep understanding of attack vectors, threat modeling, compliance frameworks, and risk assessment. |
| database-specialist | sonnet | Data architecture | Complex analysis of query performance, schema design patterns, scaling strategies, and data modeling trade-offs. |
| devops-specialist | sonnet | Infrastructure strategy | Complex infrastructure decisions, cloud architecture trade-offs, pipeline optimization, and multi-platform deployment planning. |
| debugging-specialist | sonnet | Root cause analysis | Deep reasoning about system behavior, error patterns, and complex interactions. Evidence-based problem solving. |
| pattern-advisor | sonnet | Design pattern matching | Sophisticated matching of requirements to design solutions, understanding pattern trade-offs, and evaluating implementation complexity. |
| complexity-evaluator | sonnet | Complexity scoring | Analyzes multiple factors (file count, patterns, risk, dependencies) and makes nuanced routing decisions for review mode selection. |
| figma-react-orchestrator | sonnet | 6-phase Saga coordination | Sophisticated workflow management with MCP coordination, constraint validation, and visual regression testing. |
| zeplin-maui-orchestrator | sonnet | 6-phase Saga coordination | Sophisticated workflow management with Zeplin MCP coordination, XAML generation, and platform-specific testing. |
| python-mcp-specialist | sonnet | MCP server development | Deep understanding of protocol specifications, FastMCP patterns, tool registration, and async architecture. |
Haiku Agents (5 agents) - Execution & Templates¶
| Agent | Model | Primary Responsibility | Rationale |
|---|---|---|---|
| requirements-analyst | haiku | EARS notation extraction | Structured template-based extraction with high predictability. Fast, cost-effective processing for pattern-based requirement formalization. |
| bdd-generator | haiku | Gherkin scenario generation | Well-defined templates with predictable patterns. Excels at high-volume structured content generation with consistent formatting. |
| test-verifier | haiku | Test execution & parsing | Test execution and result parsing follow deterministic patterns. Efficiently handles high-volume test runs and quality gate validation. |
| test-orchestrator | haiku | Test coordination | Test coordination workflow is highly structured with clear decision paths. Efficiently manages test ordering and result aggregation. |
| build-validator | haiku | Compilation validation | Build validation is deterministic with clear success/failure criteria. Efficiently parses compiler output and categorizes issues. |
Model Distribution Analysis¶
Breakdown¶
- Sonnet: 11 agents (64.7%)
- Haiku: 5 agents (29.4%)
- Stack-specific: 1 agent (5.9%) - python-mcp-specialist, which can delegate to stack-specific implementation agents
Why This Distribution?¶
The 65/35 split toward Sonnet reflects the AI-Engineer system's focus on quality, architectural correctness, and complex decision-making:
- Architecture-First Philosophy: Multiple agents (architectural-reviewer, pattern-advisor, complexity-evaluator) ensure design quality before implementation
- Multi-Factor Analysis: Security, database, DevOps decisions require deep reasoning about trade-offs
- Workflow Orchestration: Complex state management and multi-agent coordination
- Strategic Planning: Long-term maintainability prioritized over short-term cost savings
Haiku excels where predictability is high: Requirements extraction, test automation, and build validation follow well-defined patterns.
Cost Impact Analysis¶
Estimated Cost per Task (using /task-work)¶
Assuming a typical task with:
- 1x architectural review (Sonnet)
- 1x implementation planning (Sonnet)
- 1x code review (Sonnet)
- 2x test runs (Haiku)
- 1x build validation (Haiku)
- 1x BDD generation (Haiku)

Total tokens per task: ~50K
- Sonnet: ~30K tokens (~60%)
- Haiku: ~20K tokens (~40%)

Cost comparison:
- All Sonnet: $0.45 per task
- Optimized mix: $0.30 per task (33% savings)
- All Haiku: $0.06 per task (but significant quality loss on complex tasks)
Annual savings (1000 tasks/year): $150 saved while maintaining quality.
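The per-task figures above can be reproduced from per-million-token rates. The rates in this sketch are back-solved from the guide's own totals ($0.45 all-Sonnet, $0.06 all-Haiku for a ~50K-token task); they are illustrative assumptions, not official pricing.

```python
# Illustrative sketch: reproduce the per-task cost figures above.
# SONNET_RATE and HAIKU_RATE are back-solved from this guide's totals
# and are assumptions, not official Anthropic pricing.

SONNET_RATE = 9.0   # dollars per million tokens (assumed blended rate)
HAIKU_RATE = 1.2    # dollars per million tokens (assumed blended rate)

def task_cost(sonnet_tokens: int, haiku_tokens: int) -> float:
    """Blended cost of one task, in dollars."""
    return (sonnet_tokens * SONNET_RATE + haiku_tokens * HAIKU_RATE) / 1_000_000

all_sonnet = task_cost(50_000, 0)       # $0.45
optimized = task_cost(30_000, 20_000)   # ~$0.29, rounded to $0.30 in the guide
all_haiku = task_cost(0, 50_000)        # $0.06
savings = 1 - optimized / all_sonnet    # ~33-35%
```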
Performance Impact Analysis¶
Execution Time Improvements¶
| Phase | Agent | Model | Time Impact |
|---|---|---|---|
| Requirements Gathering | requirements-analyst | haiku | 3x faster than Sonnet |
| BDD Generation | bdd-generator | haiku | 4x faster than Sonnet |
| Build Validation | build-validator | haiku | 5x faster than Sonnet |
| Test Execution | test-verifier | haiku | 3x faster than Sonnet |
| Architectural Review | architectural-reviewer | sonnet | No change (requires depth) |
| Code Review | code-reviewer | sonnet | No change (requires depth) |
Net improvement: 20-30% faster task completion while maintaining quality on critical review phases.
Quality Assurance¶
How We Maintain Quality with Haiku¶
Template-Based Tasks (Haiku excels):
- EARS notation has 5 fixed patterns (ubiquitous, event-driven, state-driven, unwanted, optional)
- Gherkin follows a strict Given-When-Then structure
- Test result parsing uses deterministic JSON/XML formats
- Build errors follow compiler-specific formats

Complex Reasoning Tasks (Sonnet required):
- SOLID principle evaluation requires understanding trade-offs
- Security threat modeling requires understanding attack patterns
- Database optimization requires analyzing query plans and indexing strategies
- Design pattern selection requires matching problems to solutions
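To illustrate why EARS extraction suits a smaller model, the five fixed EARS patterns reduce to a handful of sentence shapes. The sketch below classifies a requirement by pattern; the regexes are simplified for illustration and are not the requirements-analyst's actual implementation.

```python
import re

# Minimal sketch: the five EARS requirement patterns as keyword rules.
# Order matters: the more specific leading keywords are tried first.
# Simplified for illustration only.
EARS_PATTERNS = [
    ("optional",     re.compile(r"^\s*Where\b", re.I)),
    ("state-driven", re.compile(r"^\s*While\b", re.I)),
    ("event-driven", re.compile(r"^\s*When\b", re.I)),
    ("unwanted",     re.compile(r"^\s*If\b.*\bthen\b", re.I)),
    ("ubiquitous",   re.compile(r"^\s*The\s+\S+\s+shall\b", re.I)),
]

def classify_ears(requirement: str) -> str:
    """Return the EARS pattern name for a requirement, or 'unknown'."""
    for name, pattern in EARS_PATTERNS:
        if pattern.search(requirement):
            return name
    return "unknown"
```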
Quality Gates¶
All agents (regardless of model) must pass:
1. Output Format Validation: structured outputs validated against schemas
2. Completeness Checks: all required fields present
3. Consistency Validation: cross-agent data consistency
4. Human Checkpoint: optional human review for critical decisions
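A minimal sketch of the first two gates (format validation and completeness) applied to an agent's structured output is shown below. The field names and return shape are hypothetical illustrations, not the system's actual schema.

```python
# Minimal sketch of quality gates 1 and 2 (format and completeness).
# REQUIRED_FIELDS and the status values are hypothetical assumptions.

REQUIRED_FIELDS = {"agent", "status", "output"}

def validate_agent_output(result) -> list:
    """Return a list of gate violations; an empty list means the output passes."""
    # Gate 1: output format validation
    if not isinstance(result, dict):
        return ["output must be a JSON object"]
    violations = []
    # Gate 2: completeness check
    missing = REQUIRED_FIELDS - result.keys()
    if missing:
        violations.append(f"missing required fields: {sorted(missing)}")
    if "status" in result and result["status"] not in {"pass", "fail"}:
        violations.append(f"invalid status: {result['status']!r}")
    return violations
```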
Migration & Rollback¶
How to Change Agent Models¶
Option 1: Edit Agent Frontmatter Directly
```bash
# Edit the agent file
vim installer/core/agents/requirements-analyst.md
```

Then update the frontmatter:

```yaml
model: sonnet  # changed from haiku
model_rationale: "Your reasoning here"
```
Option 2: Use Git for Rollback
```bash
# Revert a specific agent
git checkout HEAD -- installer/core/agents/requirements-analyst.md

# Revert all agents
git checkout HEAD -- installer/core/agents/*.md

# Revert to a specific commit
git checkout <commit-hash> -- installer/core/agents/
```
Rollback Safety¶
No custom backup system needed (YAGNI principle):
- Git provides complete version history
- Each commit is a snapshot
- Easy to revert individual agents or all agents
- No additional infrastructure to maintain
Experimentation Guidelines¶
When to Experiment with Different Models¶
Try Haiku on Sonnet agents when:
1. Cost pressure is extreme
2. The task is highly repetitive with low variance
3. You have strong validation mechanisms
4. Speed is more critical than nuanced analysis

Try Sonnet on Haiku agents when:
1. Quality issues are detected in template-based output
2. Requirements have high variability
3. Additional reasoning helps edge-case handling
4. Cost is not a primary concern
A/B Testing Approach¶
```bash
# Create a feature branch
git checkout -b experiment/haiku-code-reviewer

# Update the agent model:
#   edit installer/core/agents/code-reviewer.md
#   and change `model: sonnet` to `model: haiku`

# Run 50 tasks and measure:
#   - Quality (defects caught vs. escaped)
#   - Speed (review time)
#   - Cost (API usage)

# Compare with the baseline (main branch):
#   - If quality drops >10%, revert
#   - If quality is maintained and cost drops >30%, merge
```
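The revert/merge thresholds above can be captured as a single decision rule. The sketch below uses the guide's thresholds (>10% quality drop, >30% cost drop); the metric inputs are hypothetical.

```python
# Sketch of the A/B decision rule described above: revert if quality
# drops more than 10%, merge if quality holds and cost drops more than
# 30%, otherwise keep gathering data. Inputs are illustrative metrics.

def ab_decision(baseline_quality: float, experiment_quality: float,
                baseline_cost: float, experiment_cost: float) -> str:
    quality_drop = (baseline_quality - experiment_quality) / baseline_quality
    cost_drop = (baseline_cost - experiment_cost) / baseline_cost
    if quality_drop > 0.10:
        return "revert"
    if cost_drop > 0.30:
        return "merge"
    return "continue-measuring"
```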
Stack-Specific Agent Configuration¶
Implementation Status¶
Completed (as of Nov 2025):
- ✅ python-api-specialist (Haiku) - FastAPI, async, Pydantic
- ✅ react-state-specialist (Haiku) - Hooks, TanStack Query, Zustand
- ✅ dotnet-domain-specialist (Haiku) - DDD, entities, value objects
Discovery System:
- AI-powered matching via metadata (stack, phase, capabilities, keywords)
- Graceful degradation: works with agents with or without metadata
- Fallback: task-manager if no specialist is found
- Context analysis: detects the stack from file extensions, project structure, and keywords
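The discovery flow above can be sketched as a metadata scorer with a task-manager fallback. The scoring weights and agent records below are hypothetical, not the system's actual matching algorithm.

```python
# Hypothetical sketch of metadata-based agent discovery: filter
# candidates by stack, rank by phase and keyword overlap, and fall
# back to task-manager when no specialist covers the stack.

def score_agent(agent: dict, phase: str, keywords: set) -> int:
    score = 0
    if agent.get("phase") == phase:
        score += 5
    score += len(keywords & set(agent.get("keywords", [])))
    return score

def discover_agent(agents: list, stack: str, phase: str, keywords: set) -> str:
    candidates = [a for a in agents if stack in a.get("stack", [])]
    if not candidates:
        return "task-manager"  # graceful fallback: no specialist for this stack
    best = max(candidates, key=lambda a: score_agent(a, phase, keywords))
    return best["name"]
```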
Cost Impact:

| Scenario | Cost per Task | vs Baseline | vs Current |
|---|---|---|---|
| All-Sonnet (baseline) | $0.45 | - | +50% |
| Current (33% Haiku) | $0.30 | -33% | - |
| Target (70% Haiku) | $0.20 | -56% | -33% |
Performance Impact:
- Phase 3 speed: 4-5x faster with Haiku (code generation)
- Overall task time: 40-50% faster completion
- Quality: maintained at 90%+ via Phase 4.5 test enforcement
Future Expansion:
- Go, Rust, Java specialists (as demand grows)
- Existing agent migration (optional, via /agent-enhance)
Stack-Specific Implementation Agents¶
Stack-specific agents (e.g., python-api-specialist, react-state-specialist) use Haiku for code generation:
Reasoning:
- Code generation follows stack-specific patterns and templates
- High-volume output (large amounts of generated code)
- Quality ensured by upstream architectural review (Sonnet)
- Cost-effective for repetitive code generation tasks
Example Configuration:
```yaml
name: python-api-specialist
model: haiku
model_rationale: "FastAPI endpoint generation follows established patterns. Architectural quality is ensured by the upstream architectural-reviewer (Sonnet). Haiku provides fast, cost-effective code generation."
stack: [python]
phase: implementation
capabilities:
  - FastAPI endpoint implementation
  - Async request handling patterns
  - Pydantic schema generation
keywords: [fastapi, async, endpoints, router, dependency-injection]
```
When Stack-Specific Agents Need Sonnet¶
Use Sonnet for stack-specific agents when:
1. Novel architecture: implementing new patterns or paradigms
2. Performance-critical: optimization requires deep understanding
3. Complex integration: multiple systems with intricate interactions
4. Security-sensitive: authentication, authorization, encryption
Best Practices¶
Do's¶
✅ Match model to task complexity: Haiku for execution, Sonnet for reasoning
✅ Document rationale: Always include model_rationale field
✅ Use git for version control: No need for custom backup systems
✅ Measure impact: Track cost, speed, and quality metrics
✅ Validate outputs: Ensure quality gates regardless of model
Don'ts¶
❌ Don't use Haiku for architectural decisions: requires deep reasoning
❌ Don't use Sonnet for template-based tasks: wastes cost and time
❌ Don't skip model_rationale: future maintainers need context
❌ Don't create custom backup systems: use git (YAGNI)
❌ Don't optimize prematurely: start with quality, optimize if needed
Monitoring & Metrics¶
Key Metrics to Track¶
Cost Metrics:
- Total API cost per task
- Cost per agent invocation
- Cost trend over time

Performance Metrics:
- Task completion time
- Agent response time
- Bottleneck identification

Quality Metrics:
- Defects caught in review
- Defects escaped to production
- Architectural issue detection rate
Alerting Thresholds¶
Cost Alerts:
- Single task cost >$1.00 (investigate high-cost agents)
- Monthly cost increase >20% (review usage patterns)

Performance Alerts:
- Task completion time >30 minutes (identify bottlenecks)
- Agent timeout rate >5% (review model capacity)

Quality Alerts:
- Production defects >2 per sprint (review agent effectiveness)
- Architectural review rejection rate <10% (too lenient) or >40% (too strict)
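The thresholds above can be rolled into a single metrics check. The metric names in this sketch are hypothetical; the thresholds come from this guide.

```python
# Sketch of the alerting thresholds above as one metrics check.
# Metric key names are hypothetical assumptions; rates are fractions.

def check_alerts(m: dict) -> list:
    """Return the list of triggered alerts for a metrics snapshot."""
    alerts = []
    if m.get("task_cost_usd", 0) > 1.00:
        alerts.append("cost: single task over $1.00")
    if m.get("monthly_cost_increase", 0) > 0.20:
        alerts.append("cost: monthly increase over 20%")
    if m.get("task_minutes", 0) > 30:
        alerts.append("performance: task over 30 minutes")
    if m.get("timeout_rate", 0) > 0.05:
        alerts.append("performance: agent timeout rate over 5%")
    if m.get("prod_defects_per_sprint", 0) > 2:
        alerts.append("quality: more than 2 production defects per sprint")
    rejection = m.get("arch_rejection_rate")
    if rejection is not None and not 0.10 <= rejection <= 0.40:
        alerts.append("quality: architectural rejection rate outside 10-40%")
    return alerts
```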
Future Optimization Opportunities¶
Potential Improvements¶
- Dynamic Model Selection: Route to Haiku/Sonnet based on task attributes
- Hybrid Agents: Use Haiku for initial pass, Sonnet for validation
- Caching: Cache frequent agent responses for identical inputs
- Parallel Execution: Run independent agents concurrently
- Streaming Responses: Use Claude streaming for faster perceived performance
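The caching idea above amounts to memoizing responses keyed by a hash of the invocation. In this sketch, `call_agent` is a hypothetical stand-in for the real agent invocation; only identical (agent, model, input) triples hit the cache.

```python
import hashlib
import json

# Sketch of the response-caching idea above: memoize agent responses
# keyed by a hash of (agent, model, input). `call_agent` is a
# hypothetical stand-in for the real invocation, passed in by the caller.

_cache = {}

def cache_key(agent: str, model: str, payload: dict) -> str:
    blob = json.dumps({"agent": agent, "model": model, "input": payload},
                      sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def cached_call(agent: str, model: str, payload: dict, call_agent) -> str:
    key = cache_key(agent, model, payload)
    if key not in _cache:          # only invoke the agent on a cache miss
        _cache[key] = call_agent(agent, model, payload)
    return _cache[key]
```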
When to Revisit This Configuration¶
Quarterly Review Triggers:
- New Claude models released (e.g., Opus 4, Sonnet 5)
- Cost structure changes significantly
- Quality metrics show degradation
- New agent types added to the system
- Technology stack changes (e.g., new languages supported)
Conclusion¶
The optimized model configuration balances quality, cost, and performance across the 17 core agents (20 including the three stack-specific implementation specialists):
- 11 Sonnet agents ensure high-quality architectural decisions, security analysis, and complex reasoning
- 5 Haiku agents provide fast, cost-effective execution for template-based and deterministic tasks
- 33% cost savings while maintaining quality on critical review phases
- 20-30% faster task completion through optimized execution phases
Key Takeaway: Match model complexity to task complexity. Use Sonnet where reasoning matters; use Haiku where patterns are predictable.
Last Updated: 2025-11-25
Configuration Version: 1.1
Total Agents: 20 (11 Sonnet, 8 Haiku, 1 stack-specific)
See Also: Agent Discovery Guide for how specialists are automatically selected.