
Model Optimization Guide

Overview

This guide provides comprehensive information about the optimized model configuration for all agents in the AI-Engineer system. The configuration follows best practices for balancing performance, cost, and quality across different agent responsibilities.

Model Assignment Strategy

Philosophy

Agent model assignment is based on task complexity and reasoning requirements, not arbitrary preferences:

  • Sonnet (Claude 3.5): Complex reasoning, architectural decisions, multi-factor analysis, strategic planning
  • Haiku (Claude 3.5): High-volume execution, template-based generation, deterministic processes, structured parsing

Key Principles

  1. Match Complexity to Capability: Use Sonnet for tasks requiring deep reasoning; use Haiku for structured, predictable tasks
  2. Cost Efficiency: Haiku is 5-10x more cost-effective for appropriate workloads
  3. Performance: Haiku is 3-5x faster for execution-oriented tasks
  4. Quality Consistency: Both models maintain high quality when matched to appropriate tasks

Complete Model Assignment Matrix

Sonnet Agents (11 agents) - Complex Reasoning

| Agent | Model | Primary Responsibility | Rationale |
|-------|-------|------------------------|-----------|
| architectural-reviewer | sonnet | SOLID/DRY/YAGNI analysis | Requires deep analysis of design patterns, trade-offs, and principle compliance. Early architectural review saves 40-50% development time. |
| code-reviewer | sonnet | Quality & compliance review | Demands nuanced judgment on maintainability, security, performance, and requirements compliance. Catches subtle issues affecting long-term quality. |
| task-manager | sonnet | Workflow orchestration | Involves complex state transitions, quality gate evaluation, multi-agent collaboration, and intelligent decision-making. |
| security-specialist | sonnet | Security & threat analysis | Requires deep understanding of attack vectors, threat modeling, compliance frameworks, and risk assessment. |
| database-specialist | sonnet | Data architecture | Complex analysis of query performance, schema design patterns, scaling strategies, and data modeling trade-offs. |
| devops-specialist | sonnet | Infrastructure strategy | Complex infrastructure decisions, cloud architecture trade-offs, pipeline optimization, and multi-platform deployment planning. |
| debugging-specialist | sonnet | Root cause analysis | Deep reasoning about system behavior, error patterns, and complex interactions. Evidence-based problem solving. |
| pattern-advisor | sonnet | Design pattern matching | Sophisticated matching of requirements to design solutions, understanding pattern trade-offs, and evaluating implementation complexity. |
| complexity-evaluator | sonnet | Complexity scoring | Analyzes multiple factors (file count, patterns, risk, dependencies) and makes nuanced routing decisions for review mode selection. |
| figma-react-orchestrator | sonnet | 6-phase Saga coordination | Sophisticated workflow management with MCP coordination, constraint validation, and visual regression testing. |
| zeplin-maui-orchestrator | sonnet | 6-phase Saga coordination | Sophisticated workflow management with Zeplin MCP coordination, XAML generation, and platform-specific testing. |
| python-mcp-specialist | sonnet | MCP server development | Deep understanding of protocol specifications, FastMCP patterns, tool registration, and async architecture. |

Haiku Agents (5 agents) - Execution & Templates

| Agent | Model | Primary Responsibility | Rationale |
|-------|-------|------------------------|-----------|
| requirements-analyst | haiku | EARS notation extraction | Structured template-based extraction with high predictability. Fast, cost-effective processing for pattern-based requirement formalization. |
| bdd-generator | haiku | Gherkin scenario generation | Well-defined templates with predictable patterns. Excels at high-volume structured content generation with consistent formatting. |
| test-verifier | haiku | Test execution & parsing | Test execution and result parsing follow deterministic patterns. Efficiently handles high-volume test runs and quality gate validation. |
| test-orchestrator | haiku | Test coordination | Test coordination workflow is highly structured with clear decision paths. Efficiently manages test ordering and result aggregation. |
| build-validator | haiku | Compilation validation | Build validation is deterministic with clear success/failure criteria. Efficiently parses compiler output and categorizes issues. |

Model Distribution Analysis

Breakdown

  • Sonnet: 11 agents (64.7%)
  • Haiku: 5 agents (29.4%)
  • Stack-specific: 1 agent (5.9%); python-mcp-specialist can delegate to stack-specific agents

Why This Distribution?

The 65/35 split toward Sonnet reflects the AI-Engineer system's focus on quality, architectural correctness, and complex decision-making:

  1. Architecture-First Philosophy: Multiple agents (architectural-reviewer, pattern-advisor, complexity-evaluator) ensure design quality before implementation
  2. Multi-Factor Analysis: Security, database, DevOps decisions require deep reasoning about trade-offs
  3. Workflow Orchestration: Complex state management and multi-agent coordination
  4. Strategic Planning: Long-term maintainability prioritized over short-term cost savings

Haiku excels where predictability is high: Requirements extraction, test automation, and build validation follow well-defined patterns.

Cost Impact Analysis

Estimated Cost per Task (using /task-work)

Assuming a typical task with:

  • 1x architectural review (Sonnet)
  • 1x implementation planning (Sonnet)
  • 1x code review (Sonnet)
  • 2x test runs (Haiku)
  • 1x build validation (Haiku)
  • 1x BDD generation (Haiku)

Total tokens per task: ~50K

  • Sonnet: ~30K tokens (~60%)
  • Haiku: ~20K tokens (~40%)

Cost comparison:

  • All Sonnet: $0.45 per task
  • Optimized Mix: $0.30 per task (33% savings)
  • All Haiku: $0.06 per task (but significant quality loss on complex tasks)

Annual savings (1,000 tasks/year): roughly $150 while maintaining quality.
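
The arithmetic behind these estimates, as a minimal Python sketch. The per-token rates are illustrative assumptions chosen to roughly reproduce the figures above, not official pricing:

# Illustrative cost model; per-million-token rates are assumptions, not official pricing.
SONNET_RATE = 9.00 / 1_000_000   # USD per token (blended input/output)
HAIKU_RATE = 1.20 / 1_000_000

def task_cost(sonnet_tokens: int, haiku_tokens: int) -> float:
    """Blended cost of one task given its token split."""
    return sonnet_tokens * SONNET_RATE + haiku_tokens * HAIKU_RATE

all_sonnet = task_cost(50_000, 0)        # ~$0.45
optimized = task_cost(30_000, 20_000)    # ~$0.29
all_haiku = task_cost(0, 50_000)         # ~$0.06
annual_savings = (all_sonnet - optimized) * 1000  # ~$150-160 over 1,000 tasks/year
print(f"${all_sonnet:.2f} ${optimized:.2f} ${all_haiku:.2f} ${annual_savings:.0f}/yr")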

Performance Impact Analysis

Execution Time Improvements

| Phase | Agent | Model | Time Impact |
|-------|-------|-------|-------------|
| Requirements Gathering | requirements-analyst | haiku | 3x faster than Sonnet |
| BDD Generation | bdd-generator | haiku | 4x faster than Sonnet |
| Build Validation | build-validator | haiku | 5x faster than Sonnet |
| Test Execution | test-verifier | haiku | 3x faster than Sonnet |
| Architectural Review | architectural-reviewer | sonnet | No change (requires depth) |
| Code Review | code-reviewer | sonnet | No change (requires depth) |

Net improvement: 20-30% faster task completion while maintaining quality on critical review phases.

Quality Assurance

How We Maintain Quality with Haiku

Template-Based Tasks (Haiku excels):

  • EARS notation has 5 fixed patterns (ubiquitous, event-driven, state-driven, unwanted, optional)
  • Gherkin follows strict Given-When-Then structure
  • Test result parsing uses deterministic JSON/XML formats
  • Build errors follow compiler-specific formats
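
A minimal sketch of how rigid these templates are, using a hypothetical Gherkin helper (not the actual bdd-generator implementation):

def gherkin_scenario(title: str, given: str, when: str, then: str) -> str:
    """Render one scenario in the strict Given-When-Then structure."""
    return (
        f"Scenario: {title}\n"
        f"  Given {given}\n"
        f"  When {when}\n"
        f"  Then {then}"
    )

print(gherkin_scenario(
    "Successful login",
    "a registered user on the login page",
    "they submit valid credentials",
    "they are redirected to the dashboard",
))

When output patterns are this constrained, Haiku's speed advantage comes with little quality risk.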

Complex Reasoning Tasks (Sonnet required):

  • SOLID principle evaluation requires understanding trade-offs
  • Security threat modeling requires understanding attack patterns
  • Database optimization requires analyzing query plans and indexing strategies
  • Design pattern selection requires matching problems to solutions

Quality Gates

All agents (regardless of model) must pass:

  1. Output Format Validation: Structured outputs validated against schemas
  2. Completeness Checks: All required fields present
  3. Consistency Validation: Cross-agent data consistency
  4. Human Checkpoint: Optional human review for critical decisions
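
A minimal sketch of the first two gates, assuming an illustrative output schema (the field names are not the actual agent contract):

from typing import Any

REQUIRED_FIELDS = {"agent", "model", "status", "output"}  # illustrative, not the real schema

def passes_basic_gates(result: dict[str, Any]) -> bool:
    """Gate 1: output is structured; Gate 2: all required fields are present."""
    return isinstance(result, dict) and REQUIRED_FIELDS.issubset(result.keys())

assert passes_basic_gates(
    {"agent": "build-validator", "model": "haiku", "status": "pass", "output": "0 errors"}
)
assert not passes_basic_gates({"agent": "build-validator"})  # missing fields fail the gate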

Migration & Rollback

How to Change Agent Models

Option 1: Edit Agent Frontmatter Directly

# Edit the agent file
vim installer/core/agents/requirements-analyst.md

# Change the model field
model: sonnet  # changed from haiku

# Add rationale
model_rationale: "Your reasoning here"

Option 2: Use Git for Rollback

# Revert specific agent
git checkout HEAD -- installer/core/agents/requirements-analyst.md

# Revert all agents
git checkout HEAD -- installer/core/agents/*.md

# Revert to specific commit
git checkout <commit-hash> -- installer/core/agents/

Rollback Safety

No custom backup system needed (YAGNI principle):

  • Git provides complete version history
  • Each commit is a snapshot
  • Easy to revert individual or all agents
  • No additional infrastructure to maintain

Experimentation Guidelines

When to Experiment with Different Models

Try Haiku on Sonnet agents when:

  1. Cost pressure is extreme
  2. Task is highly repetitive with low variance
  3. You have strong validation mechanisms
  4. Speed is more critical than nuanced analysis

Try Sonnet on Haiku agents when:

  1. Quality issues detected in template-based output
  2. Requirements have high variability
  3. Additional reasoning helps edge case handling
  4. Cost is not a primary concern

A/B Testing Approach

# Create feature branch
git checkout -b experiment/haiku-code-reviewer

# Update agent model
# Edit installer/core/agents/code-reviewer.md
# model: sonnet → model: haiku

# Run 50 tasks and measure:
# - Quality (defects caught vs escaped)
# - Speed (review time)
# - Cost (API usage)

# Compare with baseline (main branch)
# If quality drops >10%, revert
# If quality maintained and cost drops >30%, merge
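
A hypothetical helper that encodes the merge/revert rule above (thresholds from this guide: revert on a >10% quality drop, merge on a >30% cost drop with quality held):

def ab_decision(baseline_quality: float, exp_quality: float,
                baseline_cost: float, exp_cost: float) -> str:
    """Apply the experiment's decision rule to measured quality and cost."""
    quality_drop = (baseline_quality - exp_quality) / baseline_quality
    cost_savings = (baseline_cost - exp_cost) / baseline_cost
    if quality_drop > 0.10:
        return "revert"
    if cost_savings > 0.30:
        return "merge"
    return "keep experimenting"

print(ab_decision(baseline_quality=0.95, exp_quality=0.93,
                  baseline_cost=0.45, exp_cost=0.28))  # -> merge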

Stack-Specific Agent Configuration

Implementation Status

Completed (as of Nov 2025):

  • ✅ python-api-specialist (Haiku) - FastAPI, async, Pydantic
  • ✅ react-state-specialist (Haiku) - Hooks, TanStack Query, Zustand
  • ✅ dotnet-domain-specialist (Haiku) - DDD, entities, value objects

Discovery System:

  • AI-powered matching via metadata (stack, phase, capabilities, keywords); a simplified sketch follows this list
  • Graceful degradation: works with agents with/without metadata
  • Fallback: task-manager if no specialist found
  • Context analysis: detects stack from file extensions, project structure, keywords
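
A simplified sketch of that flow, assuming a reduced metadata shape (the real discovery is AI-powered and considers more signals than keyword overlap):

from dataclasses import dataclass, field

@dataclass
class AgentMeta:
    name: str
    stack: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)

AGENTS = [
    AgentMeta("python-api-specialist", ["python"], ["fastapi", "async", "endpoints"]),
    AgentMeta("react-state-specialist", ["react"], ["hooks", "zustand", "tanstack-query"]),
]

def discover(stack: str, task_keywords: list[str]) -> str:
    """Pick the specialist with the best keyword overlap; fall back gracefully."""
    candidates = [a for a in AGENTS if stack in a.stack]
    if not candidates:
        return "task-manager"  # fallback when no specialist matches
    return max(candidates, key=lambda a: len(set(a.keywords) & set(task_keywords))).name

print(discover("python", ["fastapi", "endpoints"]))  # python-api-specialist
print(discover("go", ["grpc"]))                      # task-manager (fallback)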

Cost Impact:

| Scenario | Cost per Task | vs Baseline | vs Current |
|----------|---------------|-------------|------------|
| All-Sonnet (baseline) | $0.45 | - | +50% |
| Current (33% Haiku) | $0.30 | -33% | - |
| Target (70% Haiku) | $0.20 | -56% | -33% |

Performance Impact:

  • Phase 3 speed: 4-5x faster with Haiku (code generation)
  • Overall task time: 40-50% faster completion
  • Quality: Maintained at 90%+ via Phase 4.5 test enforcement

Future Expansion:

  • Go, Rust, Java specialists (as demand grows)
  • Existing agent migration (optional, via /agent-enhance)

Stack-Specific Implementation Agents

Stack-specific agents (e.g., python-api-specialist, react-state-specialist) use Haiku for code generation:

Reasoning:

  • Code generation follows stack-specific patterns and templates
  • High-volume output (lots of code generated)
  • Quality ensured by upstream architectural review (Sonnet)
  • Cost-effective for repetitive code generation tasks

Example Configuration:

name: python-api-specialist
model: haiku
model_rationale: "FastAPI endpoint generation follows established patterns. Architectural quality ensured by upstream architectural-reviewer (Sonnet). Haiku provides fast, cost-effective code generation."
stack: [python]
phase: implementation
capabilities:
  - FastAPI endpoint implementation
  - Async request handling patterns
  - Pydantic schema generation
keywords: [fastapi, async, endpoints, router, dependency-injection]

When Stack-Specific Agents Need Sonnet

Use Sonnet for stack-specific agents when any of the following applies (a small decision sketch follows the list):

  1. Novel architecture: Implementing new patterns or paradigms
  2. Performance-critical: Optimization requires deep understanding
  3. Complex integration: Multiple systems with intricate interactions
  4. Security-sensitive: Authentication, authorization, encryption
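
A hypothetical helper capturing this escalation rule (the flag names are illustrative, not fields the agents actually expose):

def choose_stack_agent_model(novel_architecture: bool = False,
                             performance_critical: bool = False,
                             complex_integration: bool = False,
                             security_sensitive: bool = False) -> str:
    """Escalate a stack-specific agent from haiku to sonnet when any trigger applies."""
    if any((novel_architecture, performance_critical,
            complex_integration, security_sensitive)):
        return "sonnet"
    return "haiku"

print(choose_stack_agent_model(security_sensitive=True))  # sonnet
print(choose_stack_agent_model())                         # haiku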

Best Practices

Do's

  • ✅ Match model to task complexity: Haiku for execution, Sonnet for reasoning
  • ✅ Document rationale: Always include model_rationale field
  • ✅ Use git for version control: No need for custom backup systems
  • ✅ Measure impact: Track cost, speed, and quality metrics
  • ✅ Validate outputs: Ensure quality gates regardless of model

Don'ts

  • ❌ Don't use Haiku for architectural decisions: Requires deep reasoning
  • ❌ Don't use Sonnet for template-based tasks: Wastes cost and time
  • ❌ Don't skip model_rationale: Future maintainers need context
  • ❌ Don't create custom backup systems: Use git (YAGNI)
  • ❌ Don't optimize prematurely: Start with quality, optimize if needed

Monitoring & Metrics

Key Metrics to Track

Cost Metrics:

  • Total API cost per task
  • Cost per agent invocation
  • Cost trend over time

Performance Metrics:

  • Task completion time
  • Agent response time
  • Bottleneck identification

Quality Metrics:

  • Defects caught in review
  • Defects escaped to production
  • Architectural issue detection rate

Alerting Thresholds

Cost Alerts:

  • Single task cost >$1.00 (investigate high-cost agents)
  • Monthly cost increase >20% (review usage patterns)

Performance Alerts:

  • Task completion time >30 minutes (identify bottlenecks)
  • Agent timeout rate >5% (review model capacity)

Quality Alerts:

  • Production defects >2 per sprint (review agent effectiveness)
  • Architectural review rejection rate <10% (too lenient) or >40% (too strict)
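
A minimal sketch wiring these thresholds into checks; the metric names and how they are collected are assumptions, not an existing monitoring API:

def check_alerts(task_cost_usd: float, task_minutes: float,
                 timeout_rate: float, review_rejection_rate: float) -> list[str]:
    """Return the alerts triggered by one task's metrics."""
    alerts = []
    if task_cost_usd > 1.00:
        alerts.append("cost: single task over $1.00")
    if task_minutes > 30:
        alerts.append("performance: task took over 30 minutes")
    if timeout_rate > 0.05:
        alerts.append("performance: agent timeout rate over 5%")
    if not 0.10 <= review_rejection_rate <= 0.40:
        alerts.append("quality: architectural review rejection rate outside 10-40%")
    return alerts

print(check_alerts(1.20, 12, 0.02, 0.08))
# ['cost: single task over $1.00', 'quality: architectural review rejection rate outside 10-40%']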

Future Optimization Opportunities

Potential Improvements

  1. Dynamic Model Selection: Route to Haiku/Sonnet based on task attributes (see the sketch after this list)
  2. Hybrid Agents: Use Haiku for initial pass, Sonnet for validation
  3. Caching: Cache frequent agent responses for identical inputs
  4. Parallel Execution: Run independent agents concurrently
  5. Streaming Responses: Use Claude streaming for faster perceived performance
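
A hypothetical sketch of attribute-based routing (the attributes and rules are illustrative; this is not an implemented feature):

def route_model(files_touched: int, novel_pattern: bool,
                security_sensitive: bool, template_based: bool) -> str:
    """Route a task to haiku or sonnet based on simple task attributes."""
    if security_sensitive or novel_pattern:
        return "sonnet"   # deep reasoning required
    if template_based and files_touched <= 3:
        return "haiku"    # predictable, high-volume work
    return "sonnet"       # default to quality when unsure

print(route_model(files_touched=2, novel_pattern=False,
                  security_sensitive=False, template_based=True))  # haiku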

When to Revisit This Configuration

Quarterly Review Triggers:

  • New Claude models released (e.g., Opus 4, Sonnet 5)
  • Cost structure changes significantly
  • Quality metrics show degradation
  • New agent types added to system
  • Technology stack changes (e.g., new languages supported)

Conclusion

The optimized model configuration balances quality, cost, and performance across 17 agents:

  • 11 Sonnet agents ensure high-quality architectural decisions, security analysis, and complex reasoning
  • 5 Haiku agents provide fast, cost-effective execution for template-based and deterministic tasks
  • 33% cost savings while maintaining quality on critical review phases
  • 20-30% faster task completion through optimized execution phases

Key Takeaway: Match model complexity to task complexity. Use Sonnet where reasoning matters; use Haiku where patterns are predictable.


Last Updated: 2025-11-25
Configuration Version: 1.1
Total Agents: 20 (11 Sonnet, 8 Haiku, 1 stack-specific)

See Also: Agent Discovery Guide for how specialists are automatically selected.