AutoBuild Architecture Deep-Dive¶

Version: 1.0.0 Last Updated: 2026-01-24 Audience: Contributors, advanced users, integrators Document Type: Technical Architecture Reference

Table of Contents¶

Architectural Overview
Module Structure
Player-Coach Pattern Implementation
Quality Gate Delegation (Option B)
State Machine and Transitions
Escape Hatch Pattern
Pre-Loop Design Phase
Feature Orchestration Engine
Worktree Management
Integration Points

Architectural Overview¶

AutoBuild implements an adversarial cooperation pattern using the Claude Agent SDK, based on Block AI's "Adversarial Cooperation in Code Synthesis" research (December 2025). This approach, called dialectical autocoding, uses a structured coach-player feedback loop where independent verification prevents the "premature success declaration" failure mode common in single-agent systems.

The architecture separates concerns across three layers:

┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 1: CLI INTERFACE                                                       │
│ guardkit/cli/autobuild.py                                                    │
│                                                                              │
│ Commands: task, feature, status, complete                                    │
│ Responsibilities: Argument parsing, environment setup, user interaction     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 2: ORCHESTRATION                                                       │
│ guardkit/orchestrator/                                                       │
│                                                                              │
│ Components:                                                                  │
│ - agent_invoker.py: Claude SDK wrapper, agent creation                      │
│ - quality_gates/: Pre-loop, task-work interface, coach validator           │
│ - features/: Feature YAML parsing, wave management                          │
│                                                                              │
│ Responsibilities: Workflow coordination, state management, iteration        │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ LAYER 3: EXECUTION                                                           │
│ Claude Agent SDK                                                             │
│                                                                              │
│ Agents: Player (full tools), Coach (read-only)                              │
│ Responsibilities: LLM interaction, tool execution, file operations          │
└─────────────────────────────────────────────────────────────────────────────┘

Design Principles¶

Separation of Concerns: Player implements, Coach validates, neither can do both
Independent Verification: Coach ignores Player's self-reports and verifies directly (key Block research insight)
Tool Asymmetry: Different tool permissions enforce role boundaries
Delegation over Duplication: Reuse task-work quality gates (100% code reuse)
Isolation: Git worktrees protect main branch from experimental changes
Escape Hatch: Max iterations prevent infinite loops

Module Structure¶

guardkit/
├── cli/
│   ├── main.py                    # CLI entry point, command groups
│   └── autobuild.py               # AutoBuild commands (task, feature, status, complete)
│
├── orchestrator/
│   ├── agent_invoker.py           # Claude SDK wrapper, agent lifecycle
│   │
│   ├── quality_gates/
│   │   ├── pre_loop.py            # Pre-loop design phases (optional)
│   │   ├── task_work_interface.py # task-work delegation, profile selection
│   │   └── coach_validator.py     # Quality gate evaluation (replacing LLM Coach)
│   │
│   └── features/
│       ├── feature_parser.py      # Feature YAML parsing
│       ├── wave_manager.py        # Wave-based task orchestration
│       └── state_tracker.py       # Feature/task state persistence
│
└── models/
    ├── task.py                    # Task data model
    ├── feature.py                 # Feature data model
    └── quality_report.py          # Quality gate results model

Key Files¶

File	Responsibility
autobuild.py	CLI commands, argument parsing
agent_invoker.py	SDK integration, agent creation
pre_loop.py	Design phase delegation
task_work_interface.py	Quality gate delegation

Player-Coach Pattern Implementation¶

Role Definition¶

# Player Agent Configuration
PLAYER_TOOLS = [
    "Bash",           # Full command execution
    "Read",           # File reading
    "Write",          # File creation
    "Edit",           # File modification
    "Glob",           # File pattern matching
    "Grep",           # Content search
    "TodoWrite",      # Task tracking
]

# Coach Agent Configuration (Read-Only)
COACH_TOOLS = [
    "Bash",           # Read-only (test execution only)
    "Read",           # File reading
    "Glob",           # File pattern matching
    "Grep",           # Content search
]

Dialectical Loop Implementation¶

def run_adversarial_loop(
    task_id: str,
    max_turns: int = 5,
    enable_pre_loop: bool = False
) -> AutoBuildResult:
    """
    Execute Player-Coach adversarial loop.

    Flow:
    1. Optional pre-loop design phase
    2. For each turn (up to max_turns):
       a. Player implements (task-work --implement-only)
       b. Coach validates (quality gate check)
       c. If approved, break loop
       d. If feedback, Player continues
    3. Return result (approved or blocked)
    """

    # Pre-loop design phase (optional)
    if enable_pre_loop:
        design_result = pre_loop.run_design_phases(task_id)
        if not design_result.success:
            return AutoBuildResult.blocked(design_result.error)

    # Adversarial loop
    for turn in range(1, max_turns + 1):
        # Player turn: Implement
        player_result = invoke_player(task_id, previous_feedback)

        # Coach turn: Validate
        coach_result = invoke_coach(task_id, player_result)

        if coach_result.approved:
            return AutoBuildResult.approved(turn, player_result)

        previous_feedback = coach_result.feedback

    # Max turns reached - Escape Hatch
    return AutoBuildResult.blocked_max_turns(max_turns)

Agent Invocation¶

The agent_invoker.py module wraps the Claude Agent SDK:

from claude_agent_sdk import Agent, create_agent

class AgentInvoker:
    """Wrapper for Claude Agent SDK with GuardKit-specific configuration."""

    def __init__(self, sdk_timeout: int = 300):
        self.sdk_timeout = sdk_timeout

    def create_player_agent(self, worktree_path: Path) -> Agent:
        """Create Player agent with full tool access."""
        return create_agent(
            name="autobuild-player",
            tools=PLAYER_TOOLS,
            working_directory=str(worktree_path),
            timeout=self.sdk_timeout,
            system_prompt=PLAYER_SYSTEM_PROMPT,
        )

    def create_coach_agent(self, worktree_path: Path) -> Agent:
        """Create Coach agent with read-only tool access."""
        return create_agent(
            name="autobuild-coach",
            tools=COACH_TOOLS,
            working_directory=str(worktree_path),
            timeout=self.sdk_timeout,
            system_prompt=COACH_SYSTEM_PROMPT,
        )

Quality Gate Delegation (Option B)¶

AutoBuild implements Option B from the Ralph Wiggum architectural review: delegation to task-work rather than reimplementing quality gates.

Why Delegation?¶

Approach	Code Reuse	Consistency	Maintenance
Option A (Reimplement)	0%	Risk of drift	Double maintenance
Option B (Delegate)	100%	Guaranteed	Single codebase

Delegation Flow¶

┌─────────────────────────────────────────────────────────────────────────────┐
│ PLAYER AGENT                                                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. Receive task with acceptance criteria                                    │
│  2. Invoke: /task-work TASK-XXX --implement-only --mode=tdd                 │
│                                                                              │
│     ┌─────────────────────────────────────────────────────────────────┐     │
│     │ TASK-WORK EXECUTION (within Player's SDK session)               │     │
│     │                                                                 │     │
│     │ Phase 3: Implementation                                         │     │
│     │   └── Stack-specific agent (python-api-specialist, etc.)       │     │
│     │                                                                 │     │
│     │ Phase 4: Testing                                                │     │
│     │   └── Test orchestrator runs pytest/vitest/etc.                │     │
│     │                                                                 │     │
│     │ Phase 4.5: Test Enforcement Loop                                │     │
│     │   └── Auto-fix (up to 3 attempts), then fail                   │     │
│     │                                                                 │     │
│     │ Phase 5: Code Review                                            │     │
│     │   └── code-reviewer agent (SOLID/DRY/YAGNI)                    │     │
│     └─────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│  3. Report results to Coach                                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Task-Work Interface¶

The task_work_interface.py module selects appropriate quality gate profiles:

# Quality Gate Profiles by task_type
QUALITY_PROFILES = {
    "scaffolding": {
        "coverage_threshold": 0,      # No coverage for scaffolding
        "require_tests": False,
        "phases": [3],                # Implementation only
    },
    "feature": {
        "coverage_threshold": 80,
        "require_tests": True,
        "phases": [3, 4, 4.5, 5],    # Full implementation + testing
    },
    "testing": {
        "coverage_threshold": 80,
        "require_tests": True,
        "phases": [4, 4.5],          # Testing phases only
    },
    "documentation": {
        "coverage_threshold": 0,
        "require_tests": False,
        "phases": [3],               # Implementation only
    },
}

def get_profile(task_type: str) -> dict:
    """Select quality gate profile based on task type."""
    return QUALITY_PROFILES.get(task_type, QUALITY_PROFILES["feature"])

State Machine and Transitions¶

Task States¶

┌────────────┐     ┌─────────────────┐     ┌───────────────┐     ┌───────────┐
│  BACKLOG   │────▶│  IN_PROGRESS    │────▶│   IN_REVIEW   │────▶│ COMPLETED │
└────────────┘     └─────────────────┘     └───────────────┘     └───────────┘
                          │                        │
                          │                        │
                          ▼                        ▼
                   ┌────────────┐          ┌────────────┐
                   │  BLOCKED   │          │  BLOCKED   │
                   └────────────┘          └────────────┘

AutoBuild-Specific States¶

class AutoBuildState(Enum):
    """AutoBuild execution states."""

    PENDING = "pending"           # Task queued, not started
    PRE_LOOP = "pre_loop"         # Running design phases
    PLAYER_TURN = "player_turn"   # Player implementing
    COACH_TURN = "coach_turn"     # Coach validating
    APPROVED = "approved"         # Coach approved, ready for review
    BLOCKED = "blocked"           # Max turns or quality gate failure
    COMPLETED = "completed"       # Merged to main

State Persistence¶

State is persisted to .guardkit/state/ for resume capability:

{
  "task_id": "TASK-AUTH-001",
  "feature_id": "FEAT-A1B2",
  "state": "player_turn",
  "turn": 2,
  "max_turns": 5,
  "worktree_path": ".guardkit/worktrees/FEAT-A1B2",
  "branch": "autobuild/FEAT-A1B2",
  "started_at": "2026-01-24T10:30:00Z",
  "last_updated": "2026-01-24T10:45:00Z",
  "player_results": [...],
  "coach_feedback": [...]
}

Escape Hatch Pattern¶

The Escape Hatch Pattern prevents infinite loops by enforcing a maximum iteration count.

Implementation¶

class EscapeHatch:
    """
    Escape Hatch Pattern implementation.

    From Anthropic research: "Escape hatches provide a controlled way
    to exit when the AI gets stuck, rather than looping forever or
    requiring manual intervention."
    """

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.current_turn = 0

    def check(self) -> bool:
        """Check if escape hatch should trigger."""
        return self.current_turn >= self.max_turns

    def on_escape(self, task_id: str, results: list) -> AutoBuildResult:
        """
        Generate blocked report when escape hatch triggers.

        The blocked report provides:
        1. Summary of all attempts
        2. Recurring issues identified
        3. Suggested manual interventions
        """
        return AutoBuildResult(
            status="blocked",
            reason="max_turns_reached",
            turns_completed=self.current_turn,
            blocked_report=self._generate_blocked_report(results),
        )

    def _generate_blocked_report(self, results: list) -> dict:
        """Analyze attempts and generate actionable report."""
        return {
            "total_attempts": len(results),
            "recurring_issues": self._identify_recurring_issues(results),
            "last_feedback": results[-1].coach_feedback if results else None,
            "suggested_actions": self._suggest_interventions(results),
        }

Blocked Report Example¶

blocked_report:
  total_attempts: 5
  recurring_issues:
    - "Test coverage below 80% (consistently ~65%)"
    - "Missing edge case tests for error handling"
  last_feedback:
    - "Add tests for AuthService.refresh_token error paths"
    - "Coverage at 67%, needs 80%"
  suggested_actions:
    - "Manually add tests for edge cases"
    - "Review test coverage report at coverage/index.html"
    - "Consider simplifying AuthService implementation"

Pre-Loop Design Phase¶

The pre-loop phase runs design phases (2-2.8) before the adversarial loop when tasks need upfront planning.

When Pre-Loop Runs¶

Scenario	Pre-Loop Default	Override
`feature-build` from `/feature-plan`	Disabled	`--enable-pre-loop`
`feature-build` with minimal specs	Disabled	`--enable-pre-loop`
`task-build` standalone	Enabled	`--no-pre-loop`

Pre-Loop Implementation¶

class PreLoopQualityGates:
    """
    Pre-loop design phases (2-2.8).

    Delegates to task-work --design-only for:
    - Phase 2: Implementation Planning
    - Phase 2.5A: Pattern Suggestion
    - Phase 2.5B: Architectural Review
    - Phase 2.7: Complexity Evaluation
    - Phase 2.8: Human Checkpoint (auto-approved)
    """

    def run(self, task_id: str) -> PreLoopResult:
        """Execute pre-loop design phases."""

        # Delegate to task-work --design-only
        result = invoke_task_work(
            task_id,
            flags=["--design-only"],
            auto_approve_checkpoint=True,
        )

        if not result.success:
            return PreLoopResult.failed(result.error)

        return PreLoopResult(
            plan=result.plan,
            complexity=result.complexity,
            patterns=result.patterns,
        )

Feature Orchestration Engine¶

For features with multiple tasks, the orchestration engine manages wave-based execution.

Wave Concept¶

Tasks within a feature are organized into waves based on dependencies:

# .guardkit/features/FEAT-A1B2.yaml
feature:
  id: FEAT-A1B2
  name: User Authentication

tasks:
  - id: TASK-AUTH-001
    wave: 1
    parallel_group: "wave-1"
    dependencies: []

  - id: TASK-AUTH-002
    wave: 1
    parallel_group: "wave-1"
    dependencies: []

  - id: TASK-AUTH-003
    wave: 2
    parallel_group: "wave-2"
    dependencies: [TASK-AUTH-001, TASK-AUTH-002]

Wave Manager Implementation¶

class WaveManager:
    """
    Manages wave-based task orchestration.

    Waves execute sequentially; tasks within a wave
    can execute in parallel (via Conductor worktrees).
    """

    def __init__(self, feature: Feature):
        self.feature = feature
        self.waves = self._organize_waves(feature.tasks)

    def _organize_waves(self, tasks: list) -> dict[int, list]:
        """Group tasks by wave number."""
        waves = {}
        for task in tasks:
            wave = task.wave or 1
            waves.setdefault(wave, []).append(task)
        return waves

    def get_next_wave(self) -> list | None:
        """Get tasks for next incomplete wave."""
        for wave_num in sorted(self.waves.keys()):
            tasks = self.waves[wave_num]
            if not all(t.status == "completed" for t in tasks):
                return tasks
        return None

    def execute_wave(self, wave_tasks: list) -> WaveResult:
        """
        Execute all tasks in a wave.

        In Conductor mode: Tasks run in parallel
        In standard mode: Tasks run sequentially
        """
        results = []
        for task in wave_tasks:
            result = run_adversarial_loop(task.id)
            results.append(result)

            # If any task blocks, stop wave
            if result.status == "blocked":
                return WaveResult.partial(results)

        return WaveResult.complete(results)

Worktree Management¶

All AutoBuild work occurs in isolated git worktrees.

Worktree Lifecycle¶

1. CREATE
   ├── Create branch: autobuild/TASK-XXX
   ├── Create worktree: .guardkit/worktrees/TASK-XXX
   └── Copy virtual environment (if exists)

2. EXECUTE
   ├── Player/Coach loop in worktree
   ├── All changes committed to worktree branch
   └── State persisted to .guardkit/state/

3. REVIEW (Human)
   ├── Worktree preserved after completion
   ├── Human reviews: git diff main
   └── Human decides: merge or discard

4. CLEANUP (Optional)
   ├── guardkit autobuild complete TASK-XXX
   ├── Merges to main (if approved)
   └── Removes worktree

Worktree Implementation¶

class WorktreeManager:
    """Manages git worktrees for AutoBuild isolation."""

    def __init__(self, base_path: Path):
        self.base_path = base_path
        self.worktrees_dir = base_path / ".guardkit" / "worktrees"

    def create_worktree(self, task_id: str) -> Path:
        """Create isolated worktree for task."""
        branch_name = f"autobuild/{task_id}"
        worktree_path = self.worktrees_dir / task_id

        # Create branch from current HEAD
        subprocess.run([
            "git", "branch", branch_name
        ], check=True)

        # Create worktree
        subprocess.run([
            "git", "worktree", "add",
            str(worktree_path),
            branch_name
        ], check=True)

        return worktree_path

    def cleanup_worktree(self, task_id: str, merge: bool = True):
        """Remove worktree, optionally merging changes."""
        worktree_path = self.worktrees_dir / task_id
        branch_name = f"autobuild/{task_id}"

        if merge:
            # Merge to main
            subprocess.run([
                "git", "checkout", "main"
            ], check=True)
            subprocess.run([
                "git", "merge", branch_name
            ], check=True)

        # Remove worktree
        subprocess.run([
            "git", "worktree", "remove", str(worktree_path)
        ], check=True)

        # Delete branch
        subprocess.run([
            "git", "branch", "-d" if merge else "-D", branch_name
        ], check=True)

Integration Points¶

Claude Agent SDK¶

# Required: pip install guardkit-py[autobuild]
from claude_agent_sdk import Agent, create_agent, run_agent

# Agent creation with GuardKit configuration
agent = create_agent(
    name="autobuild-player",
    tools=PLAYER_TOOLS,
    working_directory=str(worktree_path),
    timeout=sdk_timeout,
    system_prompt=PLAYER_SYSTEM_PROMPT,
    model="claude-sonnet-4-20250514",  # Default model
)

# Execute agent with prompt
result = run_agent(agent, prompt)

Task-Work Command¶

# Player delegates to task-work
/task-work TASK-XXX --implement-only --mode=tdd

# Pre-loop delegates to task-work
/task-work TASK-XXX --design-only

Conductor Integration¶

# Conductor creates worktrees automatically
conductor workspace TASK-XXX

# GuardKit detects Conductor and uses existing worktree
guardkit autobuild task TASK-XXX

Configuration¶

Environment Variables¶

Variable	Default	Description
`GUARDKIT_LOG_LEVEL`	`INFO`	Logging level (DEBUG, INFO, WARNING, ERROR)
`GUARDKIT_SDK_TIMEOUT`	`300`	Claude SDK timeout in seconds
`GUARDKIT_MAX_TURNS`	`5`	Default max iterations for adversarial loop

CLI Flags¶

Flag	Type	Default	Description
`--max-turns`	int	5	Maximum Player-Coach iterations
`--sdk-timeout`	int	300	Claude SDK operation timeout
`--verbose`	bool	false	Enable detailed output
`--resume`	bool	false	Resume interrupted execution
`--enable-pre-loop`	bool	false	Enable pre-loop design phases
`--no-pre-loop`	bool	false	Disable pre-loop for task-build
`--mode`	str	"tdd"	Development mode (tdd, standard)