AutoBuild Workflow Guide

Version: 1.0.0 | Last Updated: 2026-01-24 | Compatibility: GuardKit v1.0+, Claude Agent SDK v0.1.0+ | Document Type: Comprehensive Architecture and Usage Guide


Table of Contents

Part 1: Overview & Quick Start

Part 2: Architecture Deep-Dive

Part 3: Using AutoBuild

Part 4: Advanced Topics


Part 1: OVERVIEW & QUICK START

What is AutoBuild?

AutoBuild is GuardKit's autonomous task implementation system that uses a Player-Coach adversarial cooperation workflow to generate production-quality code with minimal human intervention.

Core Philosophy

AutoBuild operates on the principle of adversarial cooperation - two agents with different roles work together through a dialectical process:

  • Player Agent: Implements code, writes tests, and produces deliverables
  • Coach Agent: Validates implementation against acceptance criteria and quality gates

This separation ensures independent verification - the same agent that writes the code cannot approve it.

Why AutoBuild?

| Traditional /task-work | AutoBuild /feature-build |
| --- | --- |
| Human-driven execution | Autonomous execution |
| Interactive checkpoints | Automatic approval based on quality gates |
| Single pass implementation | Iterative improvement (up to N turns) |
| Manual quality verification | Independent Coach validation |
| Good for exploratory work | Good for well-defined requirements |

When to Use AutoBuild

Use AutoBuild when:

  • Requirements are clear and well-defined
  • Acceptance criteria can be objectively verified
  • Standard implementation patterns apply
  • You want autonomous iteration without manual intervention
  • Implementing a feature with multiple related tasks

Use manual /task-work instead when:

  • Requirements are exploratory or unclear
  • Complex architectural decisions are needed
  • High-risk changes require human judgment
  • Requirements are novel or unusual


Key Concepts

1. Dialectical Loop

┌─────────────────────────────────────────────────────────────┐
│                     DIALECTICAL LOOP                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐                    ┌──────────────┐      │
│  │   PLAYER     │                    │    COACH     │      │
│  │   Agent      │───────────────────▶│    Agent     │      │
│  └──────────────┘   Implementation   └──────────────┘      │
│        │            Report                  │               │
│        │                                    │               │
│        │            Feedback                │               │
│        │◀───────────────────────────────────│               │
│        │            or Approval             │               │
│        │                                    │               │
│  Capabilities:                       Capabilities:          │
│  - Full file system access           - Read-only access     │
│  - Code implementation               - Test execution       │
│  - Test creation                     - Quality validation   │
│  - task-work delegation              - Criteria verification│
│                                                             │
└─────────────────────────────────────────────────────────────┘
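Conceptually, the loop is a short iteration over Player and Coach turns. A minimal Python sketch of the control flow, assuming the Player and Coach are passed in as callables; the names and return shapes below are illustrative, not the actual orchestrator API:

from typing import Callable, Dict, Optional

def dialectical_loop(
    player: Callable[[Optional[str]], Dict],  # implements and returns a report
    coach: Callable[[Dict], Dict],            # validates and returns a decision
    max_turns: int = 5,
) -> Dict:
    """Run Player/Coach turns until approval or max_turns (illustrative sketch)."""
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        report = player(feedback)             # Player turn: full write access
        decision = coach(report)              # Coach turn: read-only verification
        if decision.get("decision") == "approve":
            return {"status": "approved", "turns": turn}
        feedback = decision.get("feedback")   # drives the next iteration
    return {"status": "max_turns_reached", "turns": max_turns}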

2. Worktree Isolation

All AutoBuild work happens in isolated git worktrees:

  • Location: .guardkit/worktrees/TASK-XXX/ or .guardkit/worktrees/FEAT-XXX/
  • Branch: autobuild/TASK-XXX or autobuild/FEAT-XXX
  • Isolation: Changes don't affect the main branch until manually merged
  • Preservation: Worktrees are never auto-deleted (human review required)
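The isolation itself is plain git worktree mechanics. A hedged sketch of the equivalent plumbing, using the path and branch conventions above (the helper below is illustrative, not GuardKit's actual implementation):

import subprocess
from pathlib import Path

def create_autobuild_worktree(repo_root: Path, work_id: str) -> Path:
    """Create an isolated worktree on a dedicated autobuild/<ID> branch (illustrative)."""
    worktree = repo_root / ".guardkit" / "worktrees" / work_id
    branch = f"autobuild/{work_id}"
    # `git worktree add -b <branch> <path>` creates the branch and checks it out in isolation
    subprocess.run(["git", "worktree", "add", "-b", branch, str(worktree)],
                   cwd=repo_root, check=True)
    return worktree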

3. Quality Gate Delegation

AutoBuild delegates to /task-work --implement-only rather than implementing directly. This provides:

  • 100% code reuse with proven task-work quality gates
  • Stack-specific subagents (python-api-specialist, react-specialist, etc.)
  • Phase 4.5 test enforcement (auto-fix up to 3 attempts)
  • Code review by the dedicated code-reviewer agent


Quick Start Examples

Example 1: Single Task

# From Claude Code
/feature-build TASK-AUTH-001

# From shell
guardkit autobuild task TASK-AUTH-001

Example 2: Entire Feature

# From Claude Code
/feature-build FEAT-A1B2

# From shell
guardkit autobuild feature FEAT-A1B2

Example 3: With Options

# More iterations for complex tasks
guardkit autobuild task TASK-AUTH-001 --max-turns 10

# Verbose output with debug logging
GUARDKIT_LOG_LEVEL=DEBUG guardkit autobuild task TASK-AUTH-001 --verbose

# Resume interrupted execution
guardkit autobuild feature FEAT-A1B2 --resume

Part 2: ARCHITECTURE DEEP-DIVE

Player-Coach Adversarial Cooperation

The Adversarial Cooperation Pattern

Research Foundation: AutoBuild's adversarial cooperation pattern is based on Block AI's "Adversarial Cooperation in Code Synthesis" research (December 2025), which introduces dialectical autocoding - a framework for AI agents to write code autonomously through a structured coach-player feedback loop.

Unlike traditional single-agent systems, AutoBuild uses two distinct agents with different roles and capabilities:

┌─────────────────────────────────────────────────────────────────────────────┐
│ AUTOBUILD ORCHESTRATION FLOW                                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  PreLoopQualityGates (optional)                                            │
│       │                                                                     │
│       ▼                                                                     │
│  task-work --design-only (if enable_pre_loop=True)                         │
│       │                                                                     │
│       ▼ (returns plan, complexity)                                          │
│                                                                             │
│  ═══════════════════════════════════════════════════════════════════════    │
│  ADVERSARIAL LOOP (max_turns iterations)                                    │
│  ═══════════════════════════════════════════════════════════════════════    │
│                                                                             │
│  PLAYER TURN:                                                               │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ task-work --implement-only --mode=tdd                               │    │
│  │     │                                                               │    │
│  │     ├── Phase 3: Implementation (stack-specific agent)             │    │
│  │     ├── Phase 4: Testing (test-orchestrator)                       │    │
│  │     ├── Phase 4.5: Fix Loop (auto-fix, 3 attempts)                 │    │
│  │     └── Phase 5: Code Review (code-reviewer)                       │    │
│  │                                                                     │    │
│  │ Output: Implementation complete, tests passing, code reviewed       │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│       │                                                                     │
│       ▼                                                                     │
│  COACH TURN:                                                                │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ CoachValidator                                                      │    │
│  │     │                                                               │    │
│  │     ├── Quality gate profile selection (by task_type)              │    │
│  │     ├── Test result verification                                   │    │
│  │     ├── Coverage threshold check                                   │    │
│  │     ├── Plan audit validation                                      │    │
│  │     └── Decision: APPROVE or FEEDBACK                              │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│       │                                                                     │
│       ▼                                                                     │
│  (repeat until approved or max_turns)                                       │
└─────────────────────────────────────────────────────────────────────────────┘

Why Adversarial Cooperation Works

  1. Independent Verification: The agent that writes code cannot approve it
  2. Discard Self-Reports: Coach ignores Player's success claims and verifies independently (key insight from Block research)
  3. Different Capabilities: Player has full access, Coach has read-only (can only run tests)
  4. Iterative Improvement: Feedback loops drive convergence to acceptance criteria
  5. Quality Enforcement: Coach validates against objective quality gates

The Core Insight from Block's Research

Block's research identified a critical failure mode in single-agent systems: premature success declaration. When an agent is allowed to assess its own work, it tends to "drift from specs" and "declare success prematurely" through circular verification.

The solution is adversarial cooperation:

"Discard the player's self-report of success. Have the coach perform independent evaluation." — Block AI Research, "Adversarial Cooperation in Code Synthesis"

In GuardKit's implementation:

  • Player implements and claims completion
  • Coach re-reads the original requirements and verifies independently
  • Coach ignores what the Player says it did
  • Coach runs tests and validates output directly

Architectural Benefits (from Block Research)

| Problem | Single Agent | Coach-Player (Adversarial) |
| --- | --- | --- |
| Anchoring | Drifts from specs | Requirements anchor every turn |
| Context Pollution | Accumulates noise | Fresh context per phase |
| Completion | Open-ended, premature claims | Explicit approval gates |
| Verification | Circular (self-assessment) | Independent (coach verifies) |

Comparison with Ralph Wiggum Loop

AutoBuild was influenced by the Ralph Wiggum plugin from Anthropic's Claude Code research. Here's how they compare:

Architecture Comparison

| Aspect | Ralph Wiggum | AutoBuild Player-Coach |
| --- | --- | --- |
| Agent Count | Single (self-referential) | Dual (Player + Coach) |
| Loop Mechanism | Stop hook blocks exit | Orchestrator-driven loop |
| Completion Detection | Promise tag exact match (<promise>COMPLETE</promise>) | Coach decision (approve/feedback) |
| Context Preservation | Files + git history | Feedback summary + task state |
| Quality Gates | Tests embedded in prompt | Delegated to task-work (Phases 4-5.5) |
| Exit Strategy | Escape hatch with max iterations | Max turns + blocked report |

Ralph Wiggum Loop Pattern

┌──────────────────────────────────────────────────────────────────────┐
│                    RALPH WIGGUM LOOP PATTERN                          │
│                                                                       │
│   User runs /ralph-loop              Claude works                    │
│        ▼                                  ▼                          │
│   ┌─────────────┐                   ┌─────────────┐                  │
│   │ Initialize  │                   │ Implement   │                  │
│   │ loop state  │──────────────────▶│ + test      │                  │
│   └─────────────┘                   └──────┬──────┘                  │
│                                            │                          │
│                                     Claude tries exit                 │
│                                            │                          │
│   ┌─────────────┐                   ┌──────▼──────┐                  │
│   │ Stop Hook   │◀──────────────────│ stop-hook.sh│                  │
│   │ intercepts  │                   └─────────────┘                  │
│   └──────┬──────┘                                                    │
│          │                                                            │
│   ┌──────▼──────┐     NO        ┌─────────────┐                      │
│   │ Promise     │──────────────▶│ Block exit  │                      │
│   │ fulfilled?  │               │ Inject same │──────┐               │
│   └──────┬──────┘               │ prompt      │      │               │
│          │ YES                  └─────────────┘      │               │
│   ┌──────▼──────┐                     ▲              │               │
│   │ Allow exit  │                     └──────────────┘               │
│   │ Loop done   │               (iteration++)                        │
│   └─────────────┘                                                    │
└──────────────────────────────────────────────────────────────────────┘

Key Differences

Ralph Wiggum uses a single agent that iterates on itself:

  • Same prompt re-injected each iteration
  • File system preserves previous work
  • Exit blocked until the promise is fulfilled

AutoBuild Player-Coach uses dual agents:

  • Player implements, Coach validates (separation of concerns)
  • Coach provides specific feedback for the next iteration
  • Quality gates provide objective approval criteria


Key Techniques from Anthropic Research

AutoBuild incorporates several techniques identified in the Ralph Wiggum architectural review:

1. Promise-Based Completion (IMPLEMENTED)

Concept: Explicit, verifiable completion criteria that agents must satisfy.

Ralph Implementation: <promise>COMPLETE</promise> tag must match exactly.

AutoBuild Implementation: Coach validates against acceptance criteria and quality gate results:

from dataclasses import dataclass
from typing import List, Literal

@dataclass  # shown as a dataclass for illustration
class CoachDecision:
    decision: Literal["approve", "feedback", "blocked"]
    criteria_verified: List["CriterionVerification"]  # per-criterion checks, defined elsewhere
    quality_gates_passed: bool
    evidence: str

2. Escape Hatch Pattern (IMPLEMENTED)

Concept: Define explicit fallback behavior when maximum iterations are reached.

Ralph Implementation: Prompt includes instructions for documenting blocking issues after N iterations.

AutoBuild Implementation:

  • Max turns with structured blocked report
  • Worktree preserved for debugging
  • Clear documentation of what was attempted

# When turn >= max_turns - 2 and completion not possible:
blocked_report = {
    "blocking_issues": ["External mock unavailable"],
    "attempts_made": ["Turn 1: HTTP mock", "Turn 2: httpretty"],
    "suggested_alternatives": ["Manual mock server setup"]
}

3. Honesty Verification (IMPLEMENTED)

Concept: Prevent false success claims through independent verification.

Ralph Philosophy: "The design enforces intellectual honesty: users cannot fabricate false promises to escape."

Block Research Insight: In ablation studies, when coach feedback was withheld, "the player went 4 rounds of implementations with missing feedback. On each iteration it spontaneously found things to improve, however the final implementation was non-functional." This demonstrates why independent verification is essential.

AutoBuild Implementation: Coach independently verifies Player claims:

  • Runs tests independently (doesn't trust the Player's test results)
  • Re-reads original requirements (doesn't rely on the Player's interpretation)
  • Cross-references claimed files with the actual file system
  • Validates that coverage meets thresholds
  • Outputs structured verification checklists marking each requirement
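A minimal sketch of what that independent verification can look like; the pytest invocation and report fields here are illustrative assumptions rather than the actual CoachValidator interface:

import subprocess
from pathlib import Path
from typing import Dict, List

def verify_independently(worktree: Path, claimed_files: List[str]) -> Dict:
    """Check the Player's claims against reality instead of trusting its report (sketch)."""
    # Cross-reference claimed files with the actual file system
    missing = [f for f in claimed_files if not (worktree / f).exists()]
    # Re-run the test suite directly (pytest assumed here purely for illustration)
    result = subprocess.run(["pytest", "--tb=short"], cwd=worktree, capture_output=True)
    tests_pass = result.returncode == 0
    if tests_pass and not missing:
        return {"decision": "approve"}
    return {"decision": "feedback",
            "feedback": {"missing_files": missing, "tests_pass": tests_pass}}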

4. task-work Delegation (ENHANCED BEYOND RALPH)

Concept: Reuse proven implementation infrastructure instead of reimplementing.

AutoBuild Advantage: Unlike Ralph's prompt-only approach, AutoBuild delegates to task-work:

  • Stack-specific subagents (python-api-specialist, react-specialist, etc.)
  • Phase 4.5 test enforcement loop (3 auto-fix attempts)
  • Architectural review (SOLID/DRY/YAGNI scoring)
  • Code review by the dedicated code-reviewer agent

This provides 100% code reuse with the proven task-work quality gate system.


Quality Gate Delegation

AutoBuild's Player doesn't implement directly - it delegates to task-work --implement-only:

┌─────────────────────────────────────────────────────────────────────────────┐
│ task-work --implement-only --mode=tdd DELEGATION                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Phase 3: Implementation ──────────────────────────────────────────┐        │
│       │                                                            │        │
│       │  INVOKE Task tool:                                         │ ✅      │
│       │    subagent_type: "{selected_implementation_agent}"        │ SUBAGENT│
│       │    - python-api-specialist                                 │ USAGE   │
│       │    - react-specialist                                      │        │
│       │    - dotnet-api-specialist                                 │        │
│       │    - (or task-manager fallback)                            │        │
│       └────────────────────────────────────────────────────────────┘        │
│       │                                                                     │
│       ▼                                                                     │
│  Phase 4: Testing ─────────────────────────────────────────────────┐        │
│       │                                                            │        │
│       │  INVOKE Task tool:                                         │ ✅      │
│       │    subagent_type: "{selected_testing_agent}"               │ SUBAGENT│
│       │    - test-orchestrator                                     │ USAGE   │
│       │    - qa-tester                                             │        │
│       │                                                            │        │
│       │  Compilation check (mandatory)                             │        │
│       │  Test execution                                            │        │
│       │  Coverage analysis (80%/75% thresholds)                    │        │
│       └────────────────────────────────────────────────────────────┘        │
│       │                                                                     │
│       ▼                                                                     │
│  Phase 4.5: Fix Loop ──────────────────────────────────────────────┐        │
│       │                                                            │        │
│       │  WHILE tests fail AND attempt <= 3:                        │ ✅      │
│       │    Fix compilation errors                                  │ AUTO-FIX│
│       │    Fix test failures                                       │        │
│       │    Re-run tests                                            │        │
│       └────────────────────────────────────────────────────────────┘        │
│       │                                                                     │
│       ▼                                                                     │
│  Phase 5: Code Review ─────────────────────────────────────────────┐        │
│       │                                                            │        │
│       │  INVOKE Task tool:                                         │ ✅      │
│       │    subagent_type: "code-reviewer"                          │ SUBAGENT│
│       │  Quality assessment                                        │ USAGE   │
│       │  Error handling review                                     │        │
│       │  Documentation check                                       │        │
│       └────────────────────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────────────────┘
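In code terms, the Player turn reduces to handing the task off and letting the delegated phases run. A hedged sketch: the run_task_work callable is a stand-in for however /task-work --implement-only is actually invoked (via the Claude Agent SDK in the real orchestrator), and the report shape is illustrative:

from pathlib import Path
from typing import Callable, Dict

def player_turn(task_id: str, worktree: Path,
                run_task_work: Callable[..., Dict]) -> Dict:
    """Delegate the implementation to task-work rather than doing it inline (sketch)."""
    # Phases 3-5 (implementation, testing, fix loop, code review) run inside task-work
    report = run_task_work(task_id, implement_only=True, mode="tdd", cwd=worktree)
    # The Coach still validates this report independently in the next turn
    return report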

Benefits of Delegation

  1. Stack-Specific Quality: Python tasks get python-api-specialist, React gets react-specialist
  2. TDD Enforcement: Structural enforcement (RED→GREEN→REFACTOR), not just prompt-based
  3. Quality Gates Included: Phase 4.5 auto-fix, coverage thresholds, code review
  4. Single System to Maintain: All task-work improvements automatically benefit AutoBuild
  5. Agent Discovery: Metadata-based matching, template overrides work

Part 3: USING AUTOBUILD

From Claude Code (Slash Command)

Basic Usage

# Single task
/feature-build TASK-AUTH-001

# Entire feature
/feature-build FEAT-A1B2

With Options

# More iterations
/feature-build TASK-AUTH-001 --max-turns 10

# Verbose output
/feature-build TASK-AUTH-001 --verbose

# Resume interrupted session
/feature-build TASK-AUTH-001 --resume

# Use different model
/feature-build TASK-AUTH-001 --model claude-opus-4-5-20251101

Advantages of Claude Code

| Advantage | Description |
| --- | --- |
| Interactive | See real-time progress in your IDE |
| Integrated | Part of your normal Claude Code workflow |
| Contextual | Claude Code has full codebase context |
| Familiar | Same slash command interface as other commands |

From Shell (Python CLI)

Basic Usage

# Single task
guardkit autobuild task TASK-AUTH-001

# Entire feature
guardkit autobuild feature FEAT-A1B2

# Check status
guardkit autobuild status TASK-AUTH-001

With Debug Logging

# Debug level shows detailed execution
GUARDKIT_LOG_LEVEL=DEBUG guardkit autobuild task TASK-AUTH-001

# Verbose flag shows turn-by-turn progress
guardkit autobuild task TASK-AUTH-001 --verbose

# Both together for maximum visibility
GUARDKIT_LOG_LEVEL=DEBUG guardkit autobuild task TASK-AUTH-001 --verbose

Real-World Example

# Full feature execution with monitoring
GUARDKIT_LOG_LEVEL=DEBUG guardkit autobuild feature FEAT-A96D --max-turns 5

# Output shows:
# - Wave-by-wave execution
# - Task state transitions
# - SDK invocation details
# - Coach validation results
# - Quality gate evaluations

Advantages of Shell CLI

| Advantage | Description |
| --- | --- |
| Scriptable | Can be integrated into CI/CD pipelines |
| Background Execution | Run in a terminal while doing other work |
| Environment Variables | Fine-grained control via GUARDKIT_LOG_LEVEL |
| Direct SDK Access | Closer to the metal for debugging |
| Parallel Execution | Run multiple features in different terminals |

CLI Reference

Command: guardkit autobuild task

Execute AutoBuild orchestration for a single task.

guardkit autobuild task TASK-XXX [OPTIONS]

Arguments:

  • TASK_ID: Task identifier (e.g., TASK-AUTH-001)

Options:

| Option | Default | Description |
| --- | --- | --- |
| --max-turns N | 5 | Maximum adversarial turns |
| --model MODEL | claude-sonnet-4-5-20250929 | Claude model to use |
| --verbose | false | Show detailed turn-by-turn output |
| --resume | false | Resume from last saved state |
| --mode MODE | tdd | Development mode: standard, tdd, or bdd |
| --sdk-timeout N | 900 | SDK timeout in seconds (60-3600) |
| --no-pre-loop | false | Skip design phase (Phases 1.6-2.8) |
| --skip-arch-review | false | Skip architectural review quality gate |
| --ablation | false | Run in ablation mode (no Coach feedback) for testing |

Exit Codes:

  • 0: Success (Coach approved)
  • 1: Task file not found or SDK not available
  • 2: Orchestration error
  • 3: Invalid arguments

Examples:

# Basic execution
guardkit autobuild task TASK-AUTH-001

# Complex task with more iterations
guardkit autobuild task TASK-AUTH-001 --max-turns 10 --verbose

# Use Opus model for higher quality
guardkit autobuild task TASK-AUTH-001 --model claude-opus-4-5-20251101

# Skip design phase for simple bug fixes
guardkit autobuild task TASK-FIX-001 --no-pre-loop

# Extended timeout for large implementations
guardkit autobuild task TASK-REFACTOR-001 --sdk-timeout 1800

# Ablation mode for testing (demonstrates system without Coach feedback)
guardkit autobuild task TASK-AUTH-001 --ablation
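Because the exit codes are well-defined, the task command wraps cleanly into CI scripts. A minimal scripting sketch; the surrounding pipeline logic is illustrative:

import subprocess
import sys

# Wrap the CLI and branch on the documented exit codes (0/1/2/3).
result = subprocess.run(["guardkit", "autobuild", "task", "TASK-AUTH-001"])
if result.returncode == 0:
    print("Coach approved - review the preserved worktree and merge manually")
elif result.returncode == 2:
    print("Orchestration error - inspect logs and the worktree before retrying")
    sys.exit(1)
else:
    # 1 = task file or SDK not available, 3 = invalid arguments
    sys.exit(result.returncode)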

Command: guardkit autobuild feature

Execute AutoBuild for all tasks in a feature with dependency ordering.

guardkit autobuild feature FEAT-XXX [OPTIONS]

Arguments:

  • FEATURE_ID: Feature identifier (e.g., FEAT-A1B2)

Options:

| Option | Default | Description |
| --- | --- | --- |
| --max-turns N | 5 | Maximum turns per task |
| --stop-on-failure/--no-stop-on-failure | true | Stop on first task failure |
| --resume | false | Resume from last saved state |
| --fresh | false | Start fresh, ignoring saved state |
| --task TASK-ID | - | Run a specific task within the feature |
| --verbose | false | Show detailed output |
| --sdk-timeout N | 900 | SDK timeout in seconds |
| --enable-pre-loop/--no-pre-loop | auto | Enable/disable design phase |

Exit Codes:

  • 0: Success (all tasks completed)
  • 1: Feature file not found or SDK not available
  • 2: Orchestration error
  • 3: Validation error

Examples:

# Execute entire feature
guardkit autobuild feature FEAT-A1B2

# Continue even if tasks fail
guardkit autobuild feature FEAT-A1B2 --no-stop-on-failure

# Run specific task within feature context
guardkit autobuild feature FEAT-A1B2 --task TASK-AUTH-002

# Resume after interruption
guardkit autobuild feature FEAT-A1B2 --resume

# Start fresh (discard previous state)
guardkit autobuild feature FEAT-A1B2 --fresh

Command: guardkit autobuild status

Show AutoBuild status for a task.

guardkit autobuild status TASK-XXX [OPTIONS]

Options:

| Option | Default | Description |
| --- | --- | --- |
| --verbose | false | Show detailed worktree information |

Examples:

# Basic status
guardkit autobuild status TASK-AUTH-001

# Detailed status
guardkit autobuild status TASK-AUTH-001 --verbose

Command: guardkit autobuild complete

Complete all tasks in a feature and archive it.

guardkit autobuild complete FEAT-XXX [OPTIONS]

Options:

| Option | Default | Description |
| --- | --- | --- |
| --dry-run | false | Simulate without making changes |
| --force | false | Force completion even if tasks are incomplete |

Examples:

# Normal completion
guardkit autobuild complete FEAT-A1B2

# Preview what would happen
guardkit autobuild complete FEAT-A1B2 --dry-run

# Force complete partial feature
guardkit autobuild complete FEAT-A1B2 --force

Configuration Options

Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| GUARDKIT_LOG_LEVEL | Logging verbosity | DEBUG, INFO, WARNING, ERROR |
| ANTHROPIC_API_KEY | API key for Claude (required for SDK) | |

Task Frontmatter Configuration

Configure AutoBuild behavior in task frontmatter:

---
id: TASK-AUTH-001
title: "Implement OAuth2 authentication"
status: backlog
autobuild:
  enabled: true
  max_turns: 5
  mode: tdd
  sdk_timeout: 900
  skip_arch_review: false
---
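The autobuild block is ordinary YAML frontmatter, so any frontmatter parser can read the overrides. A minimal sketch assuming PyYAML is available; the fallback defaults simply mirror the CLI defaults documented above:

from pathlib import Path
import yaml  # PyYAML, assumed for this illustration

def read_autobuild_config(task_file: Path) -> dict:
    """Merge a task's autobuild: frontmatter block over documented defaults (sketch)."""
    text = task_file.read_text()
    # Frontmatter sits between the first two '---' delimiters
    _, frontmatter, _ = text.split("---", 2)
    meta = yaml.safe_load(frontmatter) or {}
    defaults = {"max_turns": 5, "mode": "tdd",
                "sdk_timeout": 900, "skip_arch_review": False}
    return {**defaults, **meta.get("autobuild", {})}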

Feature YAML Configuration

Configure feature-level behavior:

# .guardkit/features/FEAT-A1B2.yaml
id: FEAT-A1B2
name: "User Authentication"
autobuild:
  sdk_timeout: 1200
  enable_pre_loop: false

Task Invocation Modes

Each task can specify an implementation_mode that determines how the Player executes it:

Direct SDK Mode (implementation_mode: direct)

Uses direct Claude SDK invocation without full task-work phases. Faster startup for simple tasks.

tasks:
  - id: TASK-001
    name: "Create configuration files"
    implementation_mode: direct

Log signature:

INFO: Routing to direct Player path for TASK-001 (implementation_mode=direct)
INFO: Invoking Player via direct SDK for TASK-001 (turn 1)

Best for: Scaffolding, file creation, simple configuration changes.

task-work Delegation Mode (implementation_mode: task-work)

Default mode. Delegates to /task-work --implement-only for full quality gate enforcement.

tasks:
  - id: TASK-002
    name: "Implement OAuth provider"
    implementation_mode: task-work  # or omit (default)

Log signature:

INFO: Invoking Player via task-work delegation for TASK-002 (turn 1)
INFO: [TASK-002] Max turns: 50

Best for: Complex implementations, code with multiple acceptance criteria, higher-risk changes.
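The routing between the two modes is a small dispatch on the task's implementation_mode. A hedged sketch; the callables stand in for the real invocation paths, whose interfaces are not shown here:

from typing import Callable, Dict

def choose_player_path(task: Dict, direct: Callable, delegated: Callable) -> Callable:
    """Pick the Player invocation path from implementation_mode (illustrative sketch)."""
    # task-work delegation is the default when implementation_mode is omitted
    if task.get("implementation_mode", "task-work") == "direct":
        return direct       # direct SDK call: faster startup, no task-work phases
    return delegated        # full task-work delegation: Phases 3-5 quality gates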

SDK Max Turns Configuration

Both modes use TASK_WORK_SDK_MAX_TURNS (50) to ensure sufficient turns for Claude to complete complex implementations. This shared constant prevents premature task termination regardless of mode.

Reference: See TASK-REV-FDF3 for the fix validation that unified this configuration.


Part 4: ADVANCED TOPICS

Feature Orchestration

Wave-Based Execution

Features execute tasks in waves based on dependencies:

┌─────────────────────────────────────────────────────────────────┐
│                     FEATURE ORCHESTRATION                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  📁 Load Feature File                                           │
│     .guardkit/features/FEAT-XXX.yaml                            │
│                                                                 │
│  📋 Parse Tasks + Dependencies                                  │
│     ├── TASK-001 (complexity: 3, deps: [])                      │
│     ├── TASK-002 (complexity: 5, deps: [TASK-001])              │
│     ├── TASK-003 (complexity: 5, deps: [TASK-001])              │
│     └── TASK-004 (complexity: 4, deps: [TASK-002, TASK-003])    │
│                                                                 │
│  🔀 Execute by Parallel Groups                                  │
│     Wave 1: [TASK-001]           ──► Player-Coach Loop          │
│     Wave 2: [TASK-002, TASK-003] ──► Player-Coach Loop (×2)     │
│     Wave 3: [TASK-004]           ──► Player-Coach Loop          │
│                                                                 │
│  📊 Track Progress                                              │
│     Update FEAT-XXX.yaml status after each task                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
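Wave grouping is a straightforward topological pass over the declared dependencies. A minimal sketch, assuming tasks are dicts shaped like the feature file schema shown below:

from typing import Dict, List, Set

def build_waves(tasks: List[Dict]) -> List[List[str]]:
    """Group tasks into dependency-ordered waves (illustrative topological grouping)."""
    remaining = {t["id"]: set(t.get("dependencies", [])) for t in tasks}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Every task whose dependencies are already satisfied joins the next wave
        ready = sorted(tid for tid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("Circular dependency detected")
        waves.append(ready)
        done.update(ready)
        for tid in ready:
            del remaining[tid]
    return waves

# For the example above this yields:
# [["TASK-001"], ["TASK-002", "TASK-003"], ["TASK-004"]]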

Shared Worktree

Features use a single shared worktree for all tasks:

  • Location: .guardkit/worktrees/FEAT-XXX/
  • All task changes accumulate in the same worktree
  • Enables tasks to build on each other's work

Feature File Schema

id: FEAT-A1B2
name: "User Authentication"
description: "OAuth2 authentication flow"
created: 2025-12-24T10:00:00
status: planned  # planned → in_progress → completed/failed

tasks:
  - id: TASK-001
    name: "Create auth service interface"
    file_path: "tasks/backlog/oauth/TASK-001.md"
    complexity: 3
    dependencies: []
    status: pending
    implementation_mode: direct
    estimated_minutes: 45

  - id: TASK-002
    name: "Implement Google OAuth"
    file_path: "tasks/backlog/oauth/TASK-002.md"
    complexity: 5
    dependencies: [TASK-001]
    status: pending
    implementation_mode: task-work
    estimated_minutes: 90

orchestration:
  parallel_groups:
    - [TASK-001]
    - [TASK-002, TASK-003]
    - [TASK-004]
  estimated_duration_minutes: 285
  recommended_parallel: 2

Pre-Loop Design Phase

What is Pre-Loop?

Pre-loop runs task-work --design-only before the Player-Coach loop to:

  • Execute clarification questions (Phase 1.6)
  • Generate the implementation plan (Phase 2)
  • Run architectural review (Phase 2.5B)
  • Evaluate complexity (Phase 2.7)
  • Get human approval if needed (Phase 2.8)

When to Use Pre-Loop

Starting AutoBuild?
├─► Using feature-build (from /feature-plan)?
│   │
│   └─► Tasks already have detailed specs
│       └─► Pre-loop NOT needed (default: disabled)
└─► Using task-build (standalone task)?
    ├─► Task has detailed requirements?
    │   └─► Pre-loop runs by default
    └─► Simple bug fix or documentation?
        └─► Consider --no-pre-loop for speed

Pre-Loop Decision Guide

| Scenario | Command | Pre-Loop? | Time Impact |
| --- | --- | --- | --- |
| Feature from /feature-plan | guardkit autobuild feature FEAT-XXX | No | 15-25 min/task |
| Feature needing design | guardkit autobuild feature FEAT-XXX --enable-pre-loop | Yes | +60-90 min/task |
| Standalone task | guardkit autobuild task TASK-XXX | Yes | 75-105 min total |
| Simple bug fix | guardkit autobuild task TASK-XXX --no-pre-loop | No | 15-25 min |

Resume and State Management

Automatic State Persistence

AutoBuild saves state after each turn:

# In task frontmatter
autobuild_state:
  current_turn: 2
  max_turns: 5
  worktree_path: .guardkit/worktrees/TASK-AUTH-001
  started_at: '2025-12-24T10:00:00'
  last_updated: '2025-12-24T10:10:00'
  turns:
    - turn: 1
      decision: feedback
      feedback: "Missing token refresh edge case"
      timestamp: '2025-12-24T10:05:00'
    - turn: 2
      decision: approve
      timestamp: '2025-12-24T10:10:00'
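Resume reads this block back and continues from the turn after the last one recorded. A minimal sketch of the decision, assuming the frontmatter has already been parsed into a dict:

from typing import Dict, Optional

def next_turn_to_run(state: Dict) -> Optional[int]:
    """Decide where --resume should pick up from saved autobuild_state (illustrative)."""
    turns = state.get("turns", [])
    if turns and turns[-1].get("decision") == "approve":
        return None                                # already approved: nothing to resume
    return state.get("current_turn", 0) + 1        # continue after the last recorded turn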

Resume Behavior

For Tasks:

# Resume interrupted task
guardkit autobuild task TASK-AUTH-001 --resume

For Features:

# Resume - continues from last task
guardkit autobuild feature FEAT-A1B2 --resume

# Fresh - starts over, ignores saved state
guardkit autobuild feature FEAT-A1B2 --fresh

If neither --resume nor --fresh is specified and incomplete state exists, the CLI prompts:

Incomplete state detected for FEAT-A1B2:
  Tasks completed: 2/5
  Last task: TASK-002 (in_progress)

Options:
  [R]esume - Continue from last task
  [F]resh  - Start over from scratch
  [C]ancel - Exit without changes

Your choice [R/F/C]:

Troubleshooting

"Claude Agent SDK not installed"

# Install AutoBuild dependencies
pip install guardkit-py[autobuild]
# OR
pip install claude-agent-sdk

"Task not found"

# Verify task file exists
ls tasks/backlog/TASK-XXX*.md
ls tasks/in_progress/TASK-XXX*.md

# Check task ID format
guardkit autobuild task TASK-AUTH-001  # Correct
guardkit autobuild task AUTH-001       # Wrong - missing TASK- prefix

"Max turns reached without approval"

  1. Review Coach feedback from last turn
  2. Check if requirements are too broad
  3. Consider splitting into smaller tasks
  4. Use --max-turns 10 for complex tasks
  5. Fall back to /task-work for manual implementation

"Worktree already exists"

# Clean up existing worktree
guardkit worktree cleanup TASK-XXX

# Or manually
rm -rf .guardkit/worktrees/TASK-XXX
git worktree prune

# Then retry
guardkit autobuild task TASK-XXX

Debug Logging

# Full debug output
GUARDKIT_LOG_LEVEL=DEBUG guardkit autobuild task TASK-XXX --verbose

# Log to file
GUARDKIT_LOG_LEVEL=DEBUG guardkit autobuild task TASK-XXX 2>&1 | tee autobuild.log

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| SDK timeout | Task too complex | Increase --sdk-timeout |
| Tests always fail | Test setup issues | Check test infrastructure in the worktree |
| Coach never approves | Acceptance criteria too strict | Review task requirements |
| Worktree conflicts | Previous run artifacts | Use the --fresh flag |

Ablation Mode

What is Ablation Mode?

Ablation mode (--ablation) is a testing mode that disables Coach feedback to validate the Block AI research finding that adversarial cooperation is essential for quality code generation.

Purpose: Demonstrate that the Player-only system produces inferior results compared to the full Player-Coach adversarial loop.

How Ablation Mode Works

Normal Mode:
┌────────────┐     ┌────────────┐
│   Player   │────▶│   Coach    │
│  Implements│     │  Validates │
└────────────┘     └────────────┘
      ▲                  │
      │    Feedback      │
      └──────────────────┘
   (iterative improvement)

Ablation Mode:
┌────────────┐
│   Player   │ (Coach disabled)
│  Implements│
└────────────┘
      └─▶ Auto-approve (no feedback)
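In effect, ablation swaps the Coach for an unconditional approval. A minimal sketch; the validate callable stands in for the normal CoachValidator path:

from typing import Callable, Dict

def coach_decision(report: Dict, validate: Callable[[Dict], Dict],
                   ablation: bool = False) -> Dict:
    """Return the Coach decision, or auto-approve when ablation mode is on (sketch)."""
    if ablation:
        # No independent verification and no feedback loop: the exact failure mode
        # the Block ablation study demonstrates.
        return {"decision": "approve", "evidence": "ablation mode: auto-approved"}
    return validate(report)  # normal adversarial path: quality gates and criteria checks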

Using Ablation Mode

# Run task in ablation mode
guardkit autobuild task TASK-AUTH-001 --ablation

# Compare with normal mode
guardkit autobuild task TASK-AUTH-001  # Normal mode with Coach

Expected Outcomes

When running in ablation mode, expect:

| Metric | Normal Mode | Ablation Mode |
| --- | --- | --- |
| Success Rate | Higher | Lower |
| Code Quality | Better architecture | More technical debt |
| Test Coverage | Comprehensive | Incomplete |
| Iterations | 2-5 turns | 1 turn (premature success) |
| Edge Cases | Handled | Missed |

Warning Banner

When ablation mode is active, you'll see:

================================================================================
⚠️  ABLATION MODE ACTIVE
================================================================================
Coach feedback is DISABLED. This mode is for testing only.
Expected outcomes:
  • Higher failure rate (no feedback loop)
  • Lower code quality (no architectural review)
  • More turns needed (no guidance toward convergence)
This validates Block AI research findings.
================================================================================

Validating Block Research Findings

The Block AI research paper "Adversarial Cooperation in Code Synthesis" includes ablation studies showing:

"When coach feedback was withheld, the player went 4 rounds of implementations with missing feedback. On each iteration it spontaneously found things to improve, however the final implementation was non-functional."

Ablation mode allows you to reproduce these findings in GuardKit, demonstrating:

  1. Anchoring Bias: Without Coach feedback, Player drifts from original requirements
  2. Premature Success: Player declares completion despite missing functionality
  3. Circular Verification: Player cannot objectively assess its own work
  4. Context Pollution: Error accumulation without fresh perspective

Comparison Testing

To validate adversarial cooperation benefits, run the same task in both modes:

# Normal mode (with Coach)
guardkit autobuild task TASK-TEST-001 --verbose > normal_mode.log

# Ablation mode (no Coach)
guardkit autobuild task TASK-TEST-001 --ablation --verbose > ablation_mode.log

# Compare results
diff normal_mode.log ablation_mode.log

Use Cases

Do use ablation mode for:

  • Validating Block research findings
  • Demonstrating the value of adversarial cooperation
  • A/B testing implementation quality
  • Research and academic analysis

Don't use ablation mode for:

  • Production code generation
  • Real feature implementation
  • Tasks requiring high quality
  • Critical or security-sensitive code


Further Reading

Research Papers

  • Block AI: Adversarial Cooperation in Code Synthesis (December 2025) - The foundational research paper introducing dialectical autocoding and the coach-player adversarial pattern. Key concepts:
      • "Discard the player's self-report of success. Have the coach perform independent evaluation."
      • Ablation studies showing single-agent failures without independent verification
      • The g3 implementation demonstrating autonomous coding through adversarial cooperation
  • Hegelion - An open-source implementation of the player-coach dialectical loop based on Block's g3 agent research


Version: 1.0.0 | License: MIT | Repository: https://github.com/guardkit/guardkit