
Simple Local AutoBuild Setup

Run GuardKit AutoBuild against a local vLLM server instead of the Anthropic API.

Prerequisites

  • NVIDIA GPU with sufficient VRAM (about 92GB for the default Qwen3-Coder-Next FP8 preset; smaller presets need less, see Model Presets)
  • Docker with NVIDIA Container Toolkit
  • GuardKit installed (pip install "guardkit-py[autobuild]")

Quick Start

# 1. Start the vLLM server
./scripts/vllm-serve.sh

# 2. Wait for the model to load (3-5 min for 80B)
docker logs -f vllm-qwen3-coder

# 3. Verify the server is ready
curl http://localhost:8000/health
curl http://localhost:8000/v1/models

# 4. Run AutoBuild
ANTHROPIC_BASE_URL=http://localhost:8000 \
ANTHROPIC_API_KEY=vllm-local \
guardkit autobuild task TASK-XXX
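
Steps 2 and 3 can be folded into a polling loop instead of watching the logs by hand. This is a minimal sketch, assuming /health answers with a successful status once the model has loaded (as in step 3). The helper name wait_for_vllm and its default timeout and interval are illustrative and not part of GuardKit.

```shell
# Hypothetical helper: poll the vLLM health endpoint until it responds
# successfully or the timeout elapses. Not part of GuardKit.
wait_for_vllm() {
  local base_url="${1:-http://localhost:8000}"   # server base URL
  local timeout_s="${2:-600}"                    # give up after this many seconds
  local interval_s="${3:-5}"                     # pause between attempts
  local waited=0

  # curl -f treats HTTP errors (4xx/5xx) as failures; -s hides progress output.
  until curl -sf -o /dev/null "${base_url}/health"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "vLLM not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "vLLM ready after ~${waited}s"
}
```

For example, after sourcing the function, running wait_for_vllm http://localhost:8000 600 && guardkit autobuild task TASK-XXX (with the two ANTHROPIC_* variables exported as in step 4) starts AutoBuild only once the server is reachable.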

Model Alignment

A mismatch between the served model name and the CLI's default model ID is the most common cause of local AutoBuild failures.

The vLLM server exposes your local model under an alias (SERVED_MODEL_NAME in scripts/vllm-serve.sh). The Claude Agent SDK's bundled claude CLI sends requests using its own hardcoded default model ID. These two values must match exactly, or every SDK request will return a 404.

Why it matters

When AutoBuild invokes the Player or Coach agent, it goes through the Claude Agent SDK, which shells out to the bundled claude CLI. That CLI sends requests like:

POST /v1/messages
{ "model": "claude-sonnet-4-6", ... }

vLLM only responds to model names it knows about. If SERVED_MODEL_NAME is set to something different (e.g. claude-sonnet-4-5-20241022), vLLM returns 404 and the agent fails.
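
To reproduce the failure outside AutoBuild, a request of the same shape can be sent by hand. This is a rough sketch, assuming the server accepts the Anthropic-style /v1/messages body shown above. The helper name probe_model, the max_tokens value, the prompt, and the header values are illustrative; the server may not require the headers at all.

```shell
# Hypothetical helper: send one /v1/messages request for a given model ID
# and print only the HTTP status code of the response.
probe_model() {
  local model_id="$1"
  local base_url="${2:-http://localhost:8000}"

  # -o /dev/null discards the body; -w prints the status code on its own line.
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST "${base_url}/v1/messages" \
    -H 'content-type: application/json' \
    -H 'x-api-key: vllm-local' \
    -H 'anthropic-version: 2023-06-01' \
    -d "{\"model\": \"${model_id}\", \"max_tokens\": 16, \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}"
}
```

Based on the behaviour described above, probe_model claude-sonnet-4-6 should print 404 when SERVED_MODEL_NAME is set to a different value, and a success code once the names agree.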

How to verify alignment

# 1. Check what vLLM is serving
curl -s http://localhost:8000/v1/models | python3 -m json.tool
# Look for the "id" field — this is the served model name

# 2. Check what the CLI expects
ANTHROPIC_BASE_URL=http://localhost:8000 claude --version
# The default model ID is shown in the output

If they don't match, update SERVED_MODEL_NAME in scripts/vllm-serve.sh and restart the container.
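
The comparison can also be scripted. This is a sketch, assuming /v1/models returns the OpenAI-style list shape ({"data": [{"id": ...}]}) and that you supply the expected model ID yourself, for example the "model" value from a failing request. The function names are illustrative and not part of GuardKit.

```shell
# Hypothetical helpers, not part of GuardKit.

# Read a /v1/models JSON response on stdin and print one served model ID per line.
served_model_ids() {
  python3 -c 'import json, sys
for entry in json.load(sys.stdin).get("data", []):
    print(entry["id"])'
}

# Succeed only if the expected ID is among the served IDs (exact, whole-line match).
check_alignment() {
  local expected="$1"
  local base_url="${2:-http://localhost:8000}"

  if curl -s "${base_url}/v1/models" | served_model_ids | grep -qxF -- "$expected"; then
    echo "OK: server is serving '${expected}'"
  else
    echo "MISMATCH: '${expected}' is not served; update SERVED_MODEL_NAME" >&2
    return 1
  fi
}
```

Calling check_alignment claude-sonnet-4-6 exits non-zero on a mismatch, so it can act as a guard in front of the guardkit autobuild command.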

What breaks when they diverge

Symptom                                     | Cause
--------------------------------------------|-----------------------------------------------
Player agent gets 404 on /v1/messages       | SERVED_MODEL_NAME doesn't match CLI default
Coach SDK error: "model not found"          | Same mismatch, hit during coach verification
AutoBuild stalls after "Invoking agent..."  | Request rejected, retry loop exhausts attempts

Historical examples

  • TASK-REV-AB3D: Player agent failed with 404 because SERVED_MODEL_NAME was still set to an older model ID after an SDK upgrade.
  • TASK-REV-ED10: Coach SDK invocation failed with the same root cause, discovered independently.

When to check

Re-verify alignment whenever you:

  • Upgrade guardkit-py or claude-agent-sdk
  • Change the model preset in vllm-serve.sh
  • See unexpected 404 errors in AutoBuild logs

Model Presets

Preset          | Model                   | VRAM   | Speed      | Command
----------------|-------------------------|--------|------------|------------------------------------------
next (default)  | Qwen3-Coder-Next FP8    | ~92GB  | ~43 tok/s  | ./scripts/vllm-serve.sh
30b             | Qwen3-Coder-30B-A3B     | ~30GB  | faster     | ./scripts/vllm-serve.sh 30b
next-nvfp4      | Qwen3-Coder-Next NVFP4  | ~50GB  | ~35 tok/s  | ./scripts/vllm-serve.sh next-nvfp4
custom          | Any model               | varies | varies     | ./scripts/vllm-serve.sh custom org/model
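
Before picking a preset, it can help to compare the approximate VRAM figures in the table above with what the GPU reports. This is a rough sketch: the thresholds are copied from the table and are only approximate, the helper names are illustrative, and gpu_total_gb looks only at the first GPU that nvidia-smi lists.

```shell
# Hypothetical helpers, not part of GuardKit.

# Approximate VRAM need in GB per preset, taken from the presets table.
preset_vram_gb() {
  case "$1" in
    next)       echo 92 ;;
    30b)        echo 30 ;;
    next-nvfp4) echo 50 ;;
    *)          return 1 ;;   # custom or unknown preset: no figure available
  esac
}

# Succeed if the preset's approximate need fits in the given VRAM (in GB).
# Returns 2 when the preset has no known figure.
preset_fits() {
  local preset="$1" available_gb="$2" needed_gb
  needed_gb="$(preset_vram_gb "$preset")" || return 2
  [ "$needed_gb" -le "$available_gb" ]
}

# Total memory of the first GPU in whole GB, as reported by nvidia-smi (MiB / 1024).
gpu_total_gb() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | head -n 1 | awk '{ printf "%d\n", $1 / 1024 }'
}
```

A typical call is preset_fits next "$(gpu_total_gb)" || echo "consider 30b or next-nvfp4". On systems where nvidia-smi does not report a total memory figure, pass the available VRAM in GB by hand instead.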

Troubleshooting

Server won't start

# Check GPU availability
nvidia-smi

# Check Docker GPU support
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

Out of memory

Reduce GPU utilization or switch to a smaller model:

VLLM_GPU_UTIL=0.6 ./scripts/vllm-serve.sh 30b

Slow generation

Prefix caching is enabled by default, so first confirm it has not been turned off. Then check that the FlashInfer attention backend is supported on your GPU.