# Job-Specific Context Retrieval

> **What is Job-Specific Context?**
>
> Job-specific context retrieval dynamically provides each task with precisely the knowledge it needs: not everything, not nothing, but exactly the relevant context. This prevents wasted tokens and ensures Claude has the right information at the right time.
## Table of Contents
- Overview
- How It Works
- Context Categories
- AutoBuild Additional Context
- Budget Allocation
- Budget Adjustments
- Relevance Filtering
- Performance
- Context in Action
- Troubleshooting
- See Also
- Technical Reference
## Overview

### The Problem
Traditional approaches to context loading have significant drawbacks:
| Approach | Problem |
|---|---|
| Load everything | Wastes tokens, dilutes relevance, hits context limits |
| Load nothing | Claude lacks project understanding, makes generic responses |
| Load by file paths | Too rigid, misses semantic relationships |
### The Solution
Job-specific context retrieval analyzes each task's characteristics and dynamically allocates a context budget across categories. The result: Claude gets exactly the knowledge needed to succeed.
**Key Benefits:**

- **Precision**: Context matched to task type, complexity, and phase
- **Efficiency**: Token budget respected, no wasted context
- **Learning**: Refinement attempts get more warning/failure context
- **AutoBuild-aware**: Player/Coach boundaries, turn history, quality gates
## How It Works
The system follows a 5-step pipeline for every task:
```
┌────────────────────────────────────────────────────────────────
│  Job-Specific Context Retrieval
│
│  Input: Task + Phase + History
│      ↓
│  1. Task Analysis
│     - Analyze type, complexity, novelty, AutoBuild context
│      ↓
│  2. Budget Calculation
│     - Dynamically allocate token budget (2,000-6,000+ tokens)
│      ↓
│  3. Context Retrieval
│     - Query Graphiti for relevant knowledge across categories
│      ↓
│  4. Smart Filtering
│     - Apply relevance thresholds and deduplication
│      ↓
│  5. Prompt Injection
│     - Format context for optimal Claude understanding
│
│  Output: Precisely relevant context string
└────────────────────────────────────────────────────────────────
```
### Step 1: Task Analysis

The `TaskAnalyzer` examines:
- Task type: Implementation, review, planning, refinement, documentation
- Complexity: 1-10 scale (determines base budget)
- Novelty: Is this the first task of this type? How many similar tasks exist?
- Refinement status: Is this a retry? What failed before?
- AutoBuild context: Turn number, current actor (Player/Coach), turn history
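
The result of this analysis can be pictured as a simple record. The sketch below is purely illustrative; the field names are assumptions, not the actual `TaskAnalyzer` output:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskProfile:
    """Illustrative shape of a task analysis (field names are assumptions)."""
    task_type: str               # "implementation", "review", "planning", ...
    complexity: int              # 1-10 scale; drives the base token budget
    is_first_of_type: bool       # novelty: no similar tasks seen before
    similar_task_count: int      # how much precedent exists to draw on
    is_refinement: bool          # retry after a failed attempt?
    previous_failure: Optional[str] = None
    is_autobuild: bool = False   # Player/Coach workflow?
    turn_number: int = 1         # current AutoBuild turn
```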
### Step 2: Budget Calculation

The `DynamicBudgetCalculator` determines:
- Total tokens: Based on complexity and adjustments
- Allocation percentages: How much budget for each category
- Priority weights: Which categories to emphasize
### Step 3: Context Retrieval

The `JobContextRetriever` queries Graphiti for:
- Feature context (if task belongs to a feature)
- Similar outcomes from past tasks
- Relevant patterns from the codebase
- Architecture context for system understanding
- Warnings from past failures
- Domain knowledge for terminology
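
All category queries run concurrently, which is what keeps retrieval fast (see Performance below). A minimal sketch of the idea, assuming a Graphiti client with an async `search(query, group_ids=...)` method; the helper and the category-to-group mapping are hypothetical:

```python
import asyncio

async def retrieve_all(graphiti, query: str, categories: dict[str, list[str]]):
    """Query every context category in parallel; return results keyed by category."""
    async def one(name: str, group_ids: list[str]):
        return name, await graphiti.search(query, group_ids=group_ids)

    pairs = await asyncio.gather(*(one(n, g) for n, g in categories.items()))
    return dict(pairs)

# e.g. categories = {"relevant_patterns": ["patterns_python", "patterns_generic"],
#                    "warnings": ["warnings"]}
```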
### Step 4: Smart Filtering

Each result is filtered by:

- Relevance score: results below the threshold are discarded
- Deduplication: redundant information is removed
- Budget trimming: results are trimmed to fit the category's allocation
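
A minimal sketch of this filtering stage, assuming each retrieved result carries a semantic `score` and a text `fact` (illustrative names), and using a rough four-characters-per-token estimate:

```python
def smart_filter(results, threshold: float = 0.6, budget_tokens: int = 1000):
    """Keep the most relevant, non-duplicate results that fit the budget."""
    kept, seen, used = [], set(), 0
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        if r.score < threshold:        # relevance cutoff
            continue
        if r.fact in seen:             # deduplication
            continue
        cost = len(r.fact) // 4        # crude token estimate
        if used + cost > budget_tokens:
            break                      # budget exhausted: terminate early
        kept.append(r)
        seen.add(r.fact)
        used += cost
    return kept
```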
### Step 5: Prompt Injection
Context is formatted as structured sections with:
- Clear headings per category
- Actionable framing (what to do, what to avoid)
- Budget usage reporting
## Context Categories

### Standard Categories
| Category | Description | When Emphasized |
|---|---|---|
| Feature Context | Requirements and success criteria for parent feature | Tasks with feature_id |
| Similar Outcomes | What worked for similar tasks (patterns, approaches) | Testing phase, all tasks |
| Relevant Patterns | Codebase patterns that apply to this task | Implementation phase |
| Architecture Context | How this fits into the overall system | First-of-type, planning |
| Warnings | Approaches to avoid based on past failures | Refinement attempts |
| Domain Knowledge | Domain-specific terminology and concepts | All tasks (lower priority) |
### Category Details

**Feature Context**

- Loaded when the task has a parent `feature_id`
- Contains requirements, acceptance criteria, success metrics
- Helps maintain feature-level coherence

**Similar Outcomes**

- Patterns and approaches that succeeded in similar work
- Especially valuable for implementation and testing
- Filtered by tech stack for relevance

**Relevant Patterns**

- Codebase-specific patterns (e.g., error handling, API design)
- Loaded from `patterns_{tech_stack}` and generic pattern groups
- Guides implementation to match existing code style

**Architecture Context**

- High-level system understanding
- Where this component fits in the architecture
- Emphasized for novel task types

**Warnings**

- Failed approaches from past tasks
- Critical for refinement attempts
- Framed as "do NOT do this" guidance

**Domain Knowledge**

- Business terminology and concepts
- Domain-specific rules and constraints
- Lower priority, but ensures consistency
## AutoBuild Additional Context

During `/feature-build` workflows, additional context categories are loaded to support the Player-Coach adversarial workflow.
### AutoBuild-Specific Categories
| Category | Description | Purpose |
|---|---|---|
| Role Constraints | Player/Coach boundaries | Prevent role reversal |
| Quality Gate Configs | Task-type specific thresholds | Prevent threshold drift |
| Turn States | Previous turn context | Enable cross-turn learning |
| Implementation Modes | Direct vs task-work guidance | Clarify execution patterns |
### Role Constraints
Defines what each actor can and cannot do:
```yaml
Player:
  Must do:
    - Write code
    - Run tests
    - Fix issues
  Must NOT do:
    - Approve own work
    - Skip tests
    - Modify quality gates
  Ask before:
    - Schema changes
    - Auth/security changes
    - Deployment configs

Coach:
  Must do:
    - Validate against criteria
    - Provide specific feedback
    - Make approval decisions
  Must NOT do:
    - Write implementation code
    - Run commands
    - Modify Player's work directly
```
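
How these constraints are enforced is internal to the workflow. As a purely hypothetical sketch of the idea, an action check might look like this (the forbidden-action sets simply mirror the lists above):

```python
# Hypothetical sketch only: actual enforcement lives inside the AutoBuild
# workflow, not in user code.
ROLE_CONSTRAINTS = {
    "player": {"approve_own_work", "skip_tests", "modify_quality_gates"},
    "coach": {"write_implementation_code", "run_commands", "modify_player_work"},
}

def action_allowed(actor: str, action: str) -> bool:
    """Reject any action the actor's role explicitly forbids."""
    return action not in ROLE_CONSTRAINTS[actor]
```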
### Quality Gate Configs
Thresholds loaded per task type:
```yaml
Feature tasks:
  - Coverage: ≥80%
  - Arch review: ≥60
  - Tests required: Yes

Bug fixes:
  - Coverage: ≥75%
  - Arch review: ≥50
  - Tests required: Yes

Documentation:
  - Coverage: N/A
  - Arch review: N/A
  - Tests required: No
```
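
A hedged sketch of what checking against these gates could look like. The dictionary literals restate the thresholds above; the function itself is illustrative, not GuardKit's actual implementation:

```python
# Illustrative only: the real gate configs are loaded from Graphiti per task type.
QUALITY_GATES = {
    "feature":       {"coverage": 80, "arch_review": 60, "tests_required": True},
    "bug_fix":       {"coverage": 75, "arch_review": 50, "tests_required": True},
    "documentation": {"coverage": None, "arch_review": None, "tests_required": False},
}

def passes_gates(task_type: str, coverage: float, arch_score: float, has_tests: bool) -> bool:
    """Apply the task-type thresholds; None means the gate does not apply."""
    gate = QUALITY_GATES[task_type]
    if gate["tests_required"] and not has_tests:
        return False
    if gate["coverage"] is not None and coverage < gate["coverage"]:
        return False
    if gate["arch_review"] is not None and arch_score < gate["arch_review"]:
        return False
    return True
```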
### Turn States
Previous turn history for cross-turn learning:
```
Turn 1: FEEDBACK
  Progress: Initial implementation, missing tests

Turn 2: REJECTED
  Progress: Added tests, coverage at 65%
  Feedback: "Coverage must be ≥80%. Missing tests for error paths."

Turn 3: (current)
  Loaded context includes turns 1-2 to avoid repeating mistakes
```
### Implementation Modes
Clarifies where files are created:
```
task-work mode:
  Results in: worktree directory
  State via: JSON checkpoints
  Pitfalls: Don't expect files in main repo during execution

direct mode:
  Results in: main repository
  State via: Task file updates
  Pitfalls: Changes visible immediately, no isolation
```
## Budget Allocation

### Base Budgets by Complexity
| Task Complexity | Base Budget | Typical Use Cases |
|---|---|---|
| Simple (1-3) | 2,000 tokens | Typo fixes, small features, documentation |
| Medium (4-6) | 4,000 tokens | Standard features, moderate refactoring |
| Complex (7-10) | 6,000 tokens | Architecture changes, security features |
### Default Allocation (Standard Tasks)
```
Feature Context:       15%
Similar Outcomes:      25%
Relevant Patterns:     20%
Architecture Context:  20%
Warnings:              15%
Domain Knowledge:       5%
```
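
Applied to a medium task's 4,000-token budget, this split works out as follows. A trivial sketch; the category keys are illustrative:

```python
DEFAULT_ALLOCATION = {
    "feature_context": 0.15,
    "similar_outcomes": 0.25,
    "relevant_patterns": 0.20,
    "architecture_context": 0.20,
    "warnings": 0.15,
    "domain_knowledge": 0.05,
}

def split_budget(total_tokens: int) -> dict[str, int]:
    """Divide the total budget across categories by percentage."""
    return {cat: int(total_tokens * pct) for cat, pct in DEFAULT_ALLOCATION.items()}

print(split_budget(4000))
# {'feature_context': 600, 'similar_outcomes': 1000, 'relevant_patterns': 800,
#  'architecture_context': 800, 'warnings': 600, 'domain_knowledge': 200}
```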
### AutoBuild Allocation

When `is_autobuild=True`, the allocation shifts to include AutoBuild categories:
```
Feature Context:        10%
Similar Outcomes:       15%
Relevant Patterns:      15%
Architecture Context:   10%
Warnings:               10%
Domain Knowledge:        5%
Role Constraints:       10%
Quality Gate Configs:   10%
Turn States:            10%
Implementation Modes:    5%
```
### Allocation by Task Type
**Review Tasks:**

```
Relevant Patterns:     30%  (what patterns should be used)
Architecture Context:  25%  (does it fit the system)
Similar Outcomes:      15%
Others:                30%
```

**Planning Tasks:** allocation shifts toward Architecture Context, which is emphasized for planning (see Standard Categories above).

**Refinement Tasks:**

```
Warnings:           35%  (emphasize what went wrong)
Similar Outcomes:   30%  (how others fixed similar issues)
Relevant Patterns:  15%
Others:             20%
```
## Budget Adjustments
The base budget is adjusted based on task characteristics:
### Adjustment Modifiers
| Condition | Adjustment | Rationale |
|---|---|---|
| First-of-type | +30% | Novel tasks need more architecture understanding |
| Few similar tasks (<3) | +15% | Less precedent to draw from |
| Refinement attempt | +20% | Need more context about what failed |
| AutoBuild Turn >1 | +15% | Load previous turn context |
| AutoBuild with history | +10% | Enable cross-turn learning |
### Example Budget Calculations
**Simple first-of-type task:**

```
Base budget: 2,000 tokens
First-of-type: +30% → 2,600 tokens
Total: 2,600 tokens
```

**Medium refinement task:**

```
Base budget: 4,000 tokens
Refinement attempt: +20% → 4,800 tokens
Total: 4,800 tokens
```

**Complex AutoBuild turn 3:**

```
Base budget: 6,000 tokens
Turn >1: +15% → 6,900 tokens
Has history: +10% → 7,590 tokens
Total: 7,590 tokens
```
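
The examples suggest modifiers compound multiplicatively. A small sketch of that math, with the modifier names invented for illustration (values come from the table above):

```python
MODIFIERS = {
    "first_of_type": 0.30,
    "few_similar_tasks": 0.15,   # fewer than 3 similar tasks
    "refinement": 0.20,
    "autobuild_turn_gt1": 0.15,
    "autobuild_history": 0.10,
}

def adjusted_budget(base_tokens: int, active: list[str]) -> int:
    """Apply each active modifier multiplicatively to the base budget."""
    budget = float(base_tokens)
    for name in active:
        budget *= 1 + MODIFIERS[name]
    return round(budget)

# Matches the worked example: 6,000 → 6,900 → 7,590
assert adjusted_budget(6000, ["autobuild_turn_gt1", "autobuild_history"]) == 7590
```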
## Relevance Filtering

### Relevance Thresholds
Results are filtered by semantic similarity score:
| Task Context | Threshold | Rationale |
|---|---|---|
| Standard tasks | 0.6 | High precision, avoid noise |
| First-of-type | 0.5 | Broader context for novel tasks |
| Refinement | 0.5 | Don't miss failure patterns |
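
In code form, the table reduces to a simple rule. The values come straight from the table; the function itself is illustrative:

```python
def relevance_threshold(is_first_of_type: bool, is_refinement: bool) -> float:
    """Looser threshold for novel tasks and refinements, stricter otherwise."""
    if is_first_of_type or is_refinement:
        return 0.5   # broader recall: don't miss architecture or failure context
    return 0.6       # high precision for standard tasks
```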
### How Thresholds Work
```
Query: "implement user authentication"

Results:
1. "JWT auth pattern"         (score: 0.82) → ✓ Included
2. "Password hashing guide"   (score: 0.71) → ✓ Included
3. "User model definition"    (score: 0.58) → ✗ Below 0.6 threshold
4. "Rate limiting middleware" (score: 0.45) → ✗ Below threshold
```
### Threshold Tuning
If context is missing relevant information, you can adjust thresholds:
```python
# In relevance_tuning.py
THRESHOLDS = {
    "standard": 0.6,       # decrease to 0.5 for broader results
    "first_of_type": 0.5,  # already permissive
    "refinement": 0.5,     # already permissive
}
```
## Performance

### Measured Metrics
| Metric | Value | Notes |
|---|---|---|
| Average retrieval time | 600-800ms | Concurrent queries across categories |
| Cache hit rate | ~40% | Repeated context cached at multiple levels |
| Budget utilization | 70-90% | Efficient, rarely exceeds budget |
| Relevance scores | 0.65-0.85 avg | High quality matches |
### Performance Optimizations
- Concurrent queries: All category queries run in parallel
- Result caching: Graphiti client caches recent queries
- Early termination: Stop retrieval when budget exhausted
- Deduplication: Avoid loading the same fact twice
### Monitoring
Context retrieval is logged during task execution:
```
[INFO] Retrieved job-specific context (1850/2000 tokens)
  - Similar outcomes: 3 results (0.72 avg relevance)
  - Relevant patterns: 2 results (0.81 avg relevance)
  - Warnings: 1 result (0.68 relevance)
  - Architecture: 2 results (0.75 avg relevance)
```
## Context in Action

### Standard Task Execution
What happens:

1. Task analyzed (type=implementation, complexity=5, novelty=standard)
2. Budget calculated (4,000 tokens)
3. Context retrieved:
   - Similar outcomes (25%)
   - Relevant patterns (20%)
   - Architecture context (20%)
   - Feature context (15%)
   - Warnings (15%)
   - Domain knowledge (5%)
4. Context formatted and injected into prompt
### AutoBuild Execution
**Turn 1:**

- Loads role constraints, quality gates, implementation modes
- No turn history yet

**Turn 2+:**

- Loads all Turn 1 context, plus:
  - Previous turn states (what was rejected, and why)
  - Adjusted allocation (more turn states, fewer general patterns)
### Example Retrieved Context
For a medium-complexity authentication task:
```markdown
## Job-Specific Context
Budget used: 3,200/4,000 tokens

### What Worked for Similar Tasks
*Patterns and approaches that succeeded in similar work*

- **JWT Authentication**: Used refresh tokens with 15min expiry
- **Password Hashing**: Argon2id with memory cost 64MB
- **Session Management**: Redis-backed with sliding expiration

### Recommended Patterns
*Patterns from the codebase that apply here*

- **AuthMiddleware**: See src/middleware/auth.py for pattern
- **TokenService**: Service-based approach, not inline validation

### Architecture Context
*How this fits into the overall system*

- **Auth is a core service**: All other services depend on it
- **API Gateway handles auth**: Services trust internal requests

### Warnings from Past Experience
*Approaches to AVOID based on past failures*

- **DO NOT** store tokens in localStorage (XSS vulnerability)
- **DO NOT** use symmetric keys for JWT (use RS256)
```
## Troubleshooting

### Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Context missing information | Knowledge not seeded | Run `guardkit graphiti seed` |
| Context irrelevant | Threshold too low | Increase relevance threshold |
| AutoBuild context missing | Metadata incorrect | Verify `is_autobuild=True` in task |
| Slow retrieval (>2s) | Neo4j performance | Check Neo4j resources, verify network |
### Detailed Troubleshooting
"Context missing relevant information"
-
Check if knowledge has been seeded:
-
Verify the task description is specific enough:
-
Review what was retrieved:
- Check the
[INFO]logs for retrieval stats - If relevance scores are low, content may not be seeded
"Context contains irrelevant information"
-
Increase relevance threshold:
-
Review seeded knowledge quality:
-
Check task characteristics are correctly classified
"AutoBuild context missing"
-
Verify task metadata:
-
Check role constraints are seeded:
-
Verify turn states are persisted:
"Slow retrieval (>2 seconds)"
-
Check Neo4j/FalkorDB health:
-
Reduce context categories for very simple tasks
-
Verify network latency to graph database
-
Consider increasing cache TTL
## See Also
- Graphiti Integration Guide - Setup and configuration
- Graphiti Commands Reference - CLI commands
- AutoBuild Workflow - Player-Coach workflow details
- Quality Gates Integration - Threshold configuration
- FEAT-GR-006 Specification - Technical details
## Technical Reference

### Core Components
| Component | Location | Purpose |
|---|---|---|
| `TaskAnalyzer` | `guardkit/knowledge/task_analyzer.py` | Analyzes task characteristics |
| `DynamicBudgetCalculator` | `guardkit/knowledge/budget_calculator.py` | Calculates context budgets |
| `JobContextRetriever` | `guardkit/knowledge/job_context_retriever.py` | Retrieves and formats context |
| `RelevanceTuning` | `guardkit/knowledge/relevance_tuning.py` | Configurable thresholds |
### Configuration

Context retrieval settings can be adjusted in `config/graphiti.yaml`:
```yaml
context:
  base_budgets:
    simple: 2000
    medium: 4000
    complex: 6000
  relevance_thresholds:
    standard: 0.6
    first_of_type: 0.5
    refinement: 0.5
  cache_ttl: 300  # seconds
```
### API Integration
For programmatic access:
```python
from guardkit.knowledge.job_context_retriever import JobContextRetriever
from guardkit.knowledge.task_analyzer import TaskPhase

retriever = JobContextRetriever()

# `task_dict` is the task's metadata (type, complexity, feature_id, ...)
context = await retriever.retrieve(
    task=task_dict,
    phase=TaskPhase.IMPLEMENT,
)

# Format for prompt injection
prompt_context = context.to_prompt()
```