Episode Metadata Schema¶
Deep-dive into Graphiti episode metadata schema used by GuardKit for tracking, versioning, and deduplication.
Overview¶
All episodes seeded into Graphiti include a _metadata block that provides:
- Version tracking: Know which seeding version created the episode
- Deduplication: Prevent duplicate episodes across re-seeding
- Provenance tracking: Track where knowledge came from
- Update detection: Detect when content has changed
- Unique identification: Consistent entity IDs for relationships
Metadata Schema¶
Complete Schema¶
{
"_metadata": {
"source": "string", // Source of the episode
"version": "string", // Seeding version (semver)
"created_at": "ISO8601", // UTC timestamp when created
"updated_at": "ISO8601", // UTC timestamp of last update
"source_hash": "string|null", // SHA256 hash of source file
"entity_id": "string" // Unique entity identifier
}
}
Field Descriptions¶
source (required, string)¶
Identifies where the episode originated from.
Values:
- "guardkit_seeding": System context seeded by GuardKit
- "project_seeding": Project-specific knowledge
- "user_interaction": Knowledge from user conversations
- "feature_build": Knowledge from feature-build sessions
- "task_work": Knowledge from task-work sessions
Example:
version (required, string)¶
Seeding version in semantic versioning format (MAJOR.MINOR.PATCH).
Tracks which version of GuardKit created this episode, enabling: - Detection of outdated knowledge - Selective re-seeding by version - Version-specific query filtering
Example:
Version bumping: - MAJOR: Breaking changes to schema or significant knowledge rewrites - MINOR: New knowledge categories or significant additions - PATCH: Bug fixes, clarifications, minor updates
created_at (required, ISO8601 timestamp)¶
UTC timestamp when episode was first created.
Format: ISO 8601 with timezone (always UTC)
Example:
Usage: - Audit trail of when knowledge was added - Age-based queries (e.g., "show recent additions") - Freshness indicators
updated_at (required, ISO8601 timestamp)¶
UTC timestamp of last update to this episode.
Format: ISO 8601 with timezone (always UTC)
Example:
Usage: - Track when knowledge was last refreshed - Detect stale knowledge - Update notification triggers
Note: On creation, updated_at = created_at
source_hash (optional, string or null)¶
SHA256 hash of the source file that generated this episode.
Values: - null: Generated/synthetic content (not file-based) - string: 64-character hex SHA256 hash
Example:
{
"_metadata": {
"source_hash": "a3b5c7d9e1f2a3b5c7d9e1f2a3b5c7d9e1f2a3b5c7d9e1f2a3b5c7d9e1f2a3b5"
}
}
Usage: - Detect when source file has changed - Skip re-seeding if hash matches - Incremental updates (only changed files)
System context episodes: Always null (generated, not file-based)
entity_id (required, string)¶
Unique identifier for this episode entity, used for deduplication.
Format: Snake_case string, typically {category}_{name}
Example:
Generation: - System context: Uses episode name from seeding function - File-based: Uses file path or name - User interactions: Generated UUID or conversation ID
Usage: - Deduplication: Prevent duplicate episodes with same entity_id - Updates: Replace episode with matching entity_id - Relationships: Reference entities consistently across episodes
Example Episodes with Metadata¶
System Context Episode¶
{
"entity_type": "command",
"name": "/task-work",
"purpose": "Implement a task through the 5-phase quality gate workflow",
"syntax": "/task-work TASK-XXX [--mode=standard|tdd|bdd]",
"phases_executed": "Phase 1 -> Phase 2 -> Phase 2.5 -> Phase 3 -> Phase 4 -> Phase 5",
"_metadata": {
"source": "guardkit_seeding",
"version": "1.0.0",
"created_at": "2024-01-30T12:00:00+00:00",
"updated_at": "2024-01-30T12:00:00+00:00",
"source_hash": null,
"entity_id": "command_task_work"
}
}
Template Episode¶
{
"entity_type": "template",
"id": "fastapi-python",
"name": "Python FastAPI Backend",
"description": "Production-ready FastAPI template",
"language": "Python",
"frameworks": ["FastAPI", "SQLAlchemy", "Pydantic"],
"_metadata": {
"source": "guardkit_seeding",
"version": "1.0.0",
"created_at": "2024-01-30T12:05:00+00:00",
"updated_at": "2024-01-30T12:05:00+00:00",
"source_hash": null,
"entity_id": "template_fastapi_python"
}
}
Failed Approach Episode¶
{
"entity_type": "failed_approach",
"approach_id": "subprocess_to_cli",
"symptom": "subprocess.CalledProcessError for guardkit task-work",
"root_cause": "guardkit task-work CLI command does not exist",
"fix": "Use SDK query() with slash command in prompt",
"prevention": "Never use subprocess for task-work invocation",
"_metadata": {
"source": "guardkit_seeding",
"version": "1.0.0",
"created_at": "2024-01-30T12:10:00+00:00",
"updated_at": "2024-01-30T12:10:00+00:00",
"source_hash": null,
"entity_id": "failed_approach_subprocess_to_cli"
}
}
Metadata Injection¶
Metadata is automatically injected by the _add_episodes() function in guardkit/knowledge/seeding.py:
async def _add_episodes(client, episodes: list, group_id: str, category_name: str) -> None:
"""Add episodes with automatic metadata injection."""
for name, body in episodes:
# Inject metadata block
timestamp = datetime.now(timezone.utc).isoformat()
body_with_metadata = {
**body,
"_metadata": {
"source": "guardkit_seeding",
"version": SEEDING_VERSION,
"created_at": timestamp,
"updated_at": timestamp,
"source_hash": None,
"entity_id": name,
}
}
await client.add_episode(
name=name,
episode_body=json.dumps(body_with_metadata),
group_id=group_id
)
Key points:
- Metadata is added automatically - episode definitions don't include it
- created_at and updated_at are set to current UTC time
- entity_id defaults to episode name (can be overridden)
- source_hash is None for generated content
Deduplication Strategy¶
The entity_id field enables deduplication:
Simple Deduplication¶
Check if episode with same entity_id already exists:
If exists, decide: - Replace: Delete old, insert new - Update: Modify existing episode - Skip: Keep existing, don't add new
Version-Aware Deduplication¶
Only replace if version has changed:
Hash-Based Deduplication¶
For file-based episodes, skip if hash matches:
MATCH (e:Episode {entity_id: $entity_id})
WHERE e.metadata.source_hash = $source_hash
RETURN e // Skip seeding
Version Management¶
Checking Seeding Version¶
The seeding marker file stores the version:
Selective Re-Seeding¶
Re-seed only if version has changed:
def needs_reseeding(client_version: str) -> bool:
"""Check if re-seeding is needed."""
marker = load_marker_file()
if not marker:
return True
# Compare versions
return marker["version"] != client_version
Version Queries¶
Find episodes by version:
async def find_episodes_by_version(client, version: str):
"""Find all episodes created by specific version."""
results = await client.search(
query=f"metadata.version:{version}",
num_results=1000
)
return results
Update Detection¶
Timestamp-Based¶
Find episodes updated after a certain time:
from datetime import datetime, timedelta
cutoff = datetime.now() - timedelta(days=30)
cutoff_iso = cutoff.isoformat()
# Find episodes updated in last 30 days
results = await client.search(
query=f"metadata.updated_at > {cutoff_iso}",
num_results=100
)
Hash-Based¶
Detect when source file has changed:
import hashlib
def compute_file_hash(file_path: Path) -> str:
"""Compute SHA256 hash of file."""
return hashlib.sha256(file_path.read_bytes()).hexdigest()
def has_file_changed(file_path: Path, stored_hash: str) -> bool:
"""Check if file hash has changed."""
current_hash = compute_file_hash(file_path)
return current_hash != stored_hash
Provenance Tracking¶
The source field enables provenance queries:
Find System Context¶
Find User-Generated Knowledge¶
Find Feature-Build Knowledge¶
Schema Evolution¶
Adding New Fields¶
To add new metadata fields:
- Update
_add_episodes()to include new field - Bump
SEEDING_VERSION(minor version) - Document new field in this guide
- Re-seed with
--force
Example:
body_with_metadata = {
**body,
"_metadata": {
"source": "guardkit_seeding",
"version": SEEDING_VERSION,
"created_at": timestamp,
"updated_at": timestamp,
"source_hash": None,
"entity_id": name,
"new_field": "new_value", # New field
}
}
Deprecating Fields¶
To deprecate a field:
- Mark as deprecated in this guide
- Continue populating (backwards compatibility)
- After 2 major versions, remove field
- Bump
SEEDING_VERSION(major version)
Best Practices¶
1. Use Consistent Entity IDs¶
# Good: Consistent format
"command_task_work"
"template_fastapi_python"
"pattern_dependency_injection"
# Bad: Inconsistent format
"taskWork"
"template-fastapi"
"pattern_dep_inj"
2. Always Include All Required Fields¶
# Good: All required fields present
"_metadata": {
"source": "guardkit_seeding",
"version": "1.0.0",
"created_at": "2024-01-30T12:00:00+00:00",
"updated_at": "2024-01-30T12:00:00+00:00",
"source_hash": None,
"entity_id": "command_task_work"
}
# Bad: Missing fields
"_metadata": {
"source": "guardkit_seeding",
"version": "1.0.0"
}
3. Use UTC Timestamps¶
# Good: UTC timezone
from datetime import datetime, timezone
timestamp = datetime.now(timezone.utc).isoformat()
# Bad: Local time without timezone
timestamp = datetime.now().isoformat()
4. Compute Source Hash for Files¶
# Good: Hash actual file content
source_hash = hashlib.sha256(file.read_bytes()).hexdigest()
# Bad: Use file path as hash
source_hash = str(hash(file_path))
5. Bump Version Appropriately¶
- Major (1.0.0 → 2.0.0): Breaking schema changes
- Minor (1.0.0 → 1.1.0): New fields, new categories
- Patch (1.0.0 → 1.0.1): Bug fixes, typos
Validation¶
Required Field Validation¶
def validate_metadata(metadata: dict) -> bool:
"""Validate metadata has all required fields."""
required = ["source", "version", "created_at", "updated_at", "source_hash", "entity_id"]
return all(field in metadata for field in required)
Timestamp Validation¶
from datetime import datetime
def validate_timestamp(timestamp: str) -> bool:
"""Validate ISO8601 timestamp."""
try:
dt = datetime.fromisoformat(timestamp)
return dt.tzinfo is not None # Must have timezone
except ValueError:
return False
Version Validation¶
import re
def validate_version(version: str) -> bool:
"""Validate semantic version format."""
pattern = r'^\d+\.\d+\.\d+$'
return bool(re.match(pattern, version))
See Also¶
- Graphiti Commands Guide - CLI command reference
- Graphiti Integration Guide - Overall architecture
- Seeding Module - Implementation details
- TASK-GR-PRE-000-A - Metadata implementation task