Guide
Schema Validation for Baseline Metadata
Within the broader Automated EXPLAIN Capture & Storage Workflows pipeline, Schema Validation for Baseline Metadata operates as a strictly isolated gatekeeping stage. Its sole responsibility is to verify that incoming query plan artifacts align with the current database schema state before they are committed to the baseline registry. This stage does not execute queries, capture execution traces, or transform plan trees. Instead, it performs deterministic structural validation against a versioned schema snapshot, ensuring that performance regression analysis downstream operates on structurally sound data.
Pipeline Position & Strict Stage Isolation
The validation stage ingests normalized plan payloads and schema manifests from upstream processors. It emits either a validated metadata record or a structured rejection event. Strict isolation is enforced by architectural boundary: the validator never reaches out to live database instances, never modifies baseline tables, and never performs statistical sampling. All inputs are treated as immutable JSON or Parquet artifacts. This boundary guarantees that schema drift, DDL migrations, or index alterations are caught deterministically before they corrupt historical performance tracking.
Data flow through this stage follows a linear, stateless progression:
- Ingress: Receives a normalized plan envelope containing
query_hash,schema_version, and structural references. - Manifest Resolution: Fetches the corresponding schema manifest from a version-controlled registry (e.g., Git-backed object storage or a schema registry API).
- Deterministic Validation: Applies declarative structural rules against the payload.
- Routing Decision: Emits an accepted record, a soft-drift warning, or a hard rejection based on explicit thresholds.
- Egress: Publishes the envelope to the appropriate downstream queue for Normalizing Query Plans for Cross-Engine Comparison or baseline ingestion.
Deterministic Validation Logic & Routing Thresholds
The validation engine operates on a set of compiled, in-memory rules mapped to baseline metadata fields. Each incoming record must pass three deterministic checks before advancing. Routing decisions are governed by strict numerical thresholds to prevent subjective drift classification.
| Check Category | Validation Rule | Threshold / Routing Logic |
|---|---|---|
| Structural Alignment | Table/column references in the plan must exist in the manifest. | missing_refs == 0 → ACCEPTmissing_refs > 0 → REJECT |
| Type & Constraint Consistency | Data types, nullability, and PK/FK constraints must match the snapshot. | type_promotions <= 2 AND constraint_drops == 0 → WARNconstraint_drops > 0 → REJECT |
| Index & Partition Mapping | Referenced indexes and partition keys must be present and active. | stale_refs == 0 → ACCEPT0 < stale_refs <= 3 → WARNstale_refs > 3 → REJECT |
Routing outputs are deterministic and idempotent. Accepted records proceed to baseline storage. Warning-tagged records are routed to a drift-monitoring queue for automated reconciliation. Rejected records trigger immediate quarantine and alerting. This routing matrix ensures that Routing EXPLAIN ANALYZE Output to Centralized Logs receives only structurally verified payloads, preventing log pollution and false-positive regression alerts.
Production Implementation Patterns
Platform teams should implement the validator as a stateless, async microservice or sidecar process. The following Python implementation demonstrates production-ready patterns using pydantic for payload parsing, jsonschema for structural validation, and explicit routing logic.
import asyncio
import json
import logging
from typing import Literal
from pydantic import BaseModel, ValidationError
from jsonschema import validate, Draft202012Validator
from opentelemetry import trace, metrics
logger = logging.getLogger("schema_validator")
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
validation_results_counter = meter.create_counter(
"validation_results_total", description="Count of validation outcomes"
)
class PlanEnvelope(BaseModel):
query_hash: str
schema_version: str
plan_tree: dict
referenced_tables: list[str]
referenced_columns: list[dict[str, str]]
referenced_indexes: list[str]
class ValidationOutcome(BaseModel):
status: Literal["ACCEPT", "WARN", "REJECT"]
query_hash: str
drift_score: float
details: list[str]
async def validate_plan_envelope(envelope: dict, manifest: dict) -> ValidationOutcome:
with tracer.start_as_current_span("validate_baseline_metadata"):
try:
parsed = PlanEnvelope(**envelope)
except ValidationError as e:
logger.error("Malformed envelope: %s", e)
return ValidationOutcome(status="REJECT", query_hash="unknown", drift_score=1.0, details=["INVALID_PAYLOAD"])
missing_tables = [t for t in parsed.referenced_tables if t not in manifest.get("tables", [])]
missing_refs = len(missing_tables)
constraint_drops = sum(1 for idx in parsed.referenced_indexes if idx not in manifest.get("active_indexes", []))
stale_refs = constraint_drops
# Routing Threshold Evaluation
if missing_refs > 0:
validation_results_counter.add(1, {"status": "REJECT"})
return ValidationOutcome(status="REJECT", query_hash=parsed.query_hash, drift_score=1.0, details=[f"Missing tables: {missing_tables}"])
if stale_refs > 3:
validation_results_counter.add(1, {"status": "REJECT"})
return ValidationOutcome(status="REJECT", query_hash=parsed.query_hash, drift_score=0.9, details=[f"Stale index references: {stale_refs}"])
if 0 < stale_refs <= 3:
validation_results_counter.add(1, {"status": "WARN"})
return ValidationOutcome(status="WARN", query_hash=parsed.query_hash, drift_score=0.4, details=["Minor index drift detected"])
validation_results_counter.add(1, {"status": "ACCEPT"})
return ValidationOutcome(status="ACCEPT", query_hash=parsed.query_hash, drift_score=0.0, details=[])Configuration for the validation service should be externalized to allow dynamic threshold tuning without redeployments:
validator:
manifest_cache_ttl_seconds: 300
routing_thresholds:
max_missing_refs: 0
max_stale_indexes_warn: 3
max_stale_indexes_reject: 3
fallback:
enabled: true
max_manifest_age_seconds: 3600
circuit_breaker_failures: 5Observability Hooks & Telemetry
Schema validation must be fully observable to support SRE incident response and capacity planning. Implement the following telemetry hooks:
- Metrics: Export Prometheus-compatible counters for
validation_duration_seconds,validation_results_total{status="accept|warn|reject"}, andschema_manifest_age_seconds. Trackmanifest_fetch_latencyto detect registry degradation. - Structured Logging: Emit JSON logs with
correlation_id,schema_version,drift_score, androuting_decision. Include avalidation_trace_idto correlate with upstream EXPLAIN capture spans. - Distributed Tracing: Instrument manifest resolution and rule evaluation as distinct spans. Tag spans with
validation.statusandvalidation.drift_categoryfor downstream filtering in observability platforms. - Alerting Rules: Trigger P3 alerts when
validation_results_total{status="REJECT"}exceeds 5% of total throughput over a 15-minute window. Trigger P2 alerts ifschema_manifest_age_secondsexceeds the configured TTL, indicating registry sync failure.
Safe Fallback Protocols & Drift Quarantine
Network partitions, registry outages, or delayed DDL propagation can temporarily starve the validator of fresh manifests. Implement these fallback protocols to maintain pipeline continuity without compromising data integrity:
- Last-Known-Good Cache: If the primary registry is unreachable, fall back to the most recently validated manifest. Enforce a strict TTL (e.g., 3600s). Beyond TTL, route all payloads to quarantine rather than risking baseline corruption.
- Circuit Breaker: Track consecutive manifest fetch failures. After
circuit_breaker_failures(default: 5), open the circuit and bypass validation temporarily, routing all traffic to adrift_quarantineKafka topic withvalidation_mode="BYPASS". - Quarantine & Reconciliation: Rejected and bypassed payloads are written to a dead-letter queue with full context. A background reconciliation job periodically compares quarantined artifacts against the latest manifest. Validated records are promoted; invalid records are archived with a
schema_mismatchtag. - Automated Drift Resolution: When soft-drift warnings accumulate, trigger an automated diff against the DDL migration log. For detailed procedures on reconciling structural mismatches, refer to Validating Schema Changes Against Baseline Metadata.
By enforcing strict isolation, deterministic routing, and robust fallback mechanisms, the validation stage ensures that baseline metadata remains a reliable foundation for query optimization and performance regression automation.