Normalizing Query Plans for Cross-Engine Comparison

Normalizing query plans for cross-engine comparison is the deterministic transformation stage that converts raw, vendor-specific EXPLAIN output into a single canonical execution graph so PostgreSQL, MySQL, and distributed SQL plans can be fingerprinted, diffed, and baselined by identical rules. It is one sub-system of the Automated EXPLAIN Capture & Storage Workflows pipeline, sitting between ingestion and regression scoring. The normalization boundary is deliberately narrow: it does not capture raw SQL, measure runtime execution, or raise alerts. Its only job is structural and semantic translation from engine dialects into a stable intermediate representation (IR) that downstream stages can trust byte-for-byte.

Architectural Boundaries: What This Stage Consumes and Emits

The normalizer operates strictly downstream of ingestion and strictly upstream of regression analysis. Enforcing that boundary is what prevents parser drift from silently corrupting baseline registries and stops a single malformed plan from cascading into false-positive regression alerts.

Consumes (upstream): engine-tagged, pre-routed payloads produced by the capture layer. When Routing EXPLAIN ANALYZE Output to Centralized Logs is active, the normalization worker subscribes to partitioned topics keyed by engine_type. Under peak write load, Building Async Ingestion Pipelines for High-Throughput Queries guarantees backpressure management and at-least-once delivery before any worker consumes a payload, so the normalizer never has to implement its own transport reliability.

Emits (downstream): a versioned, schema-validated canonical IR plus a deterministic plan_fingerprint. That output is what the SHA-256 fingerprinting approach hashes, what Tracking Cost Deltas Across Baseline Versions diffs, and what Detecting Join-Type Shifts in Execution Plans inspects for access-method regressions. Runtime timing, I/O wait, and buffer-hit metrics are explicitly excluded here and deferred to downstream telemetry correlation.

The single hardest invariant is idempotency: identical raw inputs must yield byte-identical normalized outputs regardless of worker instance, processing order, or deployment timestamp. Every design choice below exists to protect that property.

Deterministic Routing and Schema Enforcement

Cross-engine comparison fails whenever operator nomenclature, cost units, or child-traversal order diverge between vendors. The canonical IR resolves this by mapping every vendor construct to a controlled vocabulary and enforcing a single deterministic serialization.

Input contract. Payloads are rejected at the door unless they carry every required field. This is a hard schema, not a suggestion:

JSON

{
  "engine_type": "ENUM(postgresql, mysql, distributed_sql)",
  "engine_version": "string (semver, e.g. 16.2)",
  "schema_context": "string (search_path or database.schema)",
  "query_hash": "string (64-char hex, upstream fingerprint)",
  "raw_plan": "object (engine-native EXPLAIN JSON)"
}
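
The input contract can be checked before any adapter runs. The sketch below is a minimal, standard-library illustration of that gate, not the pipeline's actual validator: `envelope_errors`, `REQUIRED_FIELDS`, and the decision to accept upper- or lower-case hex in `query_hash` are assumptions made for the example.

```python
import re

# Field names and engine values come from the input contract above.
REQUIRED_FIELDS = ("engine_type", "engine_version", "schema_context", "query_hash", "raw_plan")
ENGINE_TYPES = {"postgresql", "mysql", "distributed_sql"}
HEX64 = re.compile(r"[0-9a-fA-F]{64}")  # assumption: either hex case is accepted


def envelope_errors(payload: dict) -> list[str]:
    """Return every contract violation found; an empty list means accept."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]
    if errors:
        return errors
    engine = payload["engine_type"]
    if not isinstance(engine, str) or engine not in ENGINE_TYPES:
        errors.append(f"unknown engine_type: {engine!r}")
    query_hash = payload["query_hash"]
    if not isinstance(query_hash, str) or not HEX64.fullmatch(query_hash):
        errors.append("query_hash must be 64 hex characters")
    if not isinstance(payload["raw_plan"], dict):
        errors.append("raw_plan must be a JSON object")
    return errors
```

Returning the full list of violations, rather than raising on the first, makes dead-letter records easier to triage when one capture bug breaks several fields at once.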

Output contract — core node schema. Each node in the emitted IR conforms to a published JSON Schema so downstream consumers can validate before they diff:

JSON

{
  "node_id": "uuid-v4",
  "op_type": "ENUM(FULL_TABLE_SCAN, INDEX_SCAN, HASH_JOIN, MERGE_JOIN, NESTED_LOOP, SORT, AGGREGATE, FILTER, LIMIT, SUBQUERY)",
  "relation": "string (schema.table or alias)",
  "predicate": "string (normalized to sorted conjunctive normal form)",
  "estimated_cost": "float (normalized to 0.0-1.0 relative to plan total)",
  "estimated_rows": "int",
  "parallel_workers": "int",
  "children": ["array of child node objects, sorted deterministically"]
}

Normalization rules.

Operator mapping. Vendor terms resolve to the controlled vocabulary. PostgreSQL Seq Scan and MySQL ALL both map to FULL_TABLE_SCAN; PostgreSQL Index Scan and MySQL ref/eq_ref map to INDEX_SCAN. This dialect-agnostic mapping mirrors the operator equivalence table used by Cost Estimation Mapping Across PostgreSQL and MySQL.
Cost normalization. Absolute cost units are meaningless across engines, so each node cost is divided by the plan total: normalized_cost = node_cost / plan_total_cost. Comparisons therefore reflect a node’s share of plan work, not vendor-specific cost arithmetic.
Tree determinism. Child nodes are sorted lexicographically by relation then op_type before serialization, guaranteeing stable hashing even when the optimizer returns non-deterministic child ordering.
Predicate canonicalization. Filter conditions are parsed to an AST, sorted by column name, and re-serialized, so WHERE a=1 AND b=2 and WHERE b=2 AND a=1 collapse to one fingerprint. Bind variables and literals are standardized upstream by Normalizing Parameterized Queries for Consistent Plan Tracking before they reach this stage.
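
To see why the tree-determinism rule matters for hashing, the following sketch fingerprints the same join with its children in two different orders. It is a simplified illustration with hypothetical helper names (`canonicalize`, `fingerprint`) and a trimmed node shape (only `op_type`, `relation`, and `children`), not the production code path.

```python
import hashlib
import json


def canonicalize(node: dict) -> dict:
    """Return a copy with children sorted by (relation, op_type), recursively."""
    children = sorted(
        (canonicalize(child) for child in node.get("children", [])),
        key=lambda c: (c["relation"], c["op_type"]),
    )
    return {**node, "children": children}


def fingerprint(node: dict) -> str:
    """SHA-256 over a compact, key-sorted JSON serialization of the sorted tree."""
    serialized = json.dumps(canonicalize(node), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


orders = {"op_type": "INDEX_SCAN", "relation": "public.orders", "children": []}
users = {"op_type": "FULL_TABLE_SCAN", "relation": "public.users", "children": []}

# Same join, children returned in opposite order by the optimizer.
plan_a = {"op_type": "HASH_JOIN", "relation": "", "children": [orders, users]}
plan_b = {"op_type": "HASH_JOIN", "relation": "", "children": [users, orders]}

assert fingerprint(plan_a) == fingerprint(plan_b)
```

One consequence worth keeping in mind: sorting children erases the outer/inner distinction of join inputs, so two plans that differ only in which side of a join comes first hash identically under this rule.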

Partition key. Normalized output is republished with a deterministic partition key so downstream consumers preserve per-query ordering: partition_key = query_hash (topic partition = crc32(query_hash) % partition_count). Keying on query_hash rather than plan_fingerprint guarantees that every plan variant for one logical query lands on the same partition, which is what lets the diff engine compare consecutive versions in order.
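
The partition formula can be sketched in a few lines with `zlib.crc32`. The function name `partition_for` is hypothetical, and hashing the UTF-8 bytes of the hex string (rather than the decoded 32 raw bytes) is an assumption, since the text does not say which form is hashed.

```python
import zlib


def partition_for(query_hash: str, partition_count: int) -> int:
    """Map a query_hash to a partition index: crc32(query_hash) % partition_count."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # Assumption: the CRC is taken over the hex text encoded as UTF-8.
    return zlib.crc32(query_hash.encode("utf-8")) % partition_count


# Every plan variant of one logical query shares a query_hash,
# so all of them resolve to the same partition.
query_hash = "ab" * 32
assert partition_for(query_hash, 12) == partition_for(query_hash, 12)
```

Because the index depends on `partition_count`, changing the number of partitions remaps keys, so per-query ordering only holds while the count stays fixed.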

Production-Ready Implementation

The normalizer runs as a stateless async worker: it consumes engine-tagged payloads from a bounded queue, dispatches to a per-engine adapter, builds the canonical tree, computes the fingerprint, and validates against the IR schema before emitting. Instrumentation uses structlog for structured logs and OpenTelemetry for spans and metrics. The happy path is fully runnable, and schema validation and dead-letter routing are wired in rather than sketched; the partial-normalization fallback described under Configuration Reference is not part of this listing.

PYTHON

import asyncio
import hashlib
import json
from enum import Enum
from typing import Any

import asyncpg  # used by the sink to upsert canonical IR
import structlog
from opentelemetry import metrics, trace
from pydantic import BaseModel, Field, ValidationError

log = structlog.get_logger("plan_normalizer")
tracer = trace.get_tracer("plan_normalizer")
meter = metrics.get_meter("plan_normalizer")

NORMALIZE_OK = meter.create_counter("plan_normalization_success_total")
NORMALIZE_FAIL = meter.create_counter("plan_normalization_failure_total")
NORMALIZE_LATENCY = meter.create_histogram(
    "normalization_latency_ms", unit="ms"
)


class OpType(str, Enum):
    FULL_TABLE_SCAN = "FULL_TABLE_SCAN"
    INDEX_SCAN = "INDEX_SCAN"
    HASH_JOIN = "HASH_JOIN"
    MERGE_JOIN = "MERGE_JOIN"
    NESTED_LOOP = "NESTED_LOOP"
    SORT = "SORT"
    AGGREGATE = "AGGREGATE"
    FILTER = "FILTER"
    LIMIT = "LIMIT"
    SUBQUERY = "SUBQUERY"


class CanonicalNode(BaseModel):
    node_id: str
    op_type: OpType
    relation: str
    predicate: str | None = None
    estimated_cost: float = Field(ge=0.0, le=1.0)
    estimated_rows: int = Field(ge=0)
    parallel_workers: int = Field(ge=0)
    children: list["CanonicalNode"] = []


class NormalizedPlan(BaseModel):
    schema_version: str = "v2"
    engine_type: str
    engine_version: str
    schema_context: str
    query_hash: str
    plan_fingerprint: str
    root: CanonicalNode


# Vendor operator -> controlled vocabulary. Unknown operators are logged
# and mapped to FULL_TABLE_SCAN as the pessimistic default (never dropped).
OPERATOR_MAP: dict[str, dict[str, OpType]] = {
    "postgresql": {
        "Seq Scan": OpType.FULL_TABLE_SCAN,
        "Index Scan": OpType.INDEX_SCAN,
        "Index Only Scan": OpType.INDEX_SCAN,
        "Bitmap Heap Scan": OpType.INDEX_SCAN,
        "Hash Join": OpType.HASH_JOIN,
        "Merge Join": OpType.MERGE_JOIN,
        "Nested Loop": OpType.NESTED_LOOP,
        "Sort": OpType.SORT,
        "Aggregate": OpType.AGGREGATE,
        "HashAggregate": OpType.AGGREGATE,
        "Limit": OpType.LIMIT,
    },
    "mysql": {
        "ALL": OpType.FULL_TABLE_SCAN,
        "ref": OpType.INDEX_SCAN,
        "eq_ref": OpType.INDEX_SCAN,
        "range": OpType.INDEX_SCAN,
        "index": OpType.INDEX_SCAN,
    },
}


def _normalize_cost(node_cost: float, total_cost: float) -> float:
    if total_cost <= 0:
        return 0.0
    return round(node_cost / total_cost, 6)


def _canonical_predicate(raw: str | None) -> str | None:
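    """Sort conjuncts so logically identical filters hash identically.

    Simplified: splits on the literal " AND " and is not parenthesis-aware,
    so nested or fully parenthesized filters need a real predicate parser.
    """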
    if not raw:
        return None
    conjuncts = sorted(part.strip() for part in raw.split(" AND "))
    return " AND ".join(conjuncts)


def compute_fingerprint(plan_dict: dict[str, Any]) -> str:
    serialized = json.dumps(plan_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def _build_tree(raw_node: dict[str, Any], engine: str, total_cost: float) -> dict[str, Any]:
    """Recursively map an engine-native node to canonical form."""
    raw_op = raw_node.get("Node Type", raw_node.get("access_type", "unknown"))
    mapped = OPERATOR_MAP.get(engine, {}).get(raw_op)
    if mapped is None:
        log.warning("unmapped_operator", engine=engine, raw_op=raw_op)
        mapped = OpType.FULL_TABLE_SCAN
    node_cost = float(
        raw_node.get("Total Cost", raw_node.get("cost_info", {}).get("read_cost", 0.0))
    )
    raw_children = raw_node.get("Plans", raw_node.get("nested_loop", [])) or []

    children = sorted(
        (_build_tree(c, engine, total_cost) for c in raw_children),
        key=lambda x: (x["relation"], x["op_type"]),
    )
    return {
        "node_id": raw_node.get("node_id", ""),
        "op_type": mapped.value,
        "relation": raw_node.get("Relation Name", raw_node.get("table_name", "")),
        "predicate": _canonical_predicate(
            raw_node.get("Filter", raw_node.get("attached_condition"))
        ),
        "estimated_cost": _normalize_cost(node_cost, total_cost),
        "estimated_rows": int(
            raw_node.get("Plan Rows", raw_node.get("rows_examined_per_scan", 0))
        ),
        "parallel_workers": int(raw_node.get("Workers Planned", 0)),
        "children": children,
    }


def normalize_payload(payload: dict[str, Any]) -> NormalizedPlan:
    """Pure, deterministic transform: raw envelope -> validated canonical IR."""
    engine = payload["engine_type"]
    with tracer.start_as_current_span("normalize") as span:
        span.set_attribute("engine_type", engine)
        raw_plan = payload["raw_plan"]
        total_cost = float(raw_plan.get("Total Cost", raw_plan.get("query_cost", 1.0)))
        root_dict = _build_tree(raw_plan["plan"], engine, total_cost)

        # Fingerprint over structure only, never over volatile metadata.
        # root_dict includes node_id, so node_id must be stable across captures.
        fingerprint = compute_fingerprint({"engine": engine, "root": root_dict})
        plan = NormalizedPlan(
            engine_type=engine,
            engine_version=payload["engine_version"],
            schema_context=payload["schema_context"],
            query_hash=payload["query_hash"],
            plan_fingerprint=fingerprint,
            root=CanonicalNode(**root_dict),
        )
        span.set_attribute("plan_fingerprint", fingerprint)
        return plan


async def worker(queue: asyncio.Queue, pool: asyncpg.Pool, dlq: asyncio.Queue) -> None:
    """Consume payloads, normalize, upsert IR; route failures to the DLQ."""
    while True:
        payload = await queue.get()
        started = asyncio.get_running_loop().time()
        try:
            plan = normalize_payload(payload)
            async with pool.acquire() as conn:
                await conn.execute(
                    """
                    INSERT INTO canonical_ir (query_hash, plan_fingerprint, ir)
                    VALUES ($1, $2, $3)
                    ON CONFLICT (query_hash, plan_fingerprint) DO NOTHING
                    """,
                    plan.query_hash,
                    plan.plan_fingerprint,
                    plan.model_dump_json(),
                )
            NORMALIZE_OK.add(1, {"engine_type": plan.engine_type})
            log.info("normalized", fingerprint=plan.plan_fingerprint)
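        except (KeyError, TypeError, ValueError, AttributeError, ValidationError) as exc:
            # TypeError/ValueError/AttributeError cover malformed values such as a
            # non-numeric cost or a non-object raw_plan, which would otherwise
            # escape this handler and kill the worker loop.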
            NORMALIZE_FAIL.add(1, {"engine_type": payload.get("engine_type", "unknown")})
            log.error("normalization_failed", error=str(exc))
            await dlq.put({"raw": payload, "error": str(exc)})
        finally:
            elapsed_ms = (asyncio.get_running_loop().time() - started) * 1000
            NORMALIZE_LATENCY.record(elapsed_ms)
            queue.task_done()

The upsert is keyed on (query_hash, plan_fingerprint) with ON CONFLICT DO NOTHING, which makes re-delivery from an at-least-once bus a no-op rather than a duplicate-baseline hazard. That idempotent write is what lets the worker pool scale horizontally without coordination.
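
The re-delivery behaviour is easy to check in isolation. The sketch below swaps PostgreSQL for an in-memory SQLite database purely so it runs without a server; the table definition is an assumption (the source only shows the INSERT), and it relies on SQLite 3.24 or newer, which added the same `ON CONFLICT ... DO NOTHING` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table shape: the conflict target must be backed by a unique key.
conn.execute(
    """
    CREATE TABLE canonical_ir (
        query_hash       TEXT NOT NULL,
        plan_fingerprint TEXT NOT NULL,
        ir               TEXT NOT NULL,
        PRIMARY KEY (query_hash, plan_fingerprint)
    )
    """
)

UPSERT = """
    INSERT INTO canonical_ir (query_hash, plan_fingerprint, ir)
    VALUES (?, ?, ?)
    ON CONFLICT (query_hash, plan_fingerprint) DO NOTHING
"""


def store(query_hash: str, plan_fingerprint: str, ir_json: str) -> None:
    """Write one canonical IR row; a repeated key is silently skipped."""
    conn.execute(UPSERT, (query_hash, plan_fingerprint, ir_json))
    conn.commit()


def row_count() -> int:
    return conn.execute("SELECT COUNT(*) FROM canonical_ir").fetchone()[0]


store("q1", "fp-a", "{}")
store("q1", "fp-a", "{}")  # re-delivered message: no second row
store("q1", "fp-b", "{}")  # new plan variant for the same query: new row
assert row_count() == 2
```

The composite key is what makes this work: without a unique constraint on (query_hash, plan_fingerprint) the conflict clause has nothing to match against.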

Threshold Table and Alerting

These are the SLOs the normalization stage is held to. Values are exact and enforced by the alert rules that follow — they are not tuning suggestions.

Metric	Pass	Warn	Block / Page	Automation trigger
`normalization_latency_ms` (p95)	< 40 ms	40–120 ms	> 120 ms	Scale worker pool by +2 replicas at Warn
`plan_normalization_failure_total` rate (60 s window)	< 1%	1–5%	> 5%	Open circuit breaker at Block
`dlq_depth`	0–50	51–500	> 500	Page SRE on-call at Block
`unmapped_operator` rate	< 0.1%	0.1–1%	> 1%	Freeze operator map; require mapping PR
`circuit_breaker_state`	closed	half-open	open > 300 s	Escalate to P1 if open beyond 300 s
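
The numeric rows of the table reduce to one comparison rule: pass below the warn bound, warn up to and including the block bound, block beyond it. The sketch below encodes that reading; the `classify` helper and the `BANDS` key names are hypothetical, and the categorical `circuit_breaker_state` row is left out because it is not a numeric band.

```python
def classify(value: float, warn_at: float, block_above: float) -> str:
    """Pass below warn_at, warn from warn_at through block_above, block beyond it."""
    if value > block_above:
        return "block"
    if value >= warn_at:
        return "warn"
    return "pass"


# (warn_at, block_above) pairs taken from the threshold table.
BANDS = {
    "normalization_latency_ms_p95": (40.0, 120.0),
    "failure_rate": (0.01, 0.05),
    "dlq_depth": (51, 500),
    "unmapped_operator_rate": (0.001, 0.01),
}


def evaluate(metric: str, value: float) -> str:
    warn_at, block_above = BANDS[metric]
    return classify(value, warn_at, block_above)
```

For example, `evaluate("dlq_depth", 50)` falls in the pass band while `evaluate("dlq_depth", 501)` falls in the block band, matching the 0–50 and > 500 columns.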

The failure-rate, DLQ, and p95 latency thresholds are expressed as Prometheus alert rules:

YAML

groups:
  - name: plan-normalization
    rules:
      - alert: NormalizationFailureRateHigh
        expr: |
          sum(rate(plan_normalization_failure_total[1m]))
            / (
              sum(rate(plan_normalization_success_total[1m]))
              + sum(rate(plan_normalization_failure_total[1m]))
            ) > 0.05
        for: 60s
        labels: { severity: page }
        annotations:
          summary: "Normalization failure rate > 5% — circuit breaker opening"
      - alert: NormalizationDLQBacklog
        expr: dlq_depth > 500
        for: 2m
        labels: { severity: page }
        annotations:
          summary: "Normalizer DLQ depth > 500 — manual triage required"
      - alert: NormalizationLatencyP95High
        expr: histogram_quantile(0.95, sum by (le) (rate(normalization_latency_ms_bucket[5m]))) > 120
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Normalization p95 latency > 120ms — scale worker pool"

Failure Scenarios and Root Cause Analysis

Normalization failures must be contained, observable, and recoverable without blocking ingestion. These are the recurring modes and how to work them.

1. Unmapped operator after an engine upgrade. Symptom: unmapped_operator rate climbs immediately after a minor engine bump; new plans map to FULL_TABLE_SCAN and trip spurious regressions downstream. Diagnostics: structlog emits unmapped_operator events with the offending raw_op; aggregate them (this assumes structlog is configured to render JSON, and uses -o cat so journalctl prints only the message body): journalctl -u plan-normalizer -o cat | grep unmapped_operator | jq -r .raw_op | sort | uniq -c. Mitigation: add the operator to OPERATOR_MAP behind a versioned mapping table, ship the PR, and reprocess the affected query_hash range from the DLQ. Never silently accept the pessimistic default in production.

2. Non-deterministic fingerprints for identical plans. Symptom: the same logical plan produces two plan_fingerprint values across workers, inflating baseline churn. Diagnostics: diff two IR envelopes with jq -S and compare child ordering and predicate strings; a mismatch in child order points at the sort key, a mismatch in predicate points at canonicalization. Mitigation: confirm the child sort key is (relation, op_type) and that _canonical_predicate sorts conjuncts. Any field with wall-clock or address values must be stripped before hashing, exactly as required by the SHA-256 fingerprinting approach.

3. Schema-validation storm on malformed payloads. Symptom: plan_normalization_failure_total spikes; the DLQ fills with ValidationError entries. Diagnostics: inspect DLQ records for the failing field path; a single upstream capture bug usually shows one repeated field. Validate the upstream envelope against the contract enforced by Schema Validation for Baseline Metadata. Mitigation: fix the capture-side envelope, then replay the DLQ. The circuit breaker (open at > 5% failure rate over 60 s) prevents the storm from starving healthy engines.

4. Cost normalization divide-by-zero on trivial plans. Symptom: single-row lookups report plan_total_cost = 0, so an unguarded division raises ZeroDivisionError in Python (or yields NaN in runtimes that follow IEEE float division), and the node cost never satisfies the 0.0–1.0 field constraint. Diagnostics: estimated_cost fields appear as null or the node fails Pydantic bounds validation. Mitigation: _normalize_cost guards total_cost <= 0 and returns 0.0; keep that guard and treat zero-cost plans as structurally comparable by topology alone.

5. Partition hot-spotting on a dominant query. Symptom: one partition’s consumer lag grows while others idle because a single query_hash dominates traffic. Diagnostics: per-partition lag skew in the consumer group metrics. Mitigation: the partition key is intentionally query_hash to preserve per-query ordering; absorb hot keys with additional worker concurrency on that partition rather than re-keying, which would break ordered diffing in Tracking Cost Deltas Across Baseline Versions.
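
The circuit breaker mentioned in scenario 3 can be sketched as a small state machine. This is an illustrative outline under stated assumptions, not the deployed breaker: the class and method names are hypothetical, `min_samples` is an added guard so a single early failure does not trip it, and re-opening on any failure while half-open is an assumption the text does not confirm. The defaults mirror the 5% / 60 s / 300 s / 50-payload values used in this document.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, threshold=0.05, window_s=60.0, cooldown_s=300.0,
                 probe=50, min_samples=20, clock=time.monotonic):
        self.threshold = threshold      # failure share that opens the breaker
        self.window_s = window_s        # sliding window length in seconds
        self.cooldown_s = cooldown_s    # time spent open before half-open
        self.probe = probe              # clean payloads needed to re-close
        self.min_samples = min_samples  # hypothetical guard against tiny samples
        self.clock = clock
        self.events = deque()           # (timestamp, failed) pairs in the window
        self.state = "closed"
        self.opened_at = 0.0
        self.clean_streak = 0

    def allow(self) -> bool:
        """True if a payload may be processed right now."""
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "half-open"
            self.clean_streak = 0
        return self.state != "open"

    def record(self, failed: bool) -> None:
        """Feed one normalization outcome into the breaker."""
        now = self.clock()
        if self.state == "open":
            return
        if self.state == "half-open":
            if failed:
                self._open(now)  # assumption: any probe failure re-opens
            else:
                self.clean_streak += 1
                if self.clean_streak >= self.probe:
                    self.state = "closed"
            return
        self.events.append((now, failed))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        failures = sum(1 for _, was_failure in self.events if was_failure)
        if len(self.events) >= self.min_samples and failures / len(self.events) > self.threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "open"
        self.opened_at = now
        self.events.clear()
```

A worker would call `allow()` before normalizing and `record(failed=...)` afterwards; injecting `clock` keeps the timing logic testable without sleeping.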

Configuration Reference

Key tuning knobs, all environment-overridable so the same image runs across fleets:

Env var	Default	Purpose
`NORM_SCHEMA_VERSION`	`v2`	Target IR schema; payloads for deprecated versions route to `legacy_dlq`
`NORM_WORKER_CONCURRENCY`	`8`	Coroutines per replica draining the bounded queue
`NORM_QUEUE_MAXSIZE`	`1000`	Bounded `asyncio.Queue` size; backpressure signal to the consumer
`NORM_FAILURE_RATE_THRESHOLD`	`0.05`	Circuit-breaker trip point over the 60 s sliding window
`NORM_BREAKER_COOLDOWN_S`	`300`	Seconds the breaker stays open before half-open
`NORM_BREAKER_HALFOPEN_PROBE`	`50`	Consecutive clean payloads required to re-close the breaker
`NORM_DLQ_MAX_DEPTH`	`500`	DLQ depth that pages the on-call
`NORM_COST_PRECISION`	`6`	Decimal places retained in normalized cost, fixed for stable hashing
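
One way to load these knobs is a frozen dataclass whose defaults match the table, with environment values overriding them. The names `NormalizerConfig` and `load_config` are hypothetical; the sketch assumes each variable maps to the lower-cased field name after the `NORM_` prefix.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class NormalizerConfig:
    # Defaults mirror the configuration table above.
    schema_version: str = "v2"
    worker_concurrency: int = 8
    queue_maxsize: int = 1000
    failure_rate_threshold: float = 0.05
    breaker_cooldown_s: int = 300
    breaker_halfopen_probe: int = 50
    dlq_max_depth: int = 500
    cost_precision: int = 6


def load_config(env: Optional[Mapping[str, str]] = None) -> NormalizerConfig:
    """Build the config from NORM_* variables, falling back to the defaults."""
    source = os.environ if env is None else env
    overrides = {}
    for field in fields(NormalizerConfig):
        raw = source.get(f"NORM_{field.name.upper()}")
        if raw is not None:
            # Cast with the type of the default: str, int, or float.
            overrides[field.name] = type(field.default)(raw)
    return NormalizerConfig(**overrides)
```

A non-numeric override such as `NORM_QUEUE_MAXSIZE=big` raises `ValueError` at startup, which fails fast instead of running with a silently wrong value.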

Safe fallback protocols. Partial normalization strips an unparseable predicate, logs a WARN, and still fingerprints the remaining valid structure. The IR schema is versioned so workers reject deprecated payloads into legacy_dlq for batch migration rather than mis-hashing them. DLQ consumers run a reconciliation job that retries with updated mapping tables and reinjects successes tagged reprocessed: true.

Downstream Routing and Diff Thresholds

Once normalized, the IR is republished partitioned by query_hash. Regression services consume the stream and diff structural fingerprints against the baseline registry. If plan_fingerprint matches a baseline recorded in the last 30 days, the payload is archived as a STABLE path. If it diverges, a structural diff engine computes an AST edit distance; a distance above 0.15 (15% structural change) routes the payload to the REGRESSION_ALERT queue, where the actual pass/warn/block decision is made against the bands defined in Defining Regression Thresholds for Query Plans. This clean separation — normalization owns structure, the rule engine owns judgement — is what keeps cross-engine regression detection free of vendor-formatting noise across a heterogeneous fleet.
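
The routing decision described above can be outlined as a single function. This is a sketch under assumptions: the `route` name and the `baselines` mapping are hypothetical, the normalized edit distance is taken as an input rather than computed, and the `BELOW_THRESHOLD` outcome is a placeholder because the text does not say what happens to a divergent plan at or below 0.15.

```python
from datetime import datetime, timedelta, timezone

BASELINE_WINDOW = timedelta(days=30)
DIFF_THRESHOLD = 0.15


def route(fingerprint: str, baselines: dict, edit_distance: float, now: datetime) -> str:
    """Pick a destination for one normalized plan.

    baselines maps plan_fingerprint -> datetime the baseline was recorded.
    edit_distance is the normalized structural distance (0.0 to 1.0).
    """
    recorded_at = baselines.get(fingerprint)
    if recorded_at is not None and now - recorded_at <= BASELINE_WINDOW:
        return "STABLE"
    if edit_distance > DIFF_THRESHOLD:
        return "REGRESSION_ALERT"
    return "BELOW_THRESHOLD"  # placeholder: behaviour not specified in the text


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
baselines = {"fp-known": now - timedelta(days=10)}
assert route("fp-known", baselines, 0.0, now) == "STABLE"
assert route("fp-new", baselines, 0.40, now) == "REGRESSION_ALERT"
```

Keeping the pass/warn/block judgement out of this function matches the separation the section describes: routing only decides which queue receives the payload.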

Related

Normalizing Parameterized Queries for Consistent Plan Tracking: literal stripping and bind-variable standardization that runs before this stage.
Building Async Ingestion Pipelines for High-Throughput Queries: the backpressure and delivery guarantees this worker depends on.
Plan Hashing Algorithms for SQL Engines: the SHA-256 fingerprinting the canonical IR feeds.
Cost Estimation Mapping Across PostgreSQL and MySQL: the operator and cost equivalence tables behind the mapping rules.
Tracking Cost Deltas Across Baseline Versions: the downstream consumer that diffs normalized fingerprints.
