Detecting Join Type Shifts in Execution Plans

This stage owns exactly one job in the regression pipeline: comparing the join strategy of a candidate execution plan against its anchored baseline and emitting a deterministic join-shift event for every structurally aligned node whose join operator changed. It never touches runtime latency, cost estimates, or index telemetry. A shift from a hash join to a nested loop join, or from a merge join to a hash join, is frequently the earliest structural symptom of statistics drift, a dropped index, or parameter sensitivity. Catching it here, between plan capture and downstream policy routing, gives Database SREs a signal that fires before p95 latency moves.

Architectural Boundaries

The join-shift detector is a pure transformation node inside the broader Regression Detection & Rule Engines subsystem. It has one upstream contract and one downstream contract, and it deliberately refuses to do anything outside them.

Upstream (consumes): normalized, immutable plan artifacts produced by the Automated EXPLAIN Capture & Storage Workflows pipeline. Each artifact is the output of Normalizing Query Plans for Cross-Engine Comparison — a canonical operator tree keyed by the plan fingerprint from Plan Hashing Algorithms for SQL Engines. The detector requires a baseline plan reference, a candidate plan reference, a deterministic operator tree, and a plan_schema_version. It never re-parses raw EXPLAIN text; that responsibility belongs strictly upstream.

Downstream (emits): a single structured JoinShiftEvent per aligned node that changed join type. That event carries the node coordinates, the baseline and candidate join types, the predicate hash, a confidence score, and a routing severity. It is published to the rule engine, which correlates it with the cost signal from Tracking Cost Deltas Across Baseline Versions and the access-path signal from Monitoring Index Usage Changes for Regression Signals before any WARN/BLOCK verdict is reached. The detector itself never blocks a deploy, opens a ticket, or rewrites a query.

This isolation is load-bearing. Because the stage evaluates only plan topology, it is idempotent, safe to run in parallel across thousands of query fingerprints, and fully engine-agnostic. Structural detection and root-cause attribution stay decoupled: this node answers “did the join strategy change?” and nothing else.

Deterministic Routing and Schema Enforcement

Detection is structural, not textual. Two plans that differ only in whitespace, alias naming, or constant folding must compare as identical; two plans that keep every predicate but swap Hash Join for Nested Loop must produce an event. The stage therefore aligns nodes by query-block ID and a deterministic predicate signature, then compares the normalized join operator at each aligned node.

Field contract

Every payload entering the stage is validated against a pinned JSON Schema before any traversal runs. A payload that fails validation is never diffed: it is routed to a dead-letter queue, because comparing trees built on different contracts would produce a topologically meaningless result.

JSON

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "JoinShiftDetectionRequest",
  "type": "object",
  "required": ["plan_schema_version", "query_fingerprint", "baseline", "candidate"],
  "properties": {
    "plan_schema_version": { "type": "string", "pattern": "^\\d+\\.\\d+\\.\\d+$" },
    "query_fingerprint": { "type": "string", "pattern": "^[0-9a-f]{64}$" },
    "baseline": { "$ref": "#/$defs/plan" },
    "candidate": { "$ref": "#/$defs/plan" }
  },
  "$defs": {
    "plan": {
      "type": "object",
      "required": ["plan_id", "nodes"],
      "properties": {
        "plan_id": { "type": "string" },
        "nodes": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["node_id", "query_block_id", "operator_type", "predicate", "est_rows"],
            "properties": {
              "node_id": { "type": "string" },
              "query_block_id": { "type": "string" },
              "operator_type": { "type": "string" },
              "predicate": {
                "type": "object",
                "required": ["left_col", "right_col", "operator"],
                "properties": {
                  "left_col": { "type": "string" },
                  "right_col": { "type": "string" },
                  "operator": { "type": "string" }
                }
              },
              "est_rows": { "type": "number", "minimum": 0 }
            }
          }
        }
      }
    }
  }
}
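The short-circuit described before the schema can be sketched as a small routing gate that runs ahead of traversal. This is a minimal illustration, not the pipeline's actual validator: `PINNED_SCHEMA_VERSION`, `gate_request`, and the route names `DIFF`, `SCHEMA_DRIFT_DLQ`, and `PARSE_DLQ` are assumed for the example, and full validation against the contract would still be done by a JSON Schema library.

```python
import re

# Hypothetical pinned version; a real deployment would supply its own.
PINNED_SCHEMA_VERSION = "2.1.0"

# Same patterns as the field contract above.
_SEMVER = re.compile(r"\d+\.\d+\.\d+")
_FINGERPRINT = re.compile(r"[0-9a-f]{64}")
_REQUIRED = ("plan_schema_version", "query_fingerprint", "baseline", "candidate")


def gate_request(request: dict, pinned: str = PINNED_SCHEMA_VERSION) -> str:
    """Pick a route for a request without inspecting any plan node."""
    if any(key not in request for key in _REQUIRED):
        return "PARSE_DLQ"
    version = request["plan_schema_version"]
    fingerprint = request["query_fingerprint"]
    if not isinstance(version, str) or not _SEMVER.fullmatch(version):
        return "PARSE_DLQ"
    if not isinstance(fingerprint, str) or not _FINGERPRINT.fullmatch(fingerprint):
        return "PARSE_DLQ"
    if version != pinned:
        # Operator vocabularies may differ across versions: never diff.
        return "SCHEMA_DRIFT_DLQ"
    return "DIFF"
```

Only a request routed to `DIFF` would go on to node alignment; the other two routes emit no event.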

Operator normalization

Each engine spells its join operators differently. The normalizer collapses them into a closed enumeration — HASH_JOIN, NESTED_LOOP_JOIN, MERGE_JOIN, UNKNOWN — so a PostgreSQL Hash Join, a MySQL hash join, and a SQL Server Hash Match (Inner Join) all resolve to the same canonical type. Anything outside the known map resolves to UNKNOWN and caps downstream confidence rather than being silently dropped.

Predicate alignment

Node alignment uses a SHA-256 hash of the normalized column references and the comparison operator, the three fields of the predicate object in the field contract. Bound literals are not part of the signature, so constant folding cannot change it. Only when the baseline and candidate predicate hashes match exactly does the stage compare join types on that node. This eliminates false positives from aliasing and constant folding, and it distinguishes a genuine join-strategy shift from a predicate rewrite (which is a different regression class entirely).
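To make the aliasing point concrete, here is a minimal sketch of one way a predicate could be canonicalized before hashing. The source does not spell out the normalization rules, so everything here is an assumption for illustration: the helper names `canonical_column` and `predicate_hash`, the alias map, and the choice to order the operands of symmetric operators.

```python
import hashlib

# Operators whose operands can be swapped without changing meaning (assumed set).
_SYMMETRIC = {"=", "<>", "!="}


def canonical_column(ref: str, alias_map: dict[str, str]) -> str:
    """Resolve 'alias.column' to 'table.column', lowercased.

    alias_map keys are expected in lowercase.
    """
    qualifier, _, column = ref.strip().lower().rpartition(".")
    table = alias_map.get(qualifier, qualifier)
    return f"{table}.{column}" if table else column


def predicate_hash(left: str, right: str, operator: str,
                   alias_map: dict[str, str]) -> str:
    """SHA-256 over the canonical 'left|right|operator' string."""
    left_c = canonical_column(left, alias_map)
    right_c = canonical_column(right, alias_map)
    op = operator.strip()
    # For symmetric operators, sort operands so a = b and b = a hash alike.
    if op in _SYMMETRIC and right_c < left_c:
        left_c, right_c = right_c, left_c
    return hashlib.sha256(f"{left_c}|{right_c}|{op}".encode("utf-8")).hexdigest()
```

Under these assumptions, `o.customer_id = c.id` and `cust.ID = ord.customer_id` hash identically once both alias sets resolve to `orders` and `customers`, while a join on a different column yields a different hash and is therefore never compared.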

Partition key formula

For high-throughput deployments, emitted events are published to Kafka with the partition key computed as:

partition_key = sha256(query_fingerprint + "|" + query_block_id)[:16]

Keying on query_fingerprint|query_block_id sends every event for a given query block to the same partition, so consumers see those events in the order they were produced (provided the topic's partition count does not change). Distinct query blocks still spread across partitions, which lets the consumer group scale in parallel.
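The formula above translates directly into a few lines of Python. Two details are assumptions rather than facts from the source: the 16-character slice is taken from the hex digest (64 bits of the hash), and the optional salt is appended to the key material before hashing, following the description of `JOINSHIFT_PARTITION_SALT` in the configuration reference. The function name `partition_key` is illustrative.

```python
import hashlib


def partition_key(query_fingerprint: str, query_block_id: str, salt: str = "") -> str:
    """First 16 hex characters of SHA-256 over 'fingerprint|block_id' plus salt."""
    material = f"{query_fingerprint}|{query_block_id}{salt}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

The key is a pure function of its inputs, so any worker computes the same key for the same query block, which is what keeps those events on one partition.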

Production-Ready Implementation

The detector runs as an asyncio worker: it pulls a detection request, loads any missing plan nodes from the baseline store over asyncpg, traverses both trees, and emits events. It uses structlog for structured JSON logging and OpenTelemetry for spans and metrics. All plan value objects are frozen dataclasses so concurrent evaluations cannot mutate shared state.

PYTHON

from __future__ import annotations

import asyncio
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

import asyncpg
import structlog
from opentelemetry import metrics, trace

log = structlog.get_logger("join_shift_detector")
tracer = trace.get_tracer("regression.join_shift")
meter = metrics.get_meter("regression.join_shift")

detections_total = meter.create_counter(
    "join_shift_detection_total", unit="1",
    description="Aligned nodes evaluated for a join-type shift",
)
shifts_total = meter.create_counter(
    "join_shift_emitted_total", unit="1",
    description="Join-shift events emitted, labelled by severity",
)
parse_failures = meter.create_counter(
    "join_shift_parse_failure_total", unit="1",
    description="Requests rejected before traversal (schema/parse failure)",
)
detect_latency = meter.create_histogram(
    "join_shift_detect_latency_ms", unit="ms",
    description="Wall-clock latency of a full request traversal",
)

# Confidence and severity are exact, not heuristic.
_UNKNOWN_CONFIDENCE = 0.65
_FULL_CONFIDENCE = 1.0
_CRITICAL_MIN = 0.95
_WARNING_MIN = 0.80


class JoinType(Enum):
    HASH_JOIN = "hash"
    NESTED_LOOP_JOIN = "nested_loop"
    MERGE_JOIN = "merge"
    UNKNOWN = "unknown"


_OPERATOR_MAP = {
    "hash join": JoinType.HASH_JOIN,
    "hashjoin": JoinType.HASH_JOIN,
    "hash match": JoinType.HASH_JOIN,
    "hash match (inner join)": JoinType.HASH_JOIN,
    "nested loop join": JoinType.NESTED_LOOP_JOIN,
    "nested loop": JoinType.NESTED_LOOP_JOIN,
    "nested loops": JoinType.NESTED_LOOP_JOIN,
    "merge join": JoinType.MERGE_JOIN,
    "sort merge join": JoinType.MERGE_JOIN,
    "merge": JoinType.MERGE_JOIN,
}


def normalize_operator(raw_op: str) -> JoinType:
    return _OPERATOR_MAP.get(raw_op.lower().strip(), JoinType.UNKNOWN)


@dataclass(frozen=True)
class PredicateSignature:
    left_col: str
    right_col: str
    operator: str

    def hash(self) -> str:
        raw = f"{self.left_col}|{self.right_col}|{self.operator}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class JoinNode:
    node_id: str
    query_block_id: str
    join_type: JoinType
    predicate: PredicateSignature
    estimated_rows: float


@dataclass(frozen=True)
class JoinShiftEvent:
    query_fingerprint: str
    baseline_id: str
    candidate_id: str
    node_id: str
    baseline_type: JoinType
    candidate_type: JoinType
    predicate_hash: str
    confidence: float
    routing_severity: str


class ParseError(ValueError):
    """Raised when a plan node is missing required fields."""


def _to_node(raw: dict[str, Any]) -> JoinNode:
    try:
        return JoinNode(
            node_id=raw["node_id"],
            query_block_id=raw["query_block_id"],
            join_type=normalize_operator(raw["operator_type"]),
            predicate=PredicateSignature(**raw["predicate"]),
            estimated_rows=float(raw["est_rows"]),
        )
    except (KeyError, TypeError) as exc:
        raise ParseError(f"malformed plan node: {exc}") from exc


def _severity(confidence: float) -> str:
    if confidence >= _CRITICAL_MIN:
        return "CRITICAL"
    if confidence >= _WARNING_MIN:
        return "WARNING"
    return "INFO"


def _diff_aligned(
    fingerprint: str, baseline_id: str, candidate_id: str,
    b_node: JoinNode, c_node: JoinNode,
) -> Optional[JoinShiftEvent]:
    # Nodes must be the same query block and the same predicate to be comparable.
    if b_node.query_block_id != c_node.query_block_id:
        return None
    if b_node.predicate.hash() != c_node.predicate.hash():
        return None
    if b_node.join_type == c_node.join_type:
        return None

    ambiguous = JoinType.UNKNOWN in (b_node.join_type, c_node.join_type)
    confidence = _UNKNOWN_CONFIDENCE if ambiguous else _FULL_CONFIDENCE
    return JoinShiftEvent(
        query_fingerprint=fingerprint,
        baseline_id=baseline_id,
        candidate_id=candidate_id,
        node_id=b_node.node_id,
        baseline_type=b_node.join_type,
        candidate_type=c_node.join_type,
        predicate_hash=b_node.predicate.hash(),
        confidence=confidence,
        routing_severity=_severity(confidence),
    )


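async def _load_baseline_nodes(pool: asyncpg.Pool, plan_id: str) -> list[dict[str, Any]]:
    """Hydrate baseline nodes from the plan store when the request omits them."""
    # Match on the operator map as well as '%join%': labels such as
    # "Nested Loop" or "Hash Match" carry no "join" substring and would
    # otherwise be missing from the hydrated baseline.
    rows = await pool.fetch(
        """
        SELECT node_id, query_block_id, operator_type, predicate, est_rows
        FROM plan_nodes
        WHERE plan_id = $1
          AND (operator_type ILIKE '%join%'
               OR lower(btrim(operator_type)) = ANY($2::text[]))
        ORDER BY query_block_id, node_id
        """,
        plan_id,
        list(_OPERATOR_MAP),
    )
    # Assumes `predicate` decodes to a mapping; asyncpg returns json/jsonb
    # columns as text unless a type codec is registered on the connection.
    return [dict(r) for r in rows]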


async def detect(pool: asyncpg.Pool, request: dict[str, Any]) -> list[JoinShiftEvent]:
    fingerprint = request["query_fingerprint"]
    baseline = request["baseline"]
    candidate = request["candidate"]

    with tracer.start_as_current_span("join_shift.detect") as span:
        span.set_attribute("query_fingerprint", fingerprint)
        span.set_attribute("plan_schema_version", request["plan_schema_version"])
        loop = asyncio.get_running_loop()
        started = loop.time()

        try:
            b_raw = baseline.get("nodes") or await _load_baseline_nodes(pool, baseline["plan_id"])
            # Align on (query_block_id, predicate hash). A block can hold several
            # joins, so keying on the block ID alone would keep only the last one.
            baseline_nodes = {
                (n.query_block_id, n.predicate.hash()): n for n in map(_to_node, b_raw)
            }
            candidate_nodes = {
                (n.query_block_id, n.predicate.hash()): n
                for n in map(_to_node, candidate["nodes"])
            }
        except ParseError as exc:
            parse_failures.add(1, {"fingerprint": fingerprint})
            span.record_exception(exc)
            await log.awarning("parse_failure", fingerprint=fingerprint, error=str(exc))
            raise

        events: list[JoinShiftEvent] = []
        for key, c_node in candidate_nodes.items():
            b_node = baseline_nodes.get(key)
            if b_node is None:
                # No baseline join with this block and predicate; handled by
                # structural-drift, not this stage.
                continue
            detections_total.add(1, {"fingerprint": fingerprint})
            event = _diff_aligned(
                fingerprint, baseline["plan_id"], candidate["plan_id"], b_node, c_node
            )
            if event is not None:
                events.append(event)
                shifts_total.add(1, {"severity": event.routing_severity})
                await log.ainfo(
                    "join_shift_detected",
                    fingerprint=fingerprint,
                    node_id=event.node_id,
                    baseline_type=event.baseline_type.value,
                    candidate_type=event.candidate_type.value,
                    severity=event.routing_severity,
                    confidence=event.confidence,
                )

        detect_latency.record((loop.time() - started) * 1000.0)
        span.set_attribute("join_shift.event_count", len(events))
        return events

Every aligned node increments join_shift_detection_total, and every emitted shift increments join_shift_emitted_total labelled by severity. Each join_shift_detected log line is written inside the join_shift.detect span, so once the logging pipeline attaches the active trace_id and span_id, the rule engine can correlate a topology change back to the exact operator and the trace that produced it. The JoinShiftEvent dataclass itself carries no trace fields.

Threshold Table and Alerting

Confidence is derived structurally: a shift between two known join types is 1.0; a shift involving UNKNOWN is capped at 0.65. Severity maps onto exact bands, and the worker’s own SLOs are enforced separately from detection confidence.

Signal	Band	Value	Automated action
Shift confidence	CRITICAL	`>= 0.95`	Page on-call SRE, freeze baseline, open ticket
Shift confidence	WARNING	`>= 0.80` and `< 0.95`	Queue for regression testing, log to SIEM
Shift confidence	INFO	`< 0.80`	Archive for trend analysis, no routing
Detect latency (`p95`)	Warn	`> 40 ms`	Scale consumer pool +1 replica
Detect latency (`p95`)	Block	`> 120 ms`	Circuit-break, shed to async batch queue
Parse failure rate (5 min)	Warn	`> 0.5%`	Alert; inspect upstream normalizer
Parse failure rate (5 min)	Block	`> 2%`	Halt intake, route all payloads to DLQ
Consumer lag	Block	`> 10000 msgs`	Auto-scale, then page platform team

The routing directives themselves are evaluated against the confidence bands:

confidence >= 0.95 → CRITICAL: immediate on-call page, automatic baseline freeze, and ticket creation.
0.80 <= confidence < 0.95 → WARNING: logged to SIEM, queued for the automated regression testing pipeline.
confidence < 0.80 → INFO: archived for trend analysis; no automated routing.

A Prometheus alert on the worker’s own latency and failure SLOs is wired as:

YAML

groups:
  - name: join-shift-detector
    rules:
      - alert: JoinShiftDetectLatencyHigh
        expr: histogram_quantile(0.95, sum by (le) (rate(join_shift_detect_latency_ms_bucket[5m]))) > 120
        for: 3m
        labels: { severity: page }
        annotations:
          summary: "Join-shift detector p95 latency > 120ms"
          runbook: "Scale consumers or shed to async batch queue"
      - alert: JoinShiftParseFailureRateHigh
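        # Approximation: rejected requests over aligned nodes evaluated; the listing defines no per-request counter.
        expr: sum(rate(join_shift_parse_failure_total[5m])) / sum(rate(join_shift_detection_total[5m])) > 0.02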
        for: 5m
        labels: { severity: page }
        annotations:
          summary: "Join-shift parse-failure rate > 2%"
          runbook: "Inspect upstream plan normalizer; payloads routed to DLQ"
      - alert: JoinShiftCriticalBurst
        expr: sum(increase(join_shift_emitted_total{severity="CRITICAL"}[10m])) > 25
        for: 0m
        labels: { severity: page }
        annotations:
          summary: "Burst of CRITICAL join shifts — probable fleet-wide stats drift"

Tune the bands per workload class. OLTP fingerprints where a nested-loop regression is catastrophic should treat any WARNING as gating; analytical fingerprints that legitimately oscillate between hash and merge joins under partition pruning tolerate wider archival windows. The rule engine, not this stage, owns that policy — see Tuning Thresholds for False Positive Reduction for the suppression model.

Failure Scenarios and Root Cause Analysis

1. Schema version mismatch. Symptom: the request's plan_schema_version differs from the version recorded for the baseline in the registry; traversal would compare incompatible operator vocabularies. Diagnostics: SELECT plan_id, plan_schema_version FROM plan_artifacts WHERE query_fingerprint = $1 ORDER BY captured_at DESC LIMIT 5;. Mitigation: the stage never diffs across versions. It routes the payload to schema_drift_dlq and fires a webhook that triggers a metadata sync to refresh the canonical operator dictionary. No event is emitted.

2. Ambiguous operator trees. Symptom: normalize_operator returns JoinType.UNKNOWN for a candidate node because the engine emitted an unmapped operator label. Diagnostics: grep the structured logs for join_shift_detected events where candidate_type="unknown", then inspect the raw operator_type in the source plan. Mitigation: confidence is capped at 0.65, which is below the 0.80 WARNING floor, so under the default bands the event is classified INFO and archived rather than routed. It is tagged AMBIGUOUS_SHIFT with a requires_manual_review flag, and the unmapped label is queued for addition to _OPERATOR_MAP.

3. Predicate drift masquerading as a join shift. Symptom: baseline and candidate use different join operators, but on different predicate sets — a rewrite, not a strategy change. Diagnostics: compare the predicate hashes emitted in the span attributes; mismatched hashes mean the nodes were never comparable. Mitigation: _diff_aligned returns None on a predicate-hash mismatch, so no false event is produced. The divergence is instead surfaced by the structural-diff branch upstream.

4. Malformed or truncated plan payloads. Symptom: a node is missing predicate or est_rows; _to_node raises ParseError. Diagnostics: check join_shift_parse_failure_total and the correlated parse_failure log entries. Mitigation: the request is rejected before traversal, the raw payload is forwarded to a dead-letter queue with a retry_count header, and retries are capped at three attempts with exponential backoff. Original payload integrity is preserved for audit.

5. Baseline node cache miss under load. Symptom: the request arrives without inlined baseline.nodes, so _load_baseline_nodes adds a round-trip to the plan store over asyncpg, inflating p95 latency past 40 ms. Diagnostics: trace spans showing time dominated by the plan_nodes query; rising join_shift_detect_latency_ms. Mitigation: have the capture stage inline baseline join nodes into the request payload, or front the plan store with a read-through cache keyed by plan_id.
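The retry policy in scenario 4 (three attempts, exponential backoff, a retry_count header) can be sketched as a pure function over the message headers. The one-second base delay and the name `next_retry` are assumptions for illustration; the source fixes only the cap and the header name.

```python
from typing import Optional

MAX_RETRY = 3        # mirrors the JOINSHIFT_DLQ_MAX_RETRY default
BASE_DELAY_S = 1.0   # assumed; the source does not state a base delay


def next_retry(headers: dict[str, str]) -> Optional[tuple[dict[str, str], float]]:
    """Return (updated headers, delay in seconds), or None once the cap is hit."""
    attempts = int(headers.get("retry_count", "0"))
    if attempts >= MAX_RETRY:
        return None  # park the payload in the DLQ; no further retries
    delay = BASE_DELAY_S * (2 ** attempts)  # 1 s, 2 s, 4 s with the assumed base
    return {**headers, "retry_count": str(attempts + 1)}, delay
```

Only the headers change between attempts. The payload bytes are never rewritten, which is what keeps the original available for audit.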

Configuration Reference

Env var / flag	Default	Purpose
`JOINSHIFT_PLAN_DSN`	—	`asyncpg` DSN for the baseline plan store (read-replica)
`JOINSHIFT_POOL_MIN`	`4`	Minimum `asyncpg` pool connections per worker
`JOINSHIFT_POOL_MAX`	`16`	Maximum pool connections; cap below replica `max_connections`
`JOINSHIFT_KAFKA_TOPIC`	`join-shift-events`	Output topic for emitted `JoinShiftEvent`s
`JOINSHIFT_PARTITION_SALT`	`""`	Optional salt appended before hashing the partition key
`JOINSHIFT_UNKNOWN_CONFIDENCE`	`0.65`	Confidence cap when a node resolves to `UNKNOWN`
`JOINSHIFT_CRITICAL_MIN`	`0.95`	Lower bound of the `CRITICAL` severity band
`JOINSHIFT_WARNING_MIN`	`0.80`	Lower bound of the `WARNING` severity band
`JOINSHIFT_DLQ_MAX_RETRY`	`3`	Retry cap before a payload is parked in the DLQ
`JOINSHIFT_OTEL_ENDPOINT`	—	OTLP exporter endpoint for spans and metrics

All confidence and severity knobs are exposed as env vars so a workload class can tighten bands without a redeploy. The defaults above are the values the reference implementation ships with; none is a placeholder to fill in later.
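As a hedged sketch of how those knobs might be read at worker start-up, the snippet below loads the three confidence settings from the environment, falls back to the documented defaults, and rejects band values that are out of order. The `Bands` dataclass and `load_bands` function are hypothetical names; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Bands:
    unknown_confidence: float = 0.65
    critical_min: float = 0.95
    warning_min: float = 0.80

    def severity(self, confidence: float) -> str:
        if confidence >= self.critical_min:
            return "CRITICAL"
        if confidence >= self.warning_min:
            return "WARNING"
        return "INFO"


def load_bands(env: Mapping[str, str] = os.environ) -> Bands:
    """Read the JOINSHIFT_* confidence knobs, using documented defaults."""
    d = Bands()
    bands = Bands(
        unknown_confidence=float(env.get("JOINSHIFT_UNKNOWN_CONFIDENCE", d.unknown_confidence)),
        critical_min=float(env.get("JOINSHIFT_CRITICAL_MIN", d.critical_min)),
        warning_min=float(env.get("JOINSHIFT_WARNING_MIN", d.warning_min)),
    )
    # Fail fast on inverted or out-of-range bands rather than misroute events.
    if not 0.0 <= bands.warning_min <= bands.critical_min <= 1.0:
        raise ValueError("expected 0 <= WARNING_MIN <= CRITICAL_MIN <= 1")
    if not 0.0 <= bands.unknown_confidence <= 1.0:
        raise ValueError("JOINSHIFT_UNKNOWN_CONFIDENCE must be within [0, 1]")
    return bands
```

With the defaults, a shift involving UNKNOWN (0.65) lands in INFO. A workload class that sets `JOINSHIFT_WARNING_MIN=0.60` would see the same event classified WARNING.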

Related

Parent topic: Regression Detection & Rule Engines
Next in this area: Identifying Hash-to-Nested Loop Join Shifts Automatically
Correlate structural shifts with cost: Tracking Cost Deltas Across Baseline Versions
Correlate shifts with access paths: Monitoring Index Usage Changes for Regression Signals
Suppress noisy events: Tuning Thresholds for False Positive Reduction
Upstream input format: Normalizing Query Plans for Cross-Engine Comparison
