Regression Detection & Rule Engines

Regression Detection & Rule Engines are the enforcement core of automated query plan baseline tracking — the subsystem Database SREs, query optimization engineers, and Python platform teams rely on to catch optimizer drift, gate risky schema changes, and hold p95/p99 latency contracts before a bad plan ever reaches production traffic. This topic defines the deterministic pipeline, the declarative rule model, and the exact thresholds that turn noisy plan telemetry into safe, auditable deploy decisions.

Where the upstream Automated EXPLAIN Capture & Storage Workflows produce immutable plan artifacts and Core Architecture & Baselining Fundamentals defines how those artifacts are fingerprinted and anchored, this stage is where comparison, scoring, and policy enforcement happen. It consumes normalized baselines and emits a single, structured verdict — PASS, WARN, or BLOCK — that CI systems and rollout controllers act on without a human in the loop.

Pipeline Overview: Capture → Regression → CI Gate → Index Sync → Debugging

The automation is built as five strictly isolated stages with explicit input/output contracts. Isolation is the load-bearing property: telemetry ingestion must never block a deploy decision, rule evaluation must never introduce cascading latency into the request path, and a failure in one stage must fail closed without corrupting the others. Each boundary is a serialized, versioned artifact on a message broker, so any stage can scale, retry with exponential backoff, and be replayed independently.

Artifacts flow forward from Capture into Regression, through the CI Gate, and out to Index Sync; any violation is diverted to Debugging. Each hand-off is a typed contract:

Capture normalizes EXPLAIN (FORMAT JSON) output, runtime telemetry, and optimizer settings into an immutable, versioned artifact keyed by a canonical plan hash. Its contract: raw plan in, {plan_hash, canonical_dag, metadata_envelope} out.
Regression loads the active baseline for that query fingerprint, computes structural and statistical deltas, and produces a scored verdict payload. Its contract: {new_artifact, baseline_ref} in, {verdict, violated_rules[], deltas} out.
CI Gate is the synchronous enforcement boundary. It maps the verdict onto deployment policy and either admits or halts the change. Its contract: verdict payload in, exit code + annotated report out.
Index Sync propagates only baseline-approved structural changes (index DDL, access-path pins) to staging and production through declarative reconciliation. Its contract: approved change set in, applied-state receipt out.
Debugging receives every BLOCK and flagged WARN, provisions an isolated replay environment against a snapshot of production statistics, and links the violation back to the exact operator, statistics timestamp, and commit. Its contract: flagged artifact in, root-cause bundle out.

Five isolated stages joined by typed contracts: PASS/WARN verdicts flow to Index Sync, while BLOCK and flagged plans route to Debugging, which returns a root-cause bundle upstream.

This is the same five-stage topology used across Core Architecture & Baselining Fundamentals. This topic owns three of the stages (Regression, CI Gate, and the Debugging feedback loop), while Capture is detailed under the async ingestion pipeline design.
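The typed hand-offs listed above can be sketched as plain data classes. This is a minimal illustration using only the standard library, not the production schema: the class names, the `flagged` parameter, and the `route` helper are assumptions introduced here, while the field names follow the contracts stated in the stage list.

```python
from dataclasses import dataclass, field
from enum import Enum


class VerdictResult(str, Enum):
    PASS = "PASS"
    WARN = "WARN"
    BLOCK = "BLOCK"


@dataclass(frozen=True)
class CaptureArtifact:
    """Output of Capture: {plan_hash, canonical_dag, metadata_envelope}."""
    plan_hash: str
    canonical_dag: dict
    metadata_envelope: dict


@dataclass(frozen=True)
class VerdictPayload:
    """Output of Regression: {verdict, violated_rules[], deltas}."""
    verdict: VerdictResult
    violated_rules: tuple[str, ...] = ()
    deltas: dict = field(default_factory=dict)


def route(payload: VerdictPayload, flagged: bool = False) -> str:
    """Pick the next stage: BLOCK and flagged WARN go to Debugging.

    `flagged` is a hypothetical marker for WARN verdicts that need forensics.
    """
    if payload.verdict is VerdictResult.BLOCK:
        return "debugging"
    if payload.verdict is VerdictResult.WARN and flagged:
        return "debugging"
    return "index_sync"
```

Freezing the data classes mirrors the immutability of the artifacts: a stage can read a payload but has to emit a new one to change anything.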

Stage-by-Stage Architecture

Capture (upstream contract)

Regression does not trust raw plans; it trusts canonical artifacts. The capture layer strips volatile fields (actual rows, execution time, buffer counts, memory grants, bind literals), resolves fully qualified object names, sorts order-irrelevant child nodes, and emits a deterministic SHA-256 fingerprint over the serialized plan DAG. This canonicalization — described in depth as the SHA-256 plan hashing approach and the cross-engine normalization engine — is what makes two runs of the same logical plan produce identical hashes regardless of hardware, concurrency, or transient hints. Failure isolation: malformed or partial plans are quarantined at capture and never enter regression; the stage fails closed by emitting no artifact rather than a corrupt one.
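The canonicalization step can be illustrated with a short sketch. It assumes PostgreSQL-style `EXPLAIN (FORMAT JSON)` key names; the `VOLATILE_KEYS` and `ORDER_IRRELEVANT_NODES` sets are illustrative assumptions, not the capture layer's actual lists, and name resolution and bind-literal stripping are omitted.

```python
import hashlib
import json

# Assumed volatile keys, modeled on PostgreSQL EXPLAIN (ANALYZE, BUFFERS) JSON.
VOLATILE_KEYS = {
    "Actual Rows", "Actual Loops", "Actual Total Time", "Actual Startup Time",
    "Shared Hit Blocks", "Shared Read Blocks",
}
# Assumed node types whose child order does not change plan semantics.
ORDER_IRRELEVANT_NODES = {"Append", "BitmapAnd", "BitmapOr"}


def canonicalize(node: dict) -> dict:
    """Drop volatile fields and order child plans deterministically."""
    clean = {k: v for k, v in node.items()
             if k not in VOLATILE_KEYS and k != "Plans"}
    children = [canonicalize(child) for child in node.get("Plans", [])]
    if node.get("Node Type") in ORDER_IRRELEVANT_NODES:
        # Join inputs are left alone: outer/inner order is meaningful there.
        children.sort(key=lambda c: json.dumps(c, sort_keys=True))
    if children:
        clean["Plans"] = children
    return clean


def plan_hash(plan: dict) -> str:
    """SHA-256 over a stable JSON serialization of the canonical plan."""
    serialized = json.dumps(canonicalize(plan), sort_keys=True,
                            separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Two captures that differ only in runtime counters or in the order of `Append` children hash identically under this sketch, while a change in node type or relation produces a different hash.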

Regression (scoring and rule evaluation)

The regression stage is a stateless evaluator. For each incoming artifact it loads the active baseline version for that fingerprint, computes a delta vector, and runs the rule set. Structural diffing compares operator ordering, join topology, and access-path selection; statistical diffing compares cost estimates, cardinality accuracy, and observed latency against rolling baselines. The stage never mutates state — it reads baselines and emits verdicts, which keeps it horizontally scalable and safe to run many replicas behind the broker. Baseline anchoring itself (cost-model translation, versioning) lives in Core Architecture & Baselining Fundamentals; this stage consumes those anchors and quantifies drift, including per-version cost movement covered in Tracking Cost Deltas Across Baseline Versions and access-path changes covered in Detecting Join Type Shifts in Execution Plans. Failure isolation: a missing baseline yields a WARN with reason no_baseline (never a silent PASS), so unknown plans are surfaced, not admitted blindly.
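As a rough illustration of the structural half of the delta vector, the sketch below flattens two canonical plans into operator sequences and compares them. The function names and the shape of the returned dictionary are assumptions made for this example; the key names again follow PostgreSQL's JSON plan format.

```python
from typing import Iterator


def operators(node: dict) -> Iterator[tuple[str, str]]:
    """Yield (node type, relation name) pairs in depth-first order."""
    yield node.get("Node Type", "?"), node.get("Relation Name", "")
    for child in node.get("Plans", []):
        yield from operators(child)


def structural_delta(baseline: dict, candidate: dict) -> dict:
    """Compare operator ordering and report operators gained or lost."""
    base_ops = list(operators(baseline))
    cand_ops = list(operators(candidate))
    return {
        "operator_order_changed": base_ops != cand_ops,
        "removed": sorted(set(base_ops) - set(cand_ops)),
        "added": sorted(set(cand_ops) - set(base_ops)),
    }
```

The function only reads its inputs and returns a new dictionary, which matches the stateless property described above: any replica can compute the same delta for the same pair of artifacts.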

CI Gate (enforcement boundary)

The CI Gate is the only synchronous stage. It receives the verdict payload from a webhook receiver wired into GitHub Actions, GitLab CI, Jenkins, or Argo, and translates the highest-severity verdict into a deploy decision: PASS admits, WARN admits with an annotation and required acknowledgement, BLOCK fails the check and posts the violated rules onto the merge request. Because the gate is synchronous, it must be fast and bounded — evaluation is precomputed in the regression stage, so the gate only reads an already-scored payload and applies policy. Failure isolation: if the gate cannot reach the verdict store within its timeout, it fails closed (BLOCK with reason gate_timeout) rather than admitting an unverified plan.
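The gate policy described above can be sketched as a small pure function. This is an illustration under assumptions: `fetch_verdicts` stands in for whatever client reads the verdict store, the returned dictionary shape is invented for the example, and treating an empty verdict list as a BLOCK with reason `no_verdicts` is an assumption, not something the pipeline specifies.

```python
from typing import Callable

SEVERITY = {"PASS": 0, "WARN": 1, "BLOCK": 2}


def _blocked(reason: str) -> dict:
    return {"decision": "BLOCK", "exit_code": 1,
            "requires_ack": False, "violated_rules": [reason]}


def gate_decision(fetch_verdicts: Callable[[], list[dict]]) -> dict:
    """Map already-scored verdicts onto a deploy decision, failing closed."""
    try:
        verdicts = fetch_verdicts()
    except (TimeoutError, ConnectionError):
        # Verdict store unreachable: never admit an unverified plan.
        return _blocked("gate_timeout")
    if not verdicts:
        return _blocked("no_verdicts")  # assumption: nothing scored means no admit
    worst = max(verdicts, key=lambda v: SEVERITY[v["result"]])["result"]
    rules = [r for v in verdicts for r in v.get("violated_rules", [])]
    return {
        "decision": worst,
        "exit_code": 1 if worst == "BLOCK" else 0,
        "requires_ack": worst == "WARN",
        "violated_rules": rules,
    }
```

Any other exception propagates and crashes the check, which also leaves the merge unadmitted; the gate itself does no scoring, so its latency is bounded by the store read.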

Index Sync (propagation)

Once a change is baseline-approved, Index Sync reconciles the approved structural set — new indexes, dropped redundant indexes, access-path pins — across replicas using declarative infrastructure-as-code. It is the only stage permitted to issue DDL, and it does so idempotently: the desired-state manifest is the source of truth, and reconciliation is safe to re-run. Regression signals that originate here (a sudden change in index seek frequency after a sync) feed back through Monitoring Index Usage Changes for Regression Signals. Failure isolation: partial DDL application is rolled back to the last reconciled manifest; a stuck sync pauses propagation rather than leaving clusters divergent.
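Idempotent reconciliation can be sketched as a diff between the desired-state manifest and the indexes a cluster actually has. The manifest shape (index name mapped to its `CREATE INDEX IF NOT EXISTS` statement) and the `reconcile_plan` name are assumptions for illustration; a real reconciler would also handle changed definitions and transactional rollback.

```python
def reconcile_plan(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return the DDL that moves `actual` toward `desired`.

    Both arguments map index name -> definition. Index names come from a
    trusted manifest here; they are not sanitized in this sketch.
    """
    statements = []
    # Drop indexes the manifest no longer lists.
    for name in sorted(actual.keys() - desired.keys()):
        statements.append(f"DROP INDEX IF EXISTS {name}")
    # Create indexes the cluster is missing.
    for name in sorted(desired.keys() - actual.keys()):
        statements.append(desired[name])
    return statements
```

Once the cluster matches the manifest the plan is empty, so re-running reconciliation is a no-op, and the `IF EXISTS` / `IF NOT EXISTS` guards keep a partially applied plan safe to retry.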

Debugging (forensic feedback loop)

Every BLOCK and every acknowledged WARN routes to Debugging, which provisions an ephemeral, resource-isolated environment, restores the statistics snapshot that was active at capture time, and replays the exact canonical query signature. This yields a deterministic reproduction that links the verdict to a specific operator substitution, a stale histogram, or a schema revision — without touching live traffic. Failure isolation: replay runs against snapshots on isolated compute, so a pathological plan cannot consume production buffers or connections.

Threshold Matrix

Thresholds must be statistically grounded, not hand-picked constants. Hard latency ceilings collapse under variable concurrency; instead the engine anchors bands to rolling statistics — Exponentially Weighted Moving Averages (EWMA) for cost, percentile bands (p50/p95/p99) for latency, and confidence intervals for cardinality accuracy. The matrix below is the reference default for a Tier-1 (latency-critical) query class; lower tiers widen the bands. Detailed derivation of these numbers lives in Defining Regression Thresholds for Query Plans, and noise-suppression tuning lives in Tuning Thresholds for False Positive Reduction.

Metric	Pass	Warn	Block	Automation trigger on Block
Cost delta vs baseline EWMA	≤ 1.15×	> 1.15× and ≤ 3.0×	> 3.0×	Route to Debugging; require plan-pin ack
Row-estimate error (est vs actual)	≤ 20%	> 20% and ≤ 50%	> 50%	Trigger targeted `ANALYZE`; re-capture
p95 latency vs baseline (equal concurrency)	≤ 1.5×	> 1.5× and ≤ 2.0×	> 2.0×	Halt merge; open incident annotation
p99 latency vs baseline	≤ 1.8×	> 1.8× and ≤ 2.5×	> 2.5×	Halt merge; canary hold
Access-path substitution	none	approved-list change	index seek → seq scan on ≥ 10M-row table	Fail CI; block Index Sync
Join-algorithm shift	none	nested-loop ↔ hash within CI bounds	shift with cardinality outside CI	Route to join-shift diagnostics
New sequential scan introduced	none	on table < 1M rows	on table ≥ 10M rows	Fail CI; require index recommendation

Structural violations are evaluated before statistical ones, which are evaluated before latency ones, and evaluation short-circuits on the first BLOCK to keep the gate cheap. This precedence ensures a full-table-scan regression is never masked by an incidentally acceptable latency sample.
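The precedence and short-circuit rule can be sketched as follows. The band values are the Tier-1 defaults from the matrix; the metric keys, the `structural_block` flag, and the rule-name format are assumptions chosen for this example.

```python
# metric -> (warn_above, block_above), Tier-1 defaults from the matrix
BANDS = {
    "cost_delta": (1.15, 3.0),
    "rowest_error": (0.20, 0.50),
    "p95_latency": (1.5, 2.0),
    "p99_latency": (1.8, 2.5),
}
# Statistical metrics are checked before latency metrics.
ORDER = ["cost_delta", "rowest_error", "p95_latency", "p99_latency"]


def classify(value: float, warn: float, block: float) -> str:
    if value > block:
        return "BLOCK"
    return "WARN" if value > warn else "PASS"


def evaluate_rules(structural_block: bool,
                   metrics: dict[str, float]) -> tuple[str, list[str]]:
    """Apply structural -> statistical -> latency precedence."""
    if structural_block:
        return "BLOCK", ["structural:block"]  # latency is never consulted
    worst, rules = "PASS", []
    for name in ORDER:
        if name not in metrics:
            continue
        level = classify(metrics[name], *BANDS[name])
        if level == "BLOCK":
            return "BLOCK", rules + [f"{name}:block"]  # short-circuit
        if level == "WARN":
            worst = "WARN"
            rules.append(f"{name}:warn")
    return worst, rules
```

With a structural block set, a healthy latency sample cannot rescue the plan, and a p95 block returns before p99 is examined.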

Production Readiness Requirements

Connection pool isolation. The regression and debugging stages must use a dedicated pool (for example pgbouncer in transaction mode, capped separately from application pools) so plan replay and baseline reads can never starve production request capacity. Size the governance pool independently and set statement_timeout low (2–5 s) on it.
Read-replica routing. All plan capture and replay reads route to read replicas or a statistics snapshot, never the primary. Regression is a read-mostly workload; the only writes are approved DDL, and those are confined to the Index Sync stage.
Circuit breaker configuration. Every stage sits behind a circuit breaker: trip after 5 consecutive broker or database failures, hold open for 30 s, then half-open with a single probe. Remediation actions (auto-ANALYZE, plan pinning, index rollback) are additionally bounded by a token-bucket limiter so a flapping baseline cannot launch a runaway remediation loop.
Privilege model. The capture/regression identity is granted read-only plus EXPLAIN rights and no DDL. Only the Index Sync service identity holds DDL privileges, and only against the reconciliation schema. Verdict artifacts and baselines are encrypted per Encrypting Baseline Query Plans at Rest and in Transit. This split guarantees that the component making the decision can never itself mutate schema.
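The circuit breaker settings above (trip after 5 consecutive failures, hold open for 30 s, then admit a single half-open probe) can be sketched with a small state holder. The class and method names are assumptions for illustration, and the sketch is not thread-safe; a production breaker would guard its state with a lock.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, max_failures: int = 5, open_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.open_seconds = open_seconds
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.probing = False

    def allow(self) -> bool:
        """Return True if a call may proceed right now."""
        if self.opened_at is None:
            return True  # closed
        if self.clock() - self.opened_at < self.open_seconds:
            return False  # open
        if self.probing:
            return False  # half-open, a probe is already in flight
        self.probing = True
        return True  # half-open, admit exactly one probe

    def record_success(self) -> None:
        self.failures, self.opened_at, self.probing = 0, None, False

    def record_failure(self) -> None:
        self.probing = False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # trip, or re-open after a failed probe
```

Callers check `allow()` before touching the broker or database and report the outcome afterwards; a failed probe restarts the 30 s hold.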

Observability Hooks

Telemetry is the primary input to regression detection, not an afterthought. Every stage emits OpenTelemetry spans and Prometheus-style metrics with stable names and types so dashboards and alerts survive refactors. All plan-related spans follow the OpenTelemetry database semantic conventions.

Metric name	Type	Meaning
`regression_evaluations_total{verdict}`	counter	Verdicts emitted, labeled `pass`/`warn`/`block`
`regression_eval_duration_seconds`	histogram	Wall-clock per evaluation; alert if p95 > 250 ms
`regression_cost_delta_ratio`	gauge	Latest cost delta vs baseline EWMA per fingerprint
`regression_rowest_error_ratio`	histogram	Distribution of estimate-vs-actual error
`ci_gate_decisions_total{decision}`	counter	Gate admit/hold/fail outcomes
`ci_gate_open_seconds`	histogram	Synchronous gate latency; alert if p99 > 2 s
`index_sync_reconcile_lag_seconds`	gauge	Time since last successful reconcile per cluster
`baseline_missing_total`	counter	Evaluations that hit `no_baseline`

Span attributes carry sql.query.canonical_hash, db.plan.baseline_version, and db.optimizer.settings so a single trace correlates a capture, its verdict, and the resulting gate decision end to end. Routing of the raw EXPLAIN (ANALYZE, BUFFERS) payloads that back these metrics is handled by Routing EXPLAIN ANALYZE Output to Centralized Logs.

Python Orchestration Patterns

The pipeline is wired as a DAG in Airflow, Prefect, or Argo Workflows, with the broker (Kafka, Redis Streams, or SQS) carrying artifacts between stages. Serialization is JSON validated by pydantic models on both producer and consumer sides, so every boundary is a typed contract — the same discipline applied to schema validation for baseline metadata. Worker pool sizing follows the workload shape: capture is I/O-bound (size the pool to broker partition count, typically 8–16 async workers per node), regression is CPU-bound on DAG diffing (size to cpu_count, one process per core with an inner asyncio loop for DB reads), and the CI Gate runs a single small synchronous replica per environment because it must be strictly ordered.

The regression evaluator itself is an asyncio consumer instrumented with structlog and OpenTelemetry:

```python
import asyncio

import asyncpg
import structlog
from opentelemetry import trace
from pydantic import BaseModel, Field

log = structlog.get_logger()
tracer = trace.get_tracer("regression.evaluator")


class PlanArtifact(BaseModel):
    canonical_hash: str
    baseline_version: str
    cost_est: float
    rows_est: int
    rows_actual: int
    p95_latency_ms: float
    access_paths: list[str] = Field(default_factory=list)


class Verdict(BaseModel):
    canonical_hash: str
    result: str  # "PASS" | "WARN" | "BLOCK"
    violated_rules: list[str] = Field(default_factory=list)


# Tier-1 thresholds; lower tiers widen these bands.
COST_WARN, COST_BLOCK = 1.15, 3.0
ROWERR_WARN, ROWERR_BLOCK = 0.20, 0.50
LAT_WARN, LAT_BLOCK = 1.5, 2.0


async def publish_verdict(verdict: Verdict) -> None:
    """Placeholder for the broker producer that feeds the CI Gate topic."""
    raise NotImplementedError("wire this to the Kafka/Redis Streams/SQS producer")


async def evaluate(artifact: PlanArtifact, pool: asyncpg.Pool) -> Verdict:
    with tracer.start_as_current_span("regression.evaluate") as span:
        span.set_attribute("sql.query.canonical_hash", artifact.canonical_hash)
        span.set_attribute("db.plan.baseline_version", artifact.baseline_version)

        async with pool.acquire() as conn:  # dedicated governance pool, statement_timeout=5s
            base = await conn.fetchrow(
                "SELECT cost_ewma, p95_latency_ms, access_paths "
                "FROM plan_baselines WHERE canonical_hash = $1 AND version = $2",
                artifact.canonical_hash, artifact.baseline_version,
            )

        if base is None:
            # Unknown plans are surfaced as WARN, never admitted as a silent PASS.
            log.warning("no_baseline", hash=artifact.canonical_hash)
            return Verdict(canonical_hash=artifact.canonical_hash,
                           result="WARN", violated_rules=["no_baseline"])

        violations, worst = [], "PASS"

        # Structural precedence: access-path substitution is evaluated first.
        # A NULL access_paths column is treated as an empty list.
        removed = set(base["access_paths"] or []) - set(artifact.access_paths)
        if any(p.startswith("index") for p in removed):
            violations.append("access_path_regression")
            worst = "BLOCK"
        else:
            cost_ratio = artifact.cost_est / max(base["cost_ewma"], 1e-9)
            rows_err = abs(artifact.rows_est - artifact.rows_actual) / max(artifact.rows_actual, 1)
            lat_ratio = artifact.p95_latency_ms / max(base["p95_latency_ms"], 1e-9)

            # Statistical checks run before latency; stop at the first BLOCK.
            for name, value, warn, block in (
                ("cost_delta", cost_ratio, COST_WARN, COST_BLOCK),
                ("rowest_error", rows_err, ROWERR_WARN, ROWERR_BLOCK),
                ("p95_latency", lat_ratio, LAT_WARN, LAT_BLOCK),
            ):
                if value > block:
                    violations.append(f"{name}:block")
                    worst = "BLOCK"
                    break
                if value > warn:
                    violations.append(f"{name}:warn")
                    worst = "WARN"

        span.set_attribute("regression.verdict", worst)
        log.info("verdict", hash=artifact.canonical_hash, result=worst, rules=violations)
        return Verdict(canonical_hash=artifact.canonical_hash,
                       result=worst, violated_rules=violations)


async def worker(pool: asyncpg.Pool, queue: asyncio.Queue[PlanArtifact]) -> None:
    while True:
        artifact = await queue.get()
        try:
            verdict = await evaluate(artifact, pool)
            await publish_verdict(verdict)  # to the CI Gate topic
        except Exception:  # circuit breaker + DLQ live in publish/consume wrappers
            log.exception("evaluate_failed", hash=artifact.canonical_hash)
        finally:
            queue.task_done()
```

Remediation actions triggered by a BLOCK — plan pinning via optimizer hints or stored plan baselines, targeted ANALYZE on drifted tables, index rollback through the IaC pipeline, and canary routing that shifts a bounded traffic percentage onto a baseline-validated variant — are each idempotent and rate-limited, and every one writes a structured audit event before it acts.
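The "rate-limited, audit before acting" discipline can be sketched with a token bucket in front of each remediation. Everything named here (`TokenBucket`, `remediate`, the audit record fields, the in-memory `audit_log` list) is a hypothetical stand-in; the `apply` callable is assumed to be idempotent, as the text requires of every remediation.

```python
import json
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = refill_per_second
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def remediate(action: str, target: str, bucket: TokenBucket,
              audit_log: list, apply: Callable[[str], None]) -> bool:
    """Write the audit event first, then act, but only within the rate limit."""
    event = {"action": action, "target": target}
    if not bucket.try_acquire():
        audit_log.append(json.dumps({**event, "status": "rate_limited"}))
        return False
    audit_log.append(json.dumps({**event, "status": "starting"}))
    apply(target)
    return True
```

Because the audit record is appended before `apply` runs, a crash mid-remediation still leaves evidence of what was attempted, and a flapping baseline exhausts the bucket instead of looping.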

Common Failure Modes and Mitigations

Capture: non-deterministic hash churn. Symptom: the same logical plan produces a new fingerprint on every run. Cause: incomplete volatile-field stripping (residual bind literals, memory grants). Mitigation: extend the set of fields stripped during normalization, add a golden-plan regression test, and verify against the normalization engine contract.
Regression: baseline gaps admit unknown plans. Symptom: new queries sail through with no evaluation. Cause: no baseline exists yet for the fingerprint. Mitigation: no_baseline returns WARN, never PASS; alert when baseline_missing_total rises, and auto-enroll new fingerprints into a probation baseline.
Regression: alert fatigue from transient spikes. Symptom: WARN storms during buffer warming or maintenance. Mitigation: EWMA/CUSUM smoothing and per-tier bands, tuned per Tuning Thresholds for False Positive Reduction.
CI Gate: verdict store unreachable. Symptom: deploys hang or slip through during an outage. Mitigation: fail closed with gate_timeout, alert on ci_gate_open_seconds so gate latency stays bounded, and serve a cached last-known verdict for read-only reruns.
Index Sync: divergent clusters. Symptom: an index exists on one replica but not on others after a partial apply. Mitigation: idempotent declarative reconciliation, rollback to the last manifest, and monitoring via Monitoring Index Usage Changes for Regression Signals.
Debugging: irreproducible regressions. Symptom: a BLOCK cannot be replayed. Cause: statistics moved between capture and replay. Mitigation: pin the statistics snapshot at capture time and restore it in the isolated replay environment before running the canonical signature.
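The EWMA/CUSUM smoothing named in the alert-fatigue mitigation can be sketched in a few lines. The smoothing factor `alpha`, the `slack` allowance, and the alarm `threshold` are illustrative parameters chosen for this example, not tuned defaults.

```python
from typing import Sequence


def ewma(values: Sequence[float], alpha: float = 0.2) -> float:
    """Exponentially weighted moving average after folding in each value."""
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg


def cusum_alarm(values: Sequence[float], target: float,
                slack: float, threshold: float) -> bool:
    """One-sided upper CUSUM.

    Accumulates how far each sample exceeds target + slack and alarms once
    the running sum passes `threshold`; samples below the allowance drain it.
    """
    s = 0.0
    for v in values:
        s = max(0.0, s + (v - target - slack))
        if s > threshold:
            return True
    return False
```

A single spike during buffer warming adds to the running sum once and then drains away, whereas a sustained shift keeps accumulating until the alarm fires, which is the behavior that suppresses WARN storms without hiding real drift.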

Related

Tracking Cost Deltas Across Baseline Versions: quantifying optimizer cost movement between baseline versions.
Detecting Join Type Shifts in Execution Plans: flagging nested-loop ↔ hash/merge transitions.
Tuning Thresholds for False Positive Reduction: statistical process control for noise suppression.
Monitoring Index Usage Changes for Regression Signals: correlating access-path telemetry with schema changes.
Core Architecture & Baselining Fundamentals and Automated EXPLAIN Capture & Storage Workflows: the sibling topics that supply baselines and captured artifacts.
