Guide
Cost Estimation Mapping Across PostgreSQL and MySQL
In a production-grade query plan baseline pipeline, raw optimizer outputs are fundamentally incompatible across database engines. PostgreSQL reports costs in arbitrary planner units derived from seq_page_cost, cpu_tuple_cost, and related GUCs, while MySQL’s optimizer expresses costs as a composite of row estimates, I/O operations, and CPU cycles. The Cost Normalization & Calibration Stage exists solely to bridge this semantic gap. This pipeline stage performs deterministic translation of engine-specific cost vectors into a unified, comparable metric space. It operates strictly downstream of plan capture and upstream of structural hashing. Within the broader Core Architecture & Baselining Fundamentals framework, this stage ensures that regression detection algorithms receive mathematically aligned inputs rather than raw, engine-biased planner outputs.
Stage Boundaries & Routing Logic
Strict isolation is mandatory for deterministic baselining. This stage accepts structured JSON payloads containing EXPLAIN or EXPLAIN ANALYZE trees, engine identifiers, and schema versions. It outputs normalized cost matrices with explicit confidence intervals. The routing logic enforces hard boundaries to prevent pipeline contamination:
- Ingress: Accepts only validated plan trees with complete cost annotations. Rejects partial outputs, plans missing
total_cost/rowsfields, or payloads lacking engine version metadata. Validation occurs synchronously before normalization begins. - Processing: Applies engine-specific normalization functions in a pure, stateless execution context. Does not compute plan hashes, does not evaluate thresholds, and does not trigger CI/CD gates. All side effects (metrics, traces) are decoupled via async event publishing.
- Egress: Emits standardized cost vectors to the hashing queue. If normalization fails, payloads route to a dead-letter queue (DLQ) with explicit error codes (
ERR_MISSING_STATS,ERR_COST_OVERFLOW,ERR_VERSION_DRIFT). Routing decisions are deterministic and logged atINFOorERRORlevels based on outcome.
Engine-Specific Cost Models & Translation Logic
Direct arithmetic comparison between PostgreSQL and MySQL costs is mathematically invalid. PostgreSQL’s planner uses a linear model where total_cost = startup_cost + run_cost, scaled by configurable constants that reflect hardware I/O characteristics. MySQL’s optimizer employs a non-linear, row-driven model heavily influenced by index selectivity, join buffering, and filter pushdown. To achieve Cost Estimation Mapping Across PostgreSQL and MySQL, the automation layer must first extract raw cost components, then apply a deterministic scaling function that maps both engines to a dimensionless baseline unit (DBU).
The normalization formula derives from empirical calibration against known workload profiles and hardware profiles:
- PostgreSQL:
DBU = (total_cost / baseline_seq_cost) * (1 + cpu_penalty_factor) - MySQL:
DBU = (optimizer_cost / baseline_io_cost) * row_selectivity_weight
These scaling factors are not static. They are versioned and stored alongside schema metadata to prevent silent drift during minor engine upgrades. Calibration coefficients are pulled from a signed configuration store at pipeline initialization. When mapping costs, the system references the exact engine minor version and applies the corresponding coefficient set. For deeper context on how normalized costs correlate with actual execution times, see Mapping EXPLAIN Costs to Real-World Latency Metrics.
Production Implementation & Validation Pipeline
The normalization service is implemented as a stateless Python worker. It relies on strict schema validation, typed configuration, and deterministic math operations to prevent floating-point instability.
import logging
from dataclasses import dataclass
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError
from opentelemetry import metrics, trace
logger = logging.getLogger(__name__)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
normalization_duration = meter.create_histogram(
"db.cost_normalization.duration_ms", description="Time to normalize engine costs"
)
normalization_failures = meter.create_counter(
"db.cost_normalization.failures_total", description="Total normalization routing failures"
)
@dataclass(frozen=True)
class CalibrationCoefficients:
baseline_seq_cost: float
cpu_penalty_factor: float
baseline_io_cost: float
row_selectivity_weight: float
class PlanPayload(BaseModel):
engine: Literal["postgresql", "mysql"]
engine_version: str
total_cost: float = Field(ge=0)
estimated_rows: int = Field(ge=0)
startup_cost: Optional[float] = None
schema_version: str
def load_calibration(version: str) -> CalibrationCoefficients:
"""Fetches versioned coefficients from secure config store. Fails fast on missing data."""
# Implementation omitted for brevity; assumes gRPC/HTTP fetch with TLS pinning
pass
def normalize_cost(payload: PlanPayload) -> float:
with tracer.start_as_current_span("normalize_cost") as span:
span.set_attribute("engine", payload.engine)
span.set_attribute("engine_version", payload.engine_version)
try:
coeffs = load_calibration(payload.engine_version)
except Exception as e:
normalization_failures.add(1, {"reason": "ERR_VERSION_DRIFT"})
raise ValueError("ERR_VERSION_DRIFT: Missing calibration data") from e
if payload.engine == "postgresql":
if payload.total_cost <= 0:
raise ValueError("ERR_MISSING_STATS: Invalid total_cost")
dbu = (payload.total_cost / coeffs.baseline_seq_cost) * (1 + coeffs.cpu_penalty_factor)
elif payload.engine == "mysql":
if payload.total_cost <= 0 or payload.estimated_rows == 0:
raise ValueError("ERR_MISSING_STATS: Invalid cost or row estimate")
selectivity = min(payload.estimated_rows / 1_000_000, 1.0)
dbu = (payload.total_cost / coeffs.baseline_io_cost) * selectivity
else:
raise ValueError("ERR_UNSUPPORTED_ENGINE")
if dbu > 1e9:
normalization_failures.add(1, {"reason": "ERR_COST_OVERFLOW"})
raise ValueError("ERR_COST_OVERFLOW: Normalized cost exceeds safe bounds")
return round(dbu, 4)The implementation enforces strict type boundaries and fails explicitly on invalid inputs. All normalization math is bounded to prevent overflow, and version mismatches immediately trigger DLQ routing rather than silent fallback to stale coefficients.
Observability Hooks & Safe Fallback Protocols
Production baselining requires deterministic observability. The normalization stage emits structured telemetry at three critical boundaries:
- Ingress Validation Metrics:
plan_ingress_validated_totalandplan_ingress_rejected_totaltrack schema compliance. Rejections include the exact missing field in thereasonlabel. - Processing Latency & Distribution:
db.cost_normalization.duration_msis recorded per payload. High p99 latency indicates config store contention or unbounded coefficient lookups. - Egress Routing Signals: Successful normalization publishes to the
cost_normalizedKafka topic. Failures publish tocost_dlqwith a structured envelope containingerror_code,engine,version, andtrace_id.
Safe Fallback Protocols:
- Config Degradation: If the calibration service returns
HTTP 5xxor times out, the worker does not default to hardcoded values. Instead, it routes to DLQ withERR_VERSION_DRIFTand triggers an alert. Baseline integrity is prioritized over pipeline throughput. - Cost Overflow Protection: Any normalized DBU exceeding
1e9is treated as a planner anomaly. The payload is quarantined, and aWARNevent is emitted to the SRE dashboard. This prevents outlier costs from skewing downstream Defining Regression Thresholds for Query Plans calculations. - Circuit Breaker: A token-bucket rate limiter caps normalization attempts per engine version. During mass schema migrations, the breaker opens to prevent config store saturation, routing excess payloads to a retry queue with exponential backoff.
Integration with Downstream Baseline Stages
Normalized DBU vectors are strictly consumed by the structural hashing layer. The Plan Hashing Algorithms for SQL Engines module expects dimensionless, engine-agnostic inputs to generate stable plan fingerprints. By decoupling cost mapping from hash generation, the pipeline guarantees that identical logical plans across PostgreSQL and MySQL produce comparable baseline signatures.
Threshold evaluation occurs exclusively after hashing and normalization are complete. Regression gates compare the current DBU against the historical baseline using statistical bounds (e.g., 95th percentile confidence intervals). Because this stage guarantees mathematical alignment, threshold logic can safely apply uniform multipliers without engine-specific branching.
For authoritative reference on raw EXPLAIN output structures, consult the official documentation: PostgreSQL EXPLAIN Documentation and MySQL EXPLAIN Output Format. These sources define the baseline fields that the normalization stage validates before translation.