Reliability, Soak, and Performance Regression Program

Status: bounded local reliability program for the v0.2.0-alpha.0 candidate line. It layers existing tests into tiers and keeps longer-running evidence opt-in so that CI and agent runs do not become flaky or unbounded.

Test tiers

Tier | Purpose | Required command
T0 required gate | Formatting, commit-message policy, build, clang-tidy, and default CTest. | ./scripts/agent_check.sh
T1 compatibility/release | Stable-v0.2 contract, package install/downstream, docs, release-prep dry run. | ./scripts/goal_check.sh compat, package, docs, release
T2 runtime robustness | Bounded stress, deterministic fuzz smoke, benchmark contract, live observe smoke. | ./scripts/goal_check.sh stress, fuzz, bench, live
T3 sanitizer | ASAN+UBSAN full CTest. TSAN remains non-blocking policy evidence. | ./scripts/goal_check.sh sanitizer
T4 opt-in soak/perf | Bounded soak-lite or longer human-requested runs; timing thresholds are per-machine. | ./scripts/soak_lite_smoke.sh, opt-in benchmark/live-perf thresholds

Bounded soak-lite

The default soak-lite command repeats the generated stress workload suite with small explicit limits:

./scripts/soak_lite_smoke.sh

It sets TOPOEXEC_STRESS_PROFILE=soak, a short duration, a maximum iteration count, and small scale/step values. Longer soaks must set explicit duration, iteration, scale, and timeout values and should attach output as local release or CI evidence instead of editing docs with machine-specific timings.
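The "explicit values" rule for longer soaks can be enforced with a small guard before the run starts. The following is a minimal sketch, not part of the repository's scripts: require_explicit_limits is a hypothetical helper, and the variable names in the commented usage are placeholders, since this document does not name the real duration, iteration, scale, or timeout variables. It also assumes soak_lite_smoke.sh honors overrides from the environment, which is not confirmed here.

```shell
# Hypothetical guard: fails unless every named variable is exported and
# non-empty, so a long soak cannot start with implicit or default limits.
require_explicit_limits() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child script gets.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing explicit limit: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (placeholder variable names, not the script's real ones):
#   export EXAMPLE_SOAK_DURATION_S=600 EXAMPLE_SOAK_MAX_ITERATIONS=200
#   require_explicit_limits EXAMPLE_SOAK_DURATION_S EXAMPLE_SOAK_MAX_ITERATIONS &&
#     timeout 900 ./scripts/soak_lite_smoke.sh 2>&1 |
#       tee /tmp/topoexec-soak-manual.log
```

Wrapping the run in timeout (GNU coreutils) gives a hard wall-clock bound even if the workload's own limits are misconfigured, and the tee target follows the /tmp/topoexec-<gate>-*.log artifact pattern so the output can be attached as evidence.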

Performance regression policy

./scripts/goal_check.sh bench verifies the benchmark contract: JSON shape, graph hashes, samples, p50/p95/p99 summaries, and local baseline generation. It does not apply a global timing threshold. Timing comparisons are opt-in and per-machine:

TOPOEXEC_BENCH_BASELINE_IN=benchmarks/local-baseline.json \
TOPOEXEC_BENCH_THRESHOLD_PERCENT=15 \
./scripts/bench_baseline.sh

If a future goal adds thresholds, it must document hardware, compiler, build type, run count, accepted variance, and failure artifact location.
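To make the meaning of a percent threshold concrete, here is a minimal sketch of one plausible comparison rule: a run fails when the current timing exceeds the baseline by more than the threshold percentage. This is an assumption for illustration; this document does not specify how bench_baseline.sh computes the comparison or which summary (p50, p95, p99) it uses. The helper names percent_change and within_threshold are hypothetical.

```shell
# Hypothetical helpers, not part of the repository's scripts.

# Prints the percent change of current relative to baseline
# (positive means the current run is slower).
percent_change() {
  local baseline="$1" current="$2"
  awk -v b="$baseline" -v c="$current" 'BEGIN {
    if (b <= 0) { print "baseline must be positive" > "/dev/stderr"; exit 2 }
    printf "%.2f\n", (c - b) * 100 / b
  }'
}

# Returns 0 when the change is within the threshold, 1 when it is exceeded,
# and 2 when the baseline is invalid.
within_threshold() {
  local baseline="$1" current="$2" threshold="$3" change
  change="$(percent_change "$baseline" "$current")" || return 2
  awk -v d="$change" -v t="$threshold" 'BEGIN { exit ((d > t) ? 1 : 0) }'
}
```

Under this rule, a baseline of 200 and a current value of 230 with a threshold of 15 passes (exactly 15% slower), while 231 fails. Because the baseline comes from the same machine, the result says nothing about other hardware, which is why the comparison stays opt-in.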

Fuzz corpus and sanitizer policy

Deterministic fuzz smoke owns the checked-in corpus under tests/fuzz/corpus/graph_inputs/. New minimized parser/compiler crashes should be added there with an issue or changelog note. Coverage-guided fuzz campaigns remain optional release evidence, not a default local gate.

ASAN+UBSAN is the blocking sanitizer gate. ThreadSanitizer (TSAN) remains non-blocking until release governance changes that policy, because the current concurrency previews and platform scheduler behavior can make TSAN results noisy. Do not hide TSAN gaps: record in the release notes whether TSAN was run and whether its failures are blocking.

Failure artifact convention

When a robustness gate fails, capture the smallest durable artifact set:

/tmp/topoexec-<gate>-*.json
/tmp/topoexec-<gate>-*.txt
/tmp/topoexec-<gate>-*.ndjson
/tmp/topoexec-<gate>-*.log

For release candidates, copy relevant artifacts into the release-prep bundle or CI job artifacts. Do not commit machine-specific benchmark baselines unless a future release owner explicitly asks for a portable baseline policy.
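Copying artifacts into a bundle can be sketched as follows. This is an illustrative helper under stated assumptions, not an existing project script: collect_gate_artifacts is a hypothetical name, the bundle directory is whatever the caller passes, and the optional third argument (source directory, default /tmp) exists only to make the function easy to try out.

```shell
# Hypothetical helper: copies <src_dir>/topoexec-<gate>-*.{json,txt,ndjson,log}
# into a bundle directory and prints how many files were copied.
collect_gate_artifacts() {
  local gate="$1" dest="$2" src_dir="${3:-/tmp}"
  local ext file count=0
  mkdir -p "$dest" || return 2
  for ext in json txt ndjson log; do
    for file in "$src_dir"/topoexec-"$gate"-*."$ext"; do
      [ -e "$file" ] || continue   # the glob matched nothing for this extension
      cp -p "$file" "$dest"/ && count=$((count + 1))
    done
  done
  echo "$count"
}

# Example usage (destination path is a placeholder):
#   collect_gate_artifacts stress ./release-prep/artifacts
```

Restricting the copy to the four listed extensions keeps the bundle to the documented artifact set and avoids sweeping unrelated files from /tmp into release evidence.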

Non-goals

  • no hard real-time guarantee;
  • no unbounded soak or long fuzz campaign in the required local gate;
  • no global performance threshold across machines;
  • no production telemetry exporter or dashboard control plane;
  • no hidden sanitizer exceptions.