Reliability, Soak, and Performance Regression Program
Status: bounded local reliability program for the v0.2.0-alpha.0 candidate
line. It layers existing tests into tiers and keeps longer evidence opt-in so
CI and agent runs do not become flaky or unbounded.
Test tiers
| Tier | Purpose | Required command |
|---|---|---|
| T0 required gate | Formatting, commit-message policy, build, clang-tidy, and default CTest. | `./scripts/agent_check.sh` |
| T1 compatibility/release | Stable-v0.2 contract, package install/downstream, docs, release-prep dry run. | `./scripts/goal_check.sh` with `compat`, `package`, `docs`, `release` |
| T2 runtime robustness | Bounded stress, deterministic fuzz smoke, benchmark contract, live observe smoke. | `./scripts/goal_check.sh` with `stress`, `fuzz`, `bench`, `live` |
| T3 sanitizer | ASAN+UBSAN full CTest. TSAN remains non-blocking policy evidence. | `./scripts/goal_check.sh sanitizer` |
| T4 opt-in soak/perf | Bounded soak-lite or longer human-requested runs; timing thresholds are per-machine. | `./scripts/soak_lite_smoke.sh`, plus opt-in benchmark/live-perf thresholds |
Bounded soak-lite
The default soak-lite command repeats the generated stress workload suite with small explicit limits:
```shell
./scripts/soak_lite_smoke.sh
```
It sets `TOPOEXEC_STRESS_PROFILE=soak`, a short duration, a maximum iteration
count, and small scale/step values. Longer soaks must set explicit duration,
iteration, scale, and timeout values, and their output should be attached as
local release or CI evidence rather than written into the docs as
machine-specific timings.
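The "explicit limits" rule for longer soaks could be enforced by a small wrapper like the sketch below. It is illustrative only: the variable names `SOAK_DURATION_S`, `SOAK_MAX_ITERATIONS`, `SOAK_SCALE`, and `SOAK_TIMEOUT_S` are hypothetical, and whether `./scripts/soak_lite_smoke.sh` reads variables of this kind is an assumption, so map them to whatever the script actually accepts.

```shell
# Sketch: refuse to start a longer soak unless every bound is explicit.
# All SOAK_* names below are hypothetical placeholders.

# Print the name of each limit that is unset or not a positive integer.
# Returns 0 only when all four limits are valid.
check_soak_limits() {
  _status=0
  for _name in SOAK_DURATION_S SOAK_MAX_ITERATIONS SOAK_SCALE SOAK_TIMEOUT_S; do
    eval "_value=\${$_name:-}"
    case "$_value" in
      '' | *[!0-9]* | 0*)
        echo "$_name"
        _status=1
        ;;
    esac
  done
  return "$_status"
}

# Run the soak under a hard wall-clock timeout and keep its output as evidence.
run_long_soak() {
  _evidence="${1:-soak-evidence.log}"
  if ! check_soak_limits >&2; then
    echo "refusing to start a soak without explicit limits" >&2
    return 2
  fi
  # Assumption: the soak script picks these up from the environment.
  export SOAK_DURATION_S SOAK_MAX_ITERATIONS SOAK_SCALE
  timeout "${SOAK_TIMEOUT_S}s" ./scripts/soak_lite_smoke.sh >"$_evidence" 2>&1
}
```

With all four variables set, `run_long_soak soak.log` writes the run output to `soak.log`, which can then be attached as release or CI evidence. The outer `timeout` keeps the run bounded even if the workload ignores its own duration limit.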
Performance regression policy
`./scripts/goal_check.sh bench` validates the benchmark JSON shape, graph hashes,
samples, p50/p95/p99 summaries, and local baseline generation. It does not apply
a global timing threshold. Timing comparisons are opt-in and per-machine:
```shell
TOPOEXEC_BENCH_BASELINE_IN=benchmarks/local-baseline.json \
TOPOEXEC_BENCH_THRESHOLD_PERCENT=15 \
./scripts/bench_baseline.sh
```
If a future goal adds thresholds, it must document hardware, compiler, build type, run count, accepted variance, and failure artifact location.
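As a rough illustration of what a percent threshold means, the sketch below flags a timing as a regression when it exceeds the baseline by more than the given percentage. This is one plausible reading, not a description of what `./scripts/bench_baseline.sh` does internally; the function name `is_regression` and the sample timings are hypothetical.

```shell
# Sketch only: one way to interpret a percent threshold.

# is_regression BASELINE CURRENT PERCENT
# Exit status 0 when CURRENT exceeds BASELINE by more than PERCENT percent,
# 1 otherwise. BASELINE and CURRENT are timings in the same unit.
is_regression() {
  awk -v b="$1" -v c="$2" -v p="$3" \
    'BEGIN { exit !(c > b * (1 + p / 100)) }'
}

# Hypothetical p95 timings, taken on the same machine and build type.
baseline_p95=200
threshold_percent=15

for current_p95 in 229 231; do
  if is_regression "$baseline_p95" "$current_p95" "$threshold_percent"; then
    echo "p95 $current_p95: regression"
  else
    echo "p95 $current_p95: within threshold"
  fi
done
```

With a baseline of 200 and a 15 percent threshold the cutoff is 230, so 229 passes and 231 is flagged. Because the cutoff scales with the baseline, the same percentage is only meaningful when both numbers come from the same hardware, compiler, and build type.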
Fuzz corpus and sanitizer policy
Deterministic fuzz smoke owns the checked-in corpus under
`tests/fuzz/corpus/graph_inputs/`. New minimized parser/compiler crashes should
be added there with an issue or changelog note. Coverage-guided fuzz campaigns
remain optional release evidence, not a default local gate.
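Adding a minimized crash to the corpus can be made repeatable with a helper along these lines. This is a sketch under assumptions: the function `add_crash_to_corpus` and the `crash-<checksum>` naming scheme are hypothetical, and the repository may already follow a different file naming convention.

```shell
# Sketch: copy a minimized crash input into the corpus under a
# content-derived name so the same input is never added twice.
# Usage: add_crash_to_corpus INPUT [CORPUS_DIR]
add_crash_to_corpus() {
  _input="$1"
  _corpus="${2:-tests/fuzz/corpus/graph_inputs}"
  if [ ! -f "$_input" ]; then
    echo "no such file: $_input" >&2
    return 1
  fi
  mkdir -p "$_corpus" || return 1
  # cksum prints "<crc> <size>" for stdin; keep only the CRC.
  _crc="$(cksum <"$_input" | awk '{ print $1 }')"
  _target="$_corpus/crash-$_crc"
  if [ -e "$_target" ]; then
    # Same name but different bytes would be a checksum collision.
    if ! cmp -s "$_input" "$_target"; then
      echo "name collision: $_target" >&2
      return 3
    fi
  else
    cp "$_input" "$_target" || return 1
  fi
  echo "$_target"
}
```

The helper prints the path it wrote, which is convenient to paste into the issue or changelog note that should accompany the new corpus entry.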
ASAN+UBSAN is the blocking sanitizer gate. ThreadSanitizer (TSAN) remains non-blocking until release governance changes that, because current concurrency previews and platform scheduler behavior can make TSAN results noisy. Do not hide TSAN gaps: record in the release notes whether TSAN was run and whether its failures were blocking.
Failure artifact convention
When a robustness gate fails, capture the smallest durable artifact set:
```text
/tmp/topoexec-<gate>-*.json
/tmp/topoexec-<gate>-*.txt
/tmp/topoexec-<gate>-*.ndjson
/tmp/topoexec-<gate>-*.log
```
For release candidates, copy relevant artifacts into the release-prep bundle or CI job artifacts. Do not commit machine-specific benchmark baselines unless a future release owner explicitly asks for a portable baseline policy.
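Copying the artifacts for one gate into a bundle directory could look like the sketch below. The function name `collect_gate_artifacts` and the bundle path in the usage note are hypothetical; only the `/tmp/topoexec-<gate>-*` patterns come from the convention above.

```shell
# Sketch: gather the failure artifacts for one gate into a bundle directory.
# Usage: collect_gate_artifacts GATE DEST_DIR [SOURCE_DIR]
# SOURCE_DIR defaults to /tmp. Prints the number of files copied.
collect_gate_artifacts() {
  _gate="$1"
  _dest="$2"
  _src="${3:-/tmp}"
  if [ -z "$_gate" ] || [ -z "$_dest" ]; then
    echo "usage: collect_gate_artifacts GATE DEST_DIR [SOURCE_DIR]" >&2
    return 2
  fi
  mkdir -p "$_dest" || return 1
  _count=0
  for _ext in json txt ndjson log; do
    for _file in "$_src"/topoexec-"$_gate"-*."$_ext"; do
      # When nothing matches, the glob stays literal; skip it.
      [ -f "$_file" ] || continue
      cp -p "$_file" "$_dest"/ && _count=$((_count + 1))
    done
  done
  echo "$_count"
}
```

For example, `collect_gate_artifacts fuzz release-prep/artifacts` would copy every matching fuzz artifact from `/tmp` and print how many were found. A count of `0` after a failed gate is itself worth recording, since it means the gate left no durable evidence.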
Non-goals
- no hard real-time guarantee;
- no unbounded soak or long fuzz campaign in the required local gate;
- no global performance threshold across machines;
- no production telemetry exporter or dashboard control plane;
- no hidden sanitizer exceptions.