Testing Strategy

TopoExec uses a small test pyramid that favors semantic runtime coverage over large fixtures. The bounded reliability program is summarized in Reliability, soak, and performance regression. The default local gate remains:

./scripts/agent_check.sh

In addition to the policy, formatting, and static-analysis checks listed under Test layers, this script configures, builds, and runs all default CTest tests.

Test layers

Layer	Evidence	Command
Quality gates	whitespace, commit-message policy, clang-format, production clang-tidy, build, and default CTest	`./scripts/agent_check.sh`
Unit	`test_common`, `test_graph`, `test_channel`, `test_state`, `test_runtime`	`ctest --test-dir build --output-on-failure -R 'test_'`
Semantic runtime	graph compiler, edge visibility, trigger, scheduler, async, CompositeLoop, state/config tests	`ctest --test-dir build --output-on-failure -R 'test_graph|test_runtime|test_state'`
Golden CLI	normalized plan, metrics, trace, and render outputs	`./scripts/goal_check.sh golden`
Schema	strict schema contract plus schema/semantic CLI split	`./scripts/goal_check.sh schema`
Editor schema UX	installed schema discovery, stable schema dump, and diagnostic JSON fields for editor problem lists	`ctest --test-dir build --output-on-failure -R editor_schema_ux_smoke`
Docs	recursive `topoexec-doc-test` markers plus docs learning-map/section contract	`./scripts/goal_check.sh docs`
Community readiness	contributor guide, code of conduct, issue templates, PR template, and unsafe-change guidance	`ctest --test-dir build --output-on-failure -R community_readiness_smoke`
Example apps	dependency-free reference apps plus the robot-cell pilot covering latest/drop, fixed-rate state feedback, task completion, CompositeLoop, payload ownership, multi-lane feedback, config snapshots, metrics/trace, and invalid-config rejection	`ctest --test-dir build --output-on-failure -R 'app_'`
Fuzz smoke	deterministic malformed, invalid-UTF-8, oversized, parser-limit corpus plus optional fuzzer target corpus replay	`./scripts/goal_check.sh fuzz`
Stress smoke	generated scheduler/channel graph workloads plus task-executor/thread-pool overload stress	`./scripts/goal_check.sh stress`
Benchmark smoke	RuntimeRunner benchmark cases, task-executor benchmark output, schema v2 metadata, and optional local baseline generation	`./scripts/goal_check.sh bench`
Live observe smoke	`graph observe` schema, CLI output, live assertions, record/replay artifacts, local dashboard assets, and observer-drop summaries	`./scripts/goal_check.sh live`
Live observe performance	disabled/summary/detailed/debug mode overhead smoke with opt-in per-machine thresholds and overflow/drop visibility	`./scripts/goal_check.sh live-perf`
Package	install/export/downstream `find_package(topoexec)` runtime-only, adapter SDK, YAML, CLI, and installed-schema discovery smokes	`./scripts/goal_check.sh package`
Adapter previews	optional adapter targets, package exports, and dependency-boundary policy	`./scripts/goal_check.sh adapters`
Release prep	non-publishing release_prepare dry run, release notes draft, and human-only tag-command contract	`./scripts/goal_check.sh release`
Sanitizers	ASAN+UBSAN full CTest; TSAN non-blocking CI	`./scripts/goal_check.sh sanitizer`
Formatting/static analysis	repository clang-format and clang-tidy baselines	`./scripts/goal_check.sh format` / `./scripts/goal_check.sh tidy`

Fuzz smoke

tests/fuzz/fuzz_graph_inputs.py generates a deterministic corpus of malformed, partial, cyclic, nested, invalid-UTF-8, oversized, and mutated YAML graph inputs. The optional fuzz_graph_inputs target can also replay tests/fuzz/corpus/graph_inputs in standalone mode or run as a libFuzzer target when built with Clang. Together they prove the CLI/parser/compiler path rejects hostile inputs without timeouts, crash-like exits, or sanitizer/crash markers, and they provide a place to store minimized crash regressions.

Run it directly with:

ctest --test-dir build --output-on-failure -R fuzz_graph_input_smoke
TOPOEXEC_FUZZER_ENGINE=STANDALONE ./scripts/fuzz_smoke.sh

See Coverage-guided fuzzing for libFuzzer commands and corpus rules.
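The idea of a deterministic hostile-input corpus can be illustrated with a minimal Python sketch. It is not the contents of tests/fuzz/fuzz_graph_inputs.py: the seed document, the category names, and the `build_corpus` function are hypothetical, and the only point is that a privately seeded RNG makes every run produce the same bytes.

```python
import random

# Hypothetical seed input; real graph YAML for TopoExec may look different.
SEED_DOCUMENT = b"nodes:\n  - id: a\n  - id: b\nedges:\n  - from: a\n    to: b\n"


def build_corpus(seed=0, mutations=4, oversize_bytes=1 << 16):
    """Return a name -> bytes mapping that is identical for a given seed."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    corpus = {
        "empty": b"",
        "partial": SEED_DOCUMENT[: len(SEED_DOCUMENT) // 2],
        "invalid_utf8": SEED_DOCUMENT + b"\xff\xfe\x80",
        "oversized": SEED_DOCUMENT + b"#" * oversize_bytes,
        "nested": b"[" * 256 + b"]" * 256,
    }
    for index in range(mutations):
        data = bytearray(SEED_DOCUMENT)
        # Overwrite a few bytes at seeded positions with seeded values.
        for _ in range(3):
            position = rng.randrange(len(data))
            data[position] = rng.randrange(256)
        corpus[f"mutated_{index}"] = bytes(data)
    return corpus
```

Because the corpus depends only on the seed, a failing input can be regenerated byte for byte on another machine and then stored as a regression case.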

Stress and soak smoke

test_stress exercises opt-in ThreadedTaskExecutor overload and a bursty thread_pool graph with expected rejections and bounded queue-depth assertions. stress_graph_smoke generates high fan-out, high fan-in, long-chain, mixed immediate/delay/state/async, and bounded thread-pool graph workloads and runs them through topoexec graph metrics.

Run the focused gate with:

./scripts/goal_check.sh stress

Longer soak runs are opt-in and bounded by caller-selected steps, duration, and iteration limits:

TOPOEXEC_STRESS_PROFILE=soak TOPOEXEC_STRESS_DURATION_SECONDS=60 ./scripts/stress_smoke.sh

See Stress and soak testing for workload details and configuration. Stress success is confidence evidence, not a performance or real-time guarantee.
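To make "expected rejections and bounded queue-depth assertions" concrete, here is a small Python sketch of the pattern. `BoundedQueue` and `run_burst` are hypothetical stand-ins, not TopoExec's ThreadedTaskExecutor or thread_pool; the sketch only shows the shape of the assertion: overload must surface as counted rejections while depth never exceeds capacity.

```python
from collections import deque


class BoundedQueue:
    """Toy bounded queue that counts rejections and tracks peak depth."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.rejected = 0
        self.max_depth = 0

    def try_submit(self, task):
        if len(self.items) >= self.capacity:
            self.rejected += 1  # overload is reported, not silently dropped
            return False
        self.items.append(task)
        self.max_depth = max(self.max_depth, len(self.items))
        return True

    def drain(self, count):
        """Remove up to `count` queued tasks and return how many were removed."""
        drained = 0
        while self.items and drained < count:
            self.items.popleft()
            drained += 1
        return drained


def run_burst(capacity, bursts, burst_size, drain_per_burst):
    """Submit bursts larger than the queue can hold, draining a little between them."""
    queue = BoundedQueue(capacity)
    task = 0
    for _ in range(bursts):
        for _ in range(burst_size):
            queue.try_submit(task)
            task += 1
        queue.drain(drain_per_burst)
    return queue
```

With `run_burst(capacity=4, bursts=3, burst_size=10, drain_per_burst=2)`, the queue accepts 8 of 30 tasks, rejects 22, and its peak depth stays at 4. A stress test in this style asserts both numbers rather than only checking that nothing crashed.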

Benchmark smoke

bench_contract_smoke runs all checked-in benchmark YAML cases through topoexec graph bench --format json and asserts schema v2 metadata, graph hash, per-run samples, p50/p95/p99 summaries, throughput fields, and empty errors. bench_task_executor_smoke covers the non-installed task-executor benchmark binary. ./scripts/goal_check.sh bench also writes a short local baseline to /tmp/topoexec-bench-baseline.json without applying a timing threshold.

Longer comparisons are opt-in and per-machine:

TOPOEXEC_BENCH_BASELINE_IN=benchmarks/local-baseline.json \
TOPOEXEC_BENCH_THRESHOLD_PERCENT=15 \
./scripts/bench_baseline.sh

See Performance baselines for interpretation and policy. Benchmark success proves output-contract and workload health, not a portable performance guarantee.
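A percent threshold against a per-machine baseline can be sketched as follows. This is an illustration, not the logic of scripts/bench_baseline.sh: the function names are hypothetical, the percentile uses the nearest-rank method (the real tooling may interpolate), and the metric dictionaries are simplified stand-ins for the benchmark JSON.

```python
import math


def percentile(samples, percent):
    """Nearest-rank percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Build a p50/p95/p99 summary from per-run samples."""
    return {name: percentile(samples, p) for name, p in (("p50", 50), ("p95", 95), ("p99", 99))}


def regressions(baseline, current, threshold_percent):
    """Return metrics whose value grew by more than threshold_percent."""
    failures = {}
    for name, base in baseline.items():
        change = (current[name] - base) / base * 100.0
        if change > threshold_percent:
            failures[name] = round(change, 2)
    return failures
```

For example, with a baseline of `{"p50": 100.0, "p95": 200.0}`, a current run of `{"p50": 110.0, "p95": 240.0}`, and a 15 percent threshold, only `p95` is reported, because it grew by 20 percent while `p50` grew by 10 percent. Both dictionaries must come from the same machine for the comparison to mean anything.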

Live observe smoke and performance

./scripts/goal_check.sh live exercises the local live runtime validation surface without treating it as a runtime control plane. It runs the CTest live smokes, scripts/live_smoke.sh, the observe NDJSON schema checker, assertion pass/fail/pending checks, record/replay artifact checks, static dashboard checks, and an observer-overflow smoke proving drops are reported without changing runtime success.

./scripts/goal_check.sh live-perf runs scripts/live_perf_check.py against the live-observe benchmark cases. The default gate is CI-safe: it verifies command success, runtime_ok, observer overflow visibility, and reports disabled/summary/detailed/debug median and p95 timings as JSON. Timing thresholds are intentionally opt-in and per-machine:

TOPOEXEC_LIVE_PERF_ENFORCE=1 \
TOPOEXEC_LIVE_SUMMARY_OVERHEAD_PERCENT=2 \
TOPOEXEC_LIVE_DETAILED_OVERHEAD_PERCENT=5 \
./scripts/goal_check.sh live-perf

See Live observe performance for the low-overhead policy. Debug mode is explicitly intrusive and has no hard default threshold. Successful live gates prove observability/test-validation behavior; they do not establish hard real-time guarantees or production telemetry support.
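The "report always, enforce only on request" behavior can be sketched in Python. The environment variable names are the ones shown above, but how scripts/live_perf_check.py actually reads them is an assumption, and `overhead_percent` and `evaluate` are hypothetical helpers written for this illustration.

```python
from statistics import median


def overhead_percent(baseline_samples, mode_samples):
    """Median overhead of a mode relative to the disabled baseline, in percent."""
    base = median(baseline_samples)
    return (median(mode_samples) - base) / base * 100.0


def evaluate(timings, env):
    """Always report overhead; record violations only when enforcement is on."""
    enforce = env.get("TOPOEXEC_LIVE_PERF_ENFORCE") == "1"
    limits = {
        "summary": env.get("TOPOEXEC_LIVE_SUMMARY_OVERHEAD_PERCENT"),
        "detailed": env.get("TOPOEXEC_LIVE_DETAILED_OVERHEAD_PERCENT"),
    }
    report = {"enforced": enforce, "overhead_percent": {}, "violations": []}
    for mode, samples in timings.items():
        if mode == "disabled":
            continue
        overhead = overhead_percent(timings["disabled"], samples)
        report["overhead_percent"][mode] = round(overhead, 2)
        limit = limits.get(mode)  # debug has no limit, so it never violates
        if enforce and limit is not None and overhead > float(limit):
            report["violations"].append(mode)
    report["ok"] = not report["violations"]
    return report
```

Passing `os.environ` as `env` gives the opt-in behavior: without `TOPOEXEC_LIVE_PERF_ENFORCE=1` the report still lists every mode's overhead but `ok` stays true, which keeps the default gate safe on noisy CI machines.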

Docs smoke

docs_command_smoke runs every topoexec-doc-test marker under docs/ and verifies the docs map: getting-started, concepts, the robot-cell case study, runtime semantics, API reference, schema, cookbook, adapters, testing/release pages, beta-readiness review, architecture diagrams, why-not comparisons, design principles, and community contribution entry points. It also executes the reference-app binaries and the pilot app listed from docs/11-user-guide/examples.md. Add a marker for commands that should remain executable, and update tests/docs/check_docs.py only when the documentation contract intentionally changes.
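As an illustration of marker-driven doc testing, the following Python sketch collects commands from fenced blocks that follow a marker line. The marker syntax here is an assumption: any line containing `topoexec-doc-test` arms the next fenced block. The real format checked by tests/docs/check_docs.py may differ, and `extract_marked_commands` is a hypothetical name.

```python
FENCE = "`" * 3  # built at runtime so this sketch can itself sit in a fenced block
MARKER = "topoexec-doc-test"


def extract_marked_commands(text):
    """Collect non-comment lines from fenced blocks that follow a marker line.

    ASSUMPTION: a marker is any line containing MARKER and applies only to
    the next fenced block. This is not the project's documented syntax.
    """
    commands = []
    armed = False       # a marker was seen and not yet consumed by a block
    in_block = False    # currently inside a fenced block
    collecting = False  # the current block is a marked one
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if in_block:
                in_block = collecting = False
            else:
                in_block = True
                collecting, armed = armed, False
        elif in_block:
            if collecting and stripped and not stripped.startswith("#"):
                commands.append(stripped)
        elif MARKER in line:
            armed = True
    return commands
```

A runner built on this would execute each returned command and fail on a non-zero exit status. Unmarked blocks are ignored, so illustrative or destructive snippets in the docs are never executed by accident.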

Release prep smoke

release_prepare_smoke runs scripts/release_prepare.sh --dry-run --allow-dirty --allow-existing-tag with the recommended v0.2.0-alpha.0 prerelease target. It proves the script can validate release docs/changelog policy, draft notes, and write the human-only annotated tag command without creating artifacts, tags, or uploads. The smoke is idempotent when the local candidate tag already exists; real non-dry-run release prep still fails on existing tags. Use the focused gate for script changes:

./scripts/goal_check.sh release

Artifact creation is covered by candidate rehearsals through scripts/release_prepare.sh --skip-gates or by a full clean-tree release prep when a human-approved candidate commit is ready.

Sanitizer gates

CMake exposes explicit sanitizer options for GCC/Clang builds:

cmake -S . -B build-asan-ubsan -DCMAKE_BUILD_TYPE=Debug \
  -DTOPOEXEC_ENABLE_ASAN=ON \
  -DTOPOEXEC_ENABLE_UBSAN=ON
cmake --build build-asan-ubsan -j
ctest --test-dir build-asan-ubsan --output-on-failure

The wrapper is shorter:

TOPOEXEC_SANITIZER_MODE=address-undefined ./scripts/sanitizer_check.sh
TOPOEXEC_SANITIZER_MODE=thread ./scripts/sanitizer_check.sh

address-undefined runs ASAN+UBSAN together. thread runs TSAN alone because TSAN cannot be combined with ASAN/UBSAN in this build. The package smoke passes sanitizer link flags to its downstream runtime-only app so sanitizer builds still verify install/export behavior.

CI policy

GCC and Clang Debug/RelWithDebInfo matrix jobs run the default gate.
ASAN+UBSAN is a blocking CI job.
TSAN remains non-blocking until the runtime concurrency surface is mature enough to make it a release blocker. The selected stress smoke tests run under TSAN because they are normal CTest entries.

Adding tests

When changing runtime semantics, add or update the smallest semantic test that proves the claim, then update docs/goldens only if public behavior intentionally changed. New docs commands should include a topoexec-doc-test marker so docs_command_smoke keeps the prose executable.