Testing Strategy
TopoExec uses a small test pyramid that favors semantic runtime coverage over large fixtures. The bounded reliability program is summarized in Reliability, soak, and performance regression. The default local gate remains:
./scripts/agent_check.sh
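Besides the quality gates listed below (whitespace, commit-message policy, clang-format, and production clang-tidy), this configures and builds the project and runs all default CTest tests.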
Test layers
| Layer | Evidence | Command |
|---|---|---|
| Quality gates | whitespace, commit-message policy, clang-format, production clang-tidy, build, and default CTest | ./scripts/agent_check.sh |
| Unit | test_common, test_graph, test_channel, test_state, test_runtime | ctest --test-dir build --output-on-failure -R 'test_' |
| Semantic runtime | graph compiler, edge visibility, trigger, scheduler, async, CompositeLoop, state/config tests | ctest --test-dir build --output-on-failure -R 'test_graph\|test_runtime\|test_state' |
| Golden CLI | normalized plan, metrics, trace, and render outputs | ./scripts/goal_check.sh golden |
| Schema | strict schema contract plus schema/semantic CLI split | ./scripts/goal_check.sh schema |
| Editor schema UX | installed schema discovery, stable schema dump, and diagnostic JSON fields for editor problem lists | ctest --test-dir build --output-on-failure -R editor_schema_ux_smoke |
| Docs | recursive topoexec-doc-test markers plus docs learning-map/section contract | ./scripts/goal_check.sh docs |
| Community readiness | contributor guide, code of conduct, issue templates, PR template, and unsafe-change guidance | ctest --test-dir build --output-on-failure -R community_readiness_smoke |
| Example apps | dependency-free reference apps plus the robot-cell pilot covering latest/drop, fixed-rate state feedback, task completion, CompositeLoop, payload ownership, multi-lane feedback, config snapshots, metrics/trace, and invalid-config rejection | ctest --test-dir build --output-on-failure -R 'app_' |
| Fuzz smoke | deterministic malformed, invalid-UTF-8, oversized, parser-limit corpus plus optional fuzzer target corpus replay | ./scripts/goal_check.sh fuzz |
| Stress smoke | generated scheduler/channel graph workloads plus task-executor/thread-pool overload stress | ./scripts/goal_check.sh stress |
| Benchmark smoke | RuntimeRunner benchmark cases, task-executor benchmark output, schema v2 metadata, and optional local baseline generation | ./scripts/goal_check.sh bench |
| Live observe smoke | graph observe schema, CLI output, live assertions, record/replay artifacts, local dashboard assets, and observer-drop summaries | ./scripts/goal_check.sh live |
| Live observe performance | disabled/summary/detailed/debug mode overhead smoke with opt-in per-machine thresholds and overflow/drop visibility | ./scripts/goal_check.sh live-perf |
| Package | install/export/downstream find_package(topoexec) runtime-only, adapter SDK, YAML, CLI, and installed-schema discovery smokes | ./scripts/goal_check.sh package |
| Adapter previews | optional adapter targets, package exports, and dependency-boundary policy | ./scripts/goal_check.sh adapters |
| Release prep | non-publishing release_prepare dry run, release notes draft, and human-only tag-command contract | ./scripts/goal_check.sh release |
| Sanitizers | ASAN+UBSAN full CTest; TSAN non-blocking CI | ./scripts/goal_check.sh sanitizer |
| Formatting/static analysis | repository clang-format and clang-tidy baselines | ./scripts/goal_check.sh format / ./scripts/goal_check.sh tidy |
Fuzz smoke
tests/fuzz/fuzz_graph_inputs.py generates a deterministic corpus of malformed,
partial, cyclic, nested, invalid-UTF-8, oversized, and mutated YAML graph inputs.
The optional fuzz_graph_inputs target can also replay
tests/fuzz/corpus/graph_inputs in standalone mode or run as a libFuzzer target
when built with Clang. Together they prove the CLI/parser/compiler path rejects
hostile inputs without timeouts, crash-like exits, or sanitizer/crash markers,
and they provide a place to store minimized crash regressions.
Run it directly with:
ctest --test-dir build --output-on-failure -R fuzz_graph_input_smoke
TOPOEXEC_FUZZER_ENGINE=STANDALONE ./scripts/fuzz_smoke.sh
See Coverage-guided fuzzing for libFuzzer commands and corpus rules.
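As an illustration of what a deterministic hostile-input corpus can look like, here is a minimal Python sketch. It is not the contents of tests/fuzz/fuzz_graph_inputs.py: the seed YAML, the function name `build_corpus`, and the specific mutations are assumptions chosen for the example. The property it demonstrates is that a fixed seed yields byte-identical inputs on every run.

```python
import random

# Hypothetical seed document; the real graph schema is not reproduced here.
SEED_YAML = b"nodes:\n  - id: a\n  - id: b\nedges:\n  - from: a\n    to: b\n"


def build_corpus(seed: int = 0, mutated: int = 4) -> list[bytes]:
    """Return a reproducible list of hostile inputs derived from SEED_YAML."""
    rng = random.Random(seed)  # a fixed seed keeps the corpus deterministic
    corpus = [
        b"",                               # empty input
        SEED_YAML[: len(SEED_YAML) // 2],  # partial document
        b"nodes: [\xff\xfe]\n",            # invalid UTF-8 bytes
        b"a: " + b"[" * 10_000,            # deeply nested and unterminated
        SEED_YAML + b"#" * 1_000_000,      # oversized input
    ]
    for _ in range(mutated):
        data = bytearray(SEED_YAML)
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)     # overwrite one byte at a random offset
        corpus.append(bytes(data))
    return corpus
```

Each entry would then be fed to the CLI under a timeout, with any crash-like exit treated as a failure; how the real smoke does that is described in Coverage-guided fuzzing.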
Stress and soak smoke
test_stress exercises opt-in ThreadedTaskExecutor overload and a bursty
thread_pool graph with expected rejections and bounded queue-depth assertions.
stress_graph_smoke generates high fan-out, high fan-in, long-chain, mixed
immediate/delay/state/async, and bounded thread-pool graph workloads and runs
them through topoexec graph metrics.
Run the focused gate with:
./scripts/goal_check.sh stress
Longer soak runs are opt-in and bounded by caller-selected steps, duration, and iteration limits:
TOPOEXEC_STRESS_PROFILE=soak TOPOEXEC_STRESS_DURATION_SECONDS=60 ./scripts/stress_smoke.sh
See Stress and soak testing for workload details and configuration. Stress success is confidence evidence, not a performance or real-time guarantee.
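The workload shapes named above can be pictured as plain edge lists. The sketch below is only an illustration in Python: the node names and the `(source, target)` tuple representation are assumptions, and the generator used by stress_graph_smoke (and the graph YAML it emits) is not shown here.

```python
def fan_out(width: int) -> list[tuple[str, str]]:
    """One source feeding `width` sinks."""
    return [("src", f"sink_{i}") for i in range(width)]


def fan_in(width: int) -> list[tuple[str, str]]:
    """`width` sources feeding one sink."""
    return [(f"src_{i}", "sink") for i in range(width)]


def long_chain(length: int) -> list[tuple[str, str]]:
    """A linear chain with `length` nodes and `length - 1` edges."""
    return [(f"n_{i}", f"n_{i + 1}") for i in range(length - 1)]


if __name__ == "__main__":
    for name, edges in (
        ("fan-out", fan_out(64)),
        ("fan-in", fan_in(64)),
        ("long-chain", long_chain(256)),
    ):
        print(f"{name}: {len(edges)} edges")
```

Generating shapes from a size parameter keeps the smoke small by default while letting a soak profile scale the same workloads up.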
Benchmark smoke
bench_contract_smoke runs all checked-in benchmark YAML cases through
topoexec graph bench --format json and asserts schema v2 metadata, graph hash,
per-run samples, p50/p95/p99 summaries, throughput fields, and empty errors.
bench_task_executor_smoke covers the non-installed task-executor benchmark
binary. ./scripts/goal_check.sh bench also writes a short local baseline to
/tmp/topoexec-bench-baseline.json without applying a timing threshold.
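A contract check of this kind inspects the shape of the JSON report, not its timings. The Python sketch below shows the idea under stated assumptions: the key names `schema_version`, `graph_hash`, `samples`, `summary`, and `errors` are hypothetical stand-ins, since the actual field names of the bench output are not listed on this page.

```python
import json


def check_bench_report(text: str) -> list[str]:
    """Return contract violations; an empty list means the report passes."""
    report = json.loads(text)
    problems = []
    if report.get("schema_version") != 2:      # hypothetical key name
        problems.append("schema_version must be 2")
    if not report.get("graph_hash"):           # hypothetical key name
        problems.append("graph_hash missing")
    if not report.get("samples"):              # per-run samples must be present
        problems.append("samples missing or empty")
    summary = report.get("summary", {})        # hypothetical key name
    for key in ("p50", "p95", "p99"):
        if key not in summary:
            problems.append(f"summary.{key} missing")
    if report.get("errors") != []:
        problems.append("errors must be an empty list")
    return problems
```

Because no timing value is compared against a limit, a check like this behaves the same on a laptop and on a loaded CI runner.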
Longer comparisons are opt-in and per-machine:
TOPOEXEC_BENCH_BASELINE_IN=benchmarks/local-baseline.json \
TOPOEXEC_BENCH_THRESHOLD_PERCENT=15 \
./scripts/bench_baseline.sh
See Performance baselines for interpretation and policy. Benchmark success proves output-contract and workload health, not a portable performance guarantee.
Live observe smoke and performance
./scripts/goal_check.sh live exercises the local live runtime validation
surface without treating it as a runtime control plane. It runs the CTest live
smokes, scripts/live_smoke.sh, the observe NDJSON schema checker, assertion
pass/fail/pending checks, record/replay artifact checks, static dashboard
checks, and an observer-overflow smoke proving drops are reported without
changing runtime success.
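For readers unfamiliar with NDJSON, the core of such a check is that every non-empty line is an independent JSON document. The following Python sketch shows only that generic step; it is not the project's schema checker, and it makes no assumption about which fields the observe schema requires.

```python
import json


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON, requiring one JSON object per non-empty line."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        value = json.loads(line)  # raises json.JSONDecodeError on malformed lines
        if not isinstance(value, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        records.append(value)
    return records
```

A schema-level check would then validate each returned record against the published observe schema.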
./scripts/goal_check.sh live-perf runs scripts/live_perf_check.py against
the live-observe benchmark cases. The default gate is CI-safe: it verifies
command success, runtime_ok, and observer overflow visibility, and it reports
disabled/summary/detailed/debug median and p95 timings as JSON. Timing
thresholds are intentionally opt-in and per-machine:
TOPOEXEC_LIVE_PERF_ENFORCE=1 \
TOPOEXEC_LIVE_SUMMARY_OVERHEAD_PERCENT=2 \
TOPOEXEC_LIVE_DETAILED_OVERHEAD_PERCENT=5 \
./scripts/goal_check.sh live-perf
See Live observe performance for the low-overhead policy. Debug mode is explicitly intrusive and has no hard default threshold. Successful live gates prove observability/test-validation behavior; they do not establish hard real-time guarantees or production telemetry support.
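The opt-in threshold logic can be sketched as follows. This is an illustrative Python example, not the code of scripts/live_perf_check.py: the function names, the nearest-rank p95 definition, and the choice to compare medians are assumptions. Only the environment variable names are taken from the commands above.

```python
import os
import statistics
from collections.abc import Mapping


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def overhead_percent(disabled: list[float], observed: list[float]) -> float:
    """Median slowdown of an observed mode relative to the disabled mode."""
    base = statistics.median(disabled)
    return (statistics.median(observed) - base) * 100.0 / base


def gate(
    disabled: list[float],
    observed: list[float],
    limit_var: str,
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Always compute the overhead; fail only when enforcement is opted in."""
    overhead = overhead_percent(disabled, observed)
    if environ.get("TOPOEXEC_LIVE_PERF_ENFORCE") != "1":
        return True  # CI-safe default: report only, never fail on timing
    return overhead <= float(environ[limit_var])
```

With enforcement off, the same measurements are still produced and can be reported, which keeps the default gate stable on shared CI hardware.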
Docs smoke
docs_command_smoke runs every topoexec-doc-test marker under docs/ and
verifies the docs map: getting-started, concepts, the robot-cell case study,
runtime semantics, API reference, schema, cookbook, adapters, testing/release
pages, beta-readiness review, architecture diagrams, why-not comparisons,
design principles, and community contribution entry points. It also executes the
reference-app binaries and the pilot app listed in
docs/11-user-guide/examples.md. Add a marker for commands that should remain
executable, and update tests/docs/check_docs.py only when the documentation
contract intentionally changes.
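To make the marker idea concrete, the Python sketch below collects the commands from marked code blocks in a Markdown string. The marker syntax is an assumption: it treats an HTML comment `<!-- topoexec-doc-test -->` on the line directly above a fenced block as the marker, which may differ from what tests/docs/check_docs.py actually recognizes.

```python
import re

FENCE = "`" * 3  # built indirectly so the example can itself sit inside a fence
MARKER = "<!-- topoexec-doc-test -->"  # assumed marker syntax, for illustration

# A marker line, then a fenced block; capture the block body.
PATTERN = re.compile(
    re.escape(MARKER) + r"\s*\n" + FENCE + r"[^\n]*\n(.*?)\n" + FENCE,
    re.DOTALL,
)


def marked_commands(markdown: str) -> list[str]:
    """Return the bodies of fenced blocks that directly follow a marker."""
    return [match.group(1).strip() for match in PATTERN.finditer(markdown)]
```

Blocks without a marker are ignored, so illustrative or destructive commands can stay in the prose without being executed.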
Release prep smoke
release_prepare_smoke runs scripts/release_prepare.sh --dry-run --allow-dirty
--allow-existing-tag with the recommended v0.2.0-alpha.0 prerelease target.
It proves the script can validate release docs/changelog policy, draft notes,
and write the human-only annotated tag command without creating artifacts, tags,
or uploads. The smoke is idempotent when the local candidate tag already exists;
real non-dry-run release prep still fails on existing tags.
Use the focused gate for script changes:
./scripts/goal_check.sh release
Artifact creation is covered by candidate rehearsals through
scripts/release_prepare.sh --skip-gates or by a full clean-tree release prep
when a human-approved candidate commit is ready.
Sanitizer gates
CMake exposes explicit sanitizer options for GCC/Clang builds:
cmake -S . -B build-asan-ubsan -DCMAKE_BUILD_TYPE=Debug \
-DTOPOEXEC_ENABLE_ASAN=ON \
-DTOPOEXEC_ENABLE_UBSAN=ON
cmake --build build-asan-ubsan -j
ctest --test-dir build-asan-ubsan --output-on-failure
The wrapper is shorter:
TOPOEXEC_SANITIZER_MODE=address-undefined ./scripts/sanitizer_check.sh
TOPOEXEC_SANITIZER_MODE=thread ./scripts/sanitizer_check.sh
address-undefined runs ASAN+UBSAN together. thread runs TSAN alone because
TSAN cannot be combined with ASAN/UBSAN in this build. The package smoke passes
sanitizer link flags to its downstream runtime-only app so sanitizer builds still
verify install/export behavior.
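The mode-to-flags mapping described above can be summarized in a few lines of Python. This is a sketch, not the logic of scripts/sanitizer_check.sh: the ASAN and UBSAN option names come from the CMake command above, while `TOPOEXEC_ENABLE_TSAN` is an assumed name chosen by analogy and may not match the real option.

```python
def sanitizer_cmake_args(mode: str) -> list[str]:
    """Map a sanitizer mode to the CMake options it would enable."""
    if mode == "address-undefined":
        # ASAN and UBSAN are enabled together in one build.
        return ["-DTOPOEXEC_ENABLE_ASAN=ON", "-DTOPOEXEC_ENABLE_UBSAN=ON"]
    if mode == "thread":
        # TSAN runs alone. The option name below is an assumption.
        return ["-DTOPOEXEC_ENABLE_TSAN=ON"]
    raise ValueError(f"unsupported sanitizer mode: {mode!r}")
```

Rejecting unknown modes outright avoids silently running an unsanitized build under a sanitizer label.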
CI policy
- GCC and Clang Debug/RelWithDebInfo matrix jobs run the default gate.
- ASAN+UBSAN is a blocking CI job.
- TSAN remains non-blocking until the runtime concurrency surface is mature enough to make it a release blocker. The selected stress smoke tests run under TSAN because they are normal CTest entries.
Adding tests
When changing runtime semantics, add or update the smallest semantic test that
proves the claim, then update docs/goldens only if public behavior intentionally
changed. New docs commands should include a topoexec-doc-test marker so
docs_command_smoke keeps the prose executable.