Defensive Input Handling
TopoExec validates graph inputs before runtime execution and keeps hostile or accidentally large inputs bounded. This page records the current defensive limits and the behavior that tests expect.
Parser limits
The YAML loader rejects any input that exceeds one of these limits before doing allocation-heavy runtime work:
| Limit | Default | Applies to |
|---|---|---|
| Graph text size | 1 MiB | load_graph_text() and files read through load_graph_file() |
| Lanes | 256 | lanes mapping entries |
| Components | 4096 | components sequence entries |
| Edges | 8192 | edges sequence entries |
| Composite loops | 1024 | composite_loops sequence entries |
| Identifier length | 128 bytes | graph name, lane ids, component ids, edge ids, loop ids |
| Config nesting depth | 8 | graph/component config maps and sequences |
| Config value size | 4096 bytes | config scalar values and serialized nested values |
| Non-config string length | 4096 bytes | component types, endpoints, policy names, descriptors, and other YAML strings |
Limit violations are reported as explicit std::invalid_argument exceptions whose
messages name the graph path and the limit, for example "runtime graph.components
count exceeds limit 4096".
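To make the table and the error shape concrete, here is a minimal C++ sketch. The struct name GraphInputLimitsSketch, its field names, and check_count() are hypothetical; only the default values and the message wording come from this page, and treating a count equal to the limit as allowed is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the documented defaults. The real GraphInputLimits
// field names are not specified on this page.
struct GraphInputLimitsSketch {
    std::size_t max_graph_input_bytes = 1024 * 1024;  // 1 MiB
    std::size_t max_lanes = 256;
    std::size_t max_components = 4096;
    std::size_t max_edges = 8192;
    std::size_t max_composite_loops = 1024;
    std::size_t max_identifier_bytes = 128;
    std::size_t max_config_depth = 8;
    std::size_t max_config_value_bytes = 4096;
    std::size_t max_string_bytes = 4096;
};

// Throws std::invalid_argument when count exceeds limit, before any
// per-entry work is done. A count equal to the limit is assumed to be allowed.
// The message shape follows the documented example.
inline void check_count(const std::string& path, std::size_t count, std::size_t limit) {
    if (count > limit) {
        throw std::invalid_argument(path + " count exceeds limit " + std::to_string(limit));
    }
}
```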
Inputs must be valid UTF-8 text; invalid byte sequences fail before YAML parsing
with a bounded diagnostic instead of being treated as opaque binary.
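A well-formedness check of this kind can run over the raw bytes before any YAML parser sees them. The sketch below follows the well-formed byte-sequence table from the Unicode standard and reports only a byte offset, which keeps the diagnostic bounded regardless of what the input contains. The function names and the message text are illustrative assumptions, not TopoExec's implementation.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Returns the byte offset of the first ill-formed UTF-8 sequence, or
// std::nullopt if the whole input is well-formed. Rejects overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF), and code points above U+10FFFF.
inline std::optional<std::size_t> first_invalid_utf8(std::string_view s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }                // ASCII fast path
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;             // allowed range for byte 2
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; }
        else if (c == 0xE0)              { len = 3; lo = 0xA0; }  // no overlongs
        else if (c >= 0xE1 && c <= 0xEC) { len = 3; }
        else if (c == 0xED)              { len = 3; hi = 0x9F; }  // no surrogates
        else if (c >= 0xEE && c <= 0xEF) { len = 3; }
        else if (c == 0xF0)              { len = 4; lo = 0x90; }  // no overlongs
        else if (c >= 0xF1 && c <= 0xF3) { len = 4; }
        else if (c == 0xF4)              { len = 4; hi = 0x8F; }  // <= U+10FFFF
        else { return i; }  // stray continuation byte, C0/C1, or F5..FF
        if (n - i < len) return i;                      // truncated sequence
        const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
        if (c1 < lo || c1 > hi) return i;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char ck = static_cast<unsigned char>(s[i + k]);
            if (ck < 0x80 || ck > 0xBF) return i;
        }
        i += len;
    }
    return std::nullopt;
}

// Fails before parsing; the message carries an offset, never the raw bytes.
inline void require_utf8(std::string_view text) {
    if (auto off = first_invalid_utf8(text)) {
        throw std::invalid_argument("graph input is not valid UTF-8 at byte offset " +
                                    std::to_string(*off));
    }
}
```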
The public GraphInputLimits struct carries these limits, and
default_graph_input_limits() provides the defaults. Embedders can call
load_graph_text(text, limits) or load_graph_file(path, limits) with tighter
limits for tests, editors, or CI. File loading reads incrementally and fails as
soon as the configured byte limit would be exceeded, so an oversized file is
never fully materialized in memory before it is rejected.
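Incremental loading with an early size check can be sketched as follows. read_bounded() and read_graph_file_bounded() are hypothetical helpers and the 4 KiB chunk size is an arbitrary choice; the point is that the size check runs before each append, so memory use stays within the configured byte limit plus one chunk buffer.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <string>

// Reads a stream in fixed-size chunks and throws as soon as the running total
// would exceed max_bytes, so an oversized input is never buffered in full.
inline std::string read_bounded(std::istream& in, std::size_t max_bytes,
                                const std::string& label) {
    std::string out;
    char buf[4096];
    while (in) {
        in.read(buf, sizeof buf);
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        // out.size() <= max_bytes always holds here, so this cannot underflow.
        if (got > max_bytes - out.size()) {
            throw std::invalid_argument(label + " size exceeds limit " +
                                        std::to_string(max_bytes));
        }
        out.append(buf, got);
    }
    if (in.bad()) {
        throw std::runtime_error("I/O error while reading " + label);
    }
    return out;
}

// File variant: opens in binary mode so no newline translation changes the count.
inline std::string read_graph_file_bounded(const std::string& path, std::size_t max_bytes) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open graph file " + path);
    }
    return read_bounded(in, max_bytes, path);
}
```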
CLI overrides
Graph-reading CLI commands accept per-invocation overrides for the same parser limits:
```bash
./build/topoexec graph validate examples/minimal.yaml \
  --max-graph-input-bytes 65536 \
  --max-components 512 \
  --max-edges 1024 \
  --format json
```
Overrides are intentionally per-invocation; TopoExec does not persist them in
project state. The defaults remain safe for normal examples, while CI/editor
jobs can lower them to protect interactive workflows. topoexec doctor --format
json reports the default graph_input_limits contract.
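One way to model per-invocation overrides is to start from fresh defaults on every run and apply only the flags the user passed. The struct and function names below are hypothetical and cover only the three limit flags from the example above; whether TopoExec also lets a flag raise a limit above its default is not specified on this page, so the sketch simply takes whatever value is given.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical subset of the limits, initialized to the documented defaults.
struct LimitsSketch {
    std::size_t max_graph_input_bytes = 1024 * 1024;
    std::size_t max_components = 4096;
    std::size_t max_edges = 8192;
};

// Per-invocation overrides: an unset field means "keep the default".
struct LimitOverridesSketch {
    std::optional<std::size_t> max_graph_input_bytes;
    std::optional<std::size_t> max_components;
    std::optional<std::size_t> max_edges;
};

// Builds the effective limits for one command run. Nothing is stored, so the
// next invocation starts from the defaults again.
inline LimitsSketch effective_limits(const LimitOverridesSketch& o) {
    LimitsSketch l;  // fresh defaults every time
    l.max_graph_input_bytes = o.max_graph_input_bytes.value_or(l.max_graph_input_bytes);
    l.max_components = o.max_components.value_or(l.max_components);
    l.max_edges = o.max_edges.value_or(l.max_edges);
    return l;
}
```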
Numeric validation¶
The parser uses bounded integer conversions, and semantic validation rejects
negative capacities, durations, trigger windows, and loop budgets where negative
values are not meaningful. Capacity and max_inflight errors are reported during
graph validation, before runtime execution.
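A bounded integer conversion in C++ can be built on std::from_chars, which does not allocate, does not throw, and reports overflow instead of wrapping. This is a sketch of the idea rather than TopoExec's parser: parse_nonnegative() and its messages are hypothetical, and durations written with unit suffixes would need separate handling.

```cpp
#include <charconv>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Parses a non-negative integer field such as a capacity. For unsigned
// targets std::from_chars accepts no sign at all, so "-1" is rejected;
// trailing junk, empty input, and values above max_value are rejected too.
inline std::uint64_t parse_nonnegative(std::string_view text, const std::string& path,
                                       std::uint64_t max_value) {
    std::uint64_t value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec == std::errc::result_out_of_range || (ec == std::errc() && value > max_value)) {
        throw std::invalid_argument(path + " value exceeds limit " + std::to_string(max_value));
    }
    if (ec != std::errc() || ptr != last) {
        throw std::invalid_argument(path + " must be a non-negative integer");
    }
    return value;
}
```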
Fuzz and malformed input¶
fuzz_graph_input_smoke runs a deterministic corpus of malformed, cyclic,
partial, nested, invalid-UTF-8, oversized, and mutated YAML inputs. The optional
fuzz_graph_inputs target can run the checked-in corpus in standalone mode or
as a libFuzzer target. Both fail on timeouts, crash-like exit codes, or
sanitizer/crash markers. Run the local gate with ./scripts/goal_check.sh fuzz;
see Coverage-guided fuzzing for libFuzzer and regression-corpus
commands.
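The pass/fail rule for a single corpus run can be sketched as a small classifier. classify_run() is hypothetical, the exit-code convention (128 plus the signal number, as POSIX shells report it) is an assumption, and the marker strings are common sanitizer and libFuzzer report prefixes rather than the exact list the TopoExec scripts check. A clean rejection of malformed YAML is an expected outcome, not a failure.

```cpp
#include <string_view>

enum class RunVerdict { ok, timeout, crash_exit, sanitizer_marker };

// exit_code follows the shell convention: a process killed by signal N is
// reported as 128 + N. Negative codes are also treated as signal deaths.
inline RunVerdict classify_run(bool timed_out, int exit_code, std::string_view output) {
    if (timed_out) return RunVerdict::timeout;
    // Common report prefixes from ASan, LSan, MSan, TSan, UBSan, and libFuzzer.
    static constexpr std::string_view kMarkers[] = {
        "ERROR: AddressSanitizer", "ERROR: LeakSanitizer", "WARNING: MemorySanitizer",
        "WARNING: ThreadSanitizer", "runtime error:", "ERROR: libFuzzer"};
    for (std::string_view marker : kMarkers) {
        if (output.find(marker) != std::string_view::npos) return RunVerdict::sanitizer_marker;
    }
    if (exit_code > 128 || exit_code < 0) return RunVerdict::crash_exit;
    // Exit code 0 or an ordinary error code with a diagnostic is acceptable.
    return RunVerdict::ok;
}
```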
Trust boundary¶
TopoExec YAML declares graph structure. It is not a mechanism for executing
untrusted code: the loader parses data, validates semantics, and returns a
GraphSpec, while component instantiation still goes through an embedder-provided
registry. Dynamic plugin loading remains a separate, default-off, trusted-native
preview. Plugin discovery, shared-library paths, signing/allowlists, sandboxing,
and ABI compatibility stay outside graph YAML, and each must be made explicit
before a broader plugin ecosystem can be enabled by default.
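The registry boundary can be illustrated with a minimal factory map. Component and ComponentRegistrySketch are hypothetical stand-ins for the embedder-provided registry: a graph file can only name a type string, and a name the embedder never registered is rejected instead of being resolved to a library on disk.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical minimal component interface; the real TopoExec types differ.
struct Component {
    virtual ~Component() = default;
    virtual std::string kind() const = 0;
};

// The embedder decides which component types exist. YAML can only select
// among registered factories; it cannot cause native code to be loaded.
class ComponentRegistrySketch {
public:
    using Factory = std::function<std::unique_ptr<Component>()>;

    void add(const std::string& type, Factory f) { factories_[type] = std::move(f); }

    std::unique_ptr<Component> create(const std::string& type) const {
        auto it = factories_.find(type);
        if (it == factories_.end()) {
            throw std::invalid_argument("unknown component type " + type);
        }
        return it->second();
    }

private:
    std::unordered_map<std::string, Factory> factories_;
};
```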
CLI output paths¶
Current CLI commands write only to stdout and stderr. Users can redirect output with their shell, but the CLI does not accept arbitrary output-file paths today. If a future command writes files, it must reject path traversal and apply an explicit overwrite policy before writing.
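If a file-writing command is added later, a guard along these lines could enforce both rules. checked_output_path() is a hypothetical helper: it resolves the requested path against an allowed base directory with std::filesystem::weakly_canonical (which also resolves symlinks in the existing part of the path), rejects anything that lands outside that directory, and refuses to replace an existing file unless the caller opts in.

```cpp
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// Returns the resolved target path, or throws if it escapes base_dir or
// would overwrite an existing entry without explicit permission.
inline fs::path checked_output_path(const fs::path& base_dir, const fs::path& requested,
                                    bool allow_overwrite) {
    const fs::path base = fs::weakly_canonical(base_dir);
    // An absolute `requested` replaces `base` here, and is then caught below.
    const fs::path target = fs::weakly_canonical(base / requested);
    const fs::path rel = target.lexically_relative(base);
    if (rel.empty() || rel.string() == "." || rel.begin()->string() == "..") {
        throw std::invalid_argument("output path escapes the allowed directory");
    }
    if (fs::exists(target) && !allow_overwrite) {
        throw std::invalid_argument("refusing to overwrite existing output");
    }
    return target;
}
```

The existence check and the later write are still two separate steps, so a real implementation would also open the file with exclusive-create semantics to close that race.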
Blocking overflow policy¶
overflow: block is treated as a bounded policy, not as permission for hidden,
unbounded recursion. In the single-thread path, low-level channel calls return
"would block" to the producer instead of blocking forever. Runtime async
admission treats block like a rejection when max_inflight is exhausted, so
rejected completions are never committed. Graph authors should prefer an explicit
drop_*, reject, or fail_fast policy unless a future scheduler lane documents safe
blocking behavior.
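The single-thread channel behavior can be sketched as a bounded queue whose push never waits. BoundedChannelSketch, the Overflow and PushResult enums, and drop_oldest as a member of the drop_* family are hypothetical names; the point is that overflow: block surfaces as a would_block result the caller must handle, so nothing recurses or waits forever.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

enum class Overflow { block, reject, drop_oldest };
enum class PushResult { accepted, would_block, rejected, dropped_oldest };

// Single-threaded bounded channel. try_push never waits: when the queue is
// full, the configured policy decides the outcome immediately.
template <typename T>
class BoundedChannelSketch {
public:
    BoundedChannelSketch(std::size_t capacity, Overflow policy)
        : capacity_(capacity), policy_(policy) {}

    PushResult try_push(T value) {
        if (items_.size() < capacity_) {
            items_.push_back(std::move(value));
            return PushResult::accepted;
        }
        switch (policy_) {
            case Overflow::block:
                return PushResult::would_block;  // caller decides; no waiting here
            case Overflow::reject:
                return PushResult::rejected;
            case Overflow::drop_oldest:
                if (items_.empty()) return PushResult::rejected;  // capacity 0
                items_.pop_front();
                items_.push_back(std::move(value));
                return PushResult::dropped_oldest;
        }
        return PushResult::rejected;  // unreachable for valid enum values
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    Overflow policy_;
    std::deque<T> items_;
};
```

Async admission can follow the same rule: once inflight work equals max_inflight, a block policy yields a rejection instead of a wait, so that work never starts and no completion is committed.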
Error-message hygiene¶
Diagnostics should report graph paths, ids, policy names, and limits. They should
not dump environment variables, credentials, or unrelated filesystem state.
topoexec doctor --format json reports only tool/schema/example/benchmark
locations needed for local troubleshooting.
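When a diagnostic has to echo part of the input, such as an oversized id, one option is a bounded, escaped excerpt built only from that input. diag_excerpt() and its 64-byte default are hypothetical; it also hex-escapes non-ASCII bytes, which keeps messages plain ASCII at some cost in readability for UTF-8 ids.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Shows at most max_bytes of the value, hex-escapes control bytes, backslash,
// and non-ASCII bytes, and appends "..." when the value was truncated. The
// result depends only on the graph input, never on process state.
inline std::string diag_excerpt(std::string_view value, std::size_t max_bytes = 64) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    const std::size_t n = value.size() < max_bytes ? value.size() : max_bytes;
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char c = static_cast<unsigned char>(value[i]);
        if (c >= 0x20 && c < 0x7f && c != '\\') {
            out.push_back(static_cast<char>(c));
        } else {
            out += "\\x";
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0f]);
        }
    }
    if (value.size() > max_bytes) out += "...";
    return out;
}
```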